Vector, Covector and Tensor Fields


Last time I said that our motivation for defining the tangent bundle was so that we'd be able to define smooth vector fields on a manifold, but I didn't quite get there! Let's do that now, since we are already equipped to do so!

Definition. Let $M$ be a smooth manifold and $TM$ its tangent bundle. A smooth vector field on $M$ is a smooth section of the bundle projection map $\pi:TM\to M$.

We write $\Gamma(TM)$ to denote the set of all vector fields on $M$.

This may seem like gibberish at first, so let's parse the definition a bit. If $X$ is a vector field on $M$, then as a section of $\pi$ it must be a map $X:M\to TM$ for which $\pi\circ X=1_M$. So it has the correct domain and codomain, since we want a vector field to be an assignment of a tangent vector to each point of our manifold. Furthermore, it's a smooth map, which makes sense because $M$ and $TM$ are both manifolds. Lastly, it assigns to each point $p\in M$ a tangent vector in $T_p M$ based at $p$. That's because, as we saw last time, sections provide representatives for the level sets of a function. More formally, since $\pi(X(p))=p$, it must be that $X(p)$ is of the form $(Y, p)\in TM$ for some vector $Y\in T_pM$, so that $\pi\big((Y, p)\big)=p$.

[Figure: vector-field]

This is only part of the picture, though, because we want to be able to work with vector fields algebraically. One might think to turn $\Gamma(TM)$ into a vector space over $\R$. This would allow us to scale vector fields and add them together in the natural ways. However, scaling an entire vector field by a fixed amount is pretty boring. We can do much better!

Notice that the set of smooth functions $C^\infty(M)$ is a commutative ring with unity under the usual addition and multiplication of functions. It fails to be a field, though, because it lacks inverses for certain nonzero elements (a nonzero function which vanishes anywhere has no multiplicative inverse). If we use smooth functions as our scalars by turning $\Gamma(TM)$ into a module over $C^\infty(M)$, we gain the power to scale vector fields by smooth functions. That is, the amount of scaling will depend on where in the manifold we look.

We think of tangent vectors as maps that act on smooth functions and spit out real numbers. Unfortunately, vector fields as we defined them do not act on smooth functions; they act on points in $M$ and produce tangent vectors. The following trick allows us to think of them once again as objects which act on smooth functions.

Definition. Given a smooth manifold $M$ and a vector field $X\in\Gamma(TM)$, the associated derivation of $X$ is the map $\hat{X}:C^\infty(M)\to C^\infty(M)$ defined by

$$(\hat{X}f)(p)=X(p)f$$

for any point $p\in M$.
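For a quick example (with a vector field chosen purely for illustration), take $M=\R$ with its global identity chart $(\R, x)$, and let $X$ be the vector field assigning to each point $p$ the tangent vector $p\,\dfrac{\partial}{\partial x}\at{p}$. Then for any $g\in C^\infty(\R)$ and any $p\in\R$,

$$(\hat{X}g)(p) = X(p)g = p\,\frac{\partial}{\partial x}\at{p}g = p\,g'(p),$$

so the associated derivation sends $g$ to the smooth function $p\mapsto p\,g'(p)$.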

The fact that the associated derivation of a vector field is actually a derivation follows from the fact that tangent vectors are derivations.

Theorem. The associated derivation of a vector field is a derivation on $C^\infty(M)$.

Proof. Let $M$ be a smooth manifold and choose a vector field $X\in\Gamma(TM)$. If $f$ and $g$ are smooth functions in $C^\infty(M)$ then for any point $p\in M$,

$$\begin{align}
(\hat{X}(fg))(p) &= X(p)(fg) \\
&= f(p)\,X(p)g + g(p)\,X(p)f \\
&= f(p)(\hat{X}g)(p) + g(p)(\hat{X}f)(p) \\
&= (f\hat{X}g + g\hat{X}f)(p)
\end{align}$$

because $X(p)$ is a tangent vector in $T_pM$ which we know to be a derivation. It follows that $\hat{X}(fg) = f\hat{X}g + g\hat{X}f$ and thus $\hat{X}$ is a derivation of smooth functions.

Because we like to think of vector fields as objects which act on smooth functions, we will henceforth treat a vector field and its associated derivation as the same object and never speak of this again. Just remember what's really going on behind the scenes, so that the definitions all line up properly.

It is convenient now to make the following definition, which extends our previous definition of partial derivative operators with respect to a chart component at a point:

Definition. Let $M$ be a smooth manifold and let $(U, x)$ be a chart on $M$. The partial derivative vector field with respect to the $i$th component of the chart map $x$ is the vector field on $U$

$$\frac{\partial}{\partial x^i}:C^\infty(U)\to C^\infty(U)$$

defined by

$$\Big(\frac{\partial}{\partial x^i}f\Big)(p) = \partial_i(f\circ x^{-1})\at{x(p)}$$

for any smooth function $f\in C^\infty(U)$ and any point $p\in U$, where $\partial_i$ represents the ordinary partial derivative operator with respect to the $i$th component in the sense of multivariable calculus.
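For instance (using a chart chosen purely for illustration), take $M=\R^2$ and a polar chart $(U, x)$ with $x=(r,\theta)$, defined on the plane with a ray removed so that the angle is single-valued. If $f\in C^\infty(U)$ is the restriction of the first Cartesian coordinate, then $f\circ x^{-1}(r,\theta)=r\cos\theta$, and so

$$\Big(\frac{\partial}{\partial\theta}f\Big)(p) = \partial_2(f\circ x^{-1})\at{x(p)} = -r(p)\sin\theta(p),$$

which is minus the second Cartesian coordinate of $p$. In words, $\dfrac{\partial}{\partial\theta}$ measures how quickly a function changes as we rotate about the origin.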

There's just one minor issue, though. Only modules over division rings are guaranteed to have a basis. Since $C^\infty(M)$ is not a division ring, it is possible that $\Gamma(TM)$ might not have a basis for some manifolds $M$.

Example. Let $M=\R$. Vector fields on $\R$ are pretty boring. They're all of the form

$$X=f\frac{\partial}{\partial x}$$

for some smooth function $f$, where $(\R, x)$ is the global chart on $\R$. That's because at each point, a tangent vector can only point in one of two directions: negative or positive. Thus, $\Gamma(T\R)$ in this case has $\Big(\dfrac{\partial}{\partial x}\Big)$ as a basis, since every vector field is a $C^\infty(M)$-multiple of $\dfrac{\partial}{\partial x}$.

Example. Let $M=S^2$, the two-dimensional unit sphere. It is a theorem of algebraic topology (the hairy ball theorem) that there is no nonvanishing continuous tangent vector field on spheres of even dimension. Thus, there is certainly no nonvanishing smooth tangent vector field on $S^2$. Basis vector fields can't vanish, and so it follows that there is no basis for $\Gamma(TS^2)$.

This doesn't really turn out to be a problem though, because it turns out that $\Gamma(TM)$ always has a local basis on any chart. That is, if we choose a chart $(U, x)$ on $M$ then $\Big(\dfrac{\partial}{\partial x^i}\Big)_{i=1}^n$ is a basis for $\Gamma(TU)$. Since we usually work in terms of charts anyway, this is good enough for our purposes.

Next we turn to the cotangent bundle, so we can develop covector fields.

Definition. Given a smooth manifold $M$, its cotangent bundle is the set

$$T^*M = \bigsqcup_{p\in M}T_p^* M.$$

Just as we did for the tangent bundle, we can turn the cotangent bundle into a topological space by defining a bundle projection map $\pi^*:T^*M\to M$ and using the induced topology. We can then give $T^*M$ a smooth manifold structure exactly as we did for $TM$, and of course it will also have twice the dimension of the original manifold. I will omit these details, though, so that this post is not an almost exact duplicate of the previous post.

This allows us to define smooth covector fields in the obvious way:

Definition. Let $M$ be a smooth manifold and $T^*M$ its cotangent bundle. A smooth covector field on $M$ is a smooth section of the bundle projection map $\pi^*:T^*M\to M$.

We write $\Gamma(T^*M)$ to denote the set of all covector fields on $M$.

While vector fields act on functions, covector fields act on vector fields. However, we give them the same $C^\infty(M)$-module structure.

Definition. Let $M$ be a smooth manifold and let $f\in C^\infty(M)$ be a smooth function. The differential of $f$ is the covector field

$$\d f:\Gamma(TM)\to C^\infty(M)$$

defined by

$$(\d f)X = Xf$$

for any smooth vector field $X\in\Gamma(TM)$.

Just like with vector fields, given a chart $(U, x)$, the covector fields $(\d x^i)_{i=1}^n$ form a local basis for $\Gamma(T^*U)$.
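In particular, on a chart domain the differential of a smooth function expands in this local basis with the partial derivative vector fields supplying the coefficients:

$$\d f = \sum_{i=1}^n \Big(\frac{\partial}{\partial x^i}f\Big)\,\d x^i,$$

since applying either side to the basis vector field $\dfrac{\partial}{\partial x^j}$ produces the same smooth function $\dfrac{\partial}{\partial x^j}f$. This is the field-level analogue of the total differential from multivariable calculus.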

Now that we have a grip of smooth vector and covector fields on a smooth manifold, along with their $C^\infty(M)$-module structures, we can define smooth tensor fields — one of the most important objects in differential geometry.

Definition. Let $M$ be a smooth manifold. A smooth tensor field on $M$ is a $C^\infty(M)$-multilinear map

$$T:\prod_{i=1}^j \Gamma(T^*M)\times\prod_{i=1}^k \Gamma(TM)\to C^\infty(M).$$

We say that the rank of this tensor field is $(j,k)$.

In any chart $(U, x)$ on an $n$-dimensional manifold, a smooth tensor field of rank $(j,k)$ can be written locally as a sum of tensor products of the basis fields,

$$T = \sum_{i_1,\ldots,i_j=1}^n\;\sum_{l_1,\ldots,l_k=1}^n T^{i_1\cdots i_j}{}_{l_1\cdots l_k}\,\frac{\partial}{\partial x^{i_1}}\otimes\cdots\otimes\frac{\partial}{\partial x^{i_j}}\otimes\d x^{l_1}\otimes\cdots\otimes\d x^{l_k},$$

where the components $T^{i_1\cdots i_j}{}_{l_1\cdots l_k}$ are smooth functions on $U$.
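For example, on $\R^2$ with its standard chart $(x^1, x^2)$, the Euclidean metric

$$g = \d x^1\otimes\d x^1 + \d x^2\otimes\d x^2$$

is a smooth tensor field of rank $(0,2)$: it eats two vector fields $X=\sum_i X^i\dfrac{\partial}{\partial x^i}$ and $Y=\sum_i Y^i\dfrac{\partial}{\partial x^i}$ and returns the smooth function $g(X,Y)=X^1Y^1+X^2Y^2$, their pointwise dot product.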

There's a lot more to be said about tensor fields, but I'll leave it at that for now.

One last thing I'll mention, but won't go into too much detail on, is that the pushforward and pullback can be extended to vector and covector fields, respectively. Hence, the pushforward acts as a covariant functor from the category of smooth manifolds with smooth maps into the category of vector bundles. Likewise, the pullback acts as a contravariant functor between the same categories.

The Tangent Bundle
UPDATE: I am aware of an issue in this post regarding the definition of the topology on the tangent bundle, namely that it is not Hausdorff and thus the tangent bundle as I've defined it is not even a topological manifold. I will correct this issue when I have time. However, the intuition and most of the details of this post are still valid, so I encourage you to read it anyway even before I correct my mistake.

Before we begin, I need to introduce a few new concepts. The first is that of a module, which is like a vector space but whose scalars come from a ring instead of a field. I guess I should formally define what a ring is before defining a module.

Definition. A ring is a set $R$ of elements called scalars, together with two binary operations:

Addition
which assigns to any pair of scalars $a,b\in R$ the scalar $a+b\in R$,

Multiplication
which assigns to any pair of scalars $a,b\in R$ the scalar $ab\in R$.

Any ring and its operations must satisfy the following properties:

Additive Identity
There exists $0\in R$ such that $0+a=a$ for every $a\in R$.

Additive Inverses
For every $a\in R$, there exists $b\in R$ for which $a+b=0$.

Commutative Property of Addition
For all $a,b\in R$, we have that $a+b=b+a$.

Associative Property of Addition
For all $a,b,c\in R$, we have that $(a+b)+c=a+(b+c)$.

Associative Property of Multiplication
For all $a,b,c\in R$, we have that $(ab)c=a(bc)$.

Distributive Properties
For all $a,b,c\in R$, we have that $a(b+c)=ab+ac$ and $(a+b)c=ac+bc$.

So basically a ring is just like a field, except it does not necessarily have a multiplicative identity or multiplicative inverses, and multiplication is not necessarily commutative. If a ring does have a multiplicative identity, it is called a ring with unity. If it additionally has multiplicative inverses (since this would make no sense without an identity) it is called a division ring. If its multiplication is commutative, it is called a commutative ring.

The definition of a module is exactly the same as that of a vector space, except its scalars come from a ring.

Definition. A module over a ring $R$ is a set $\Gamma$, together with two operations:

Addition
which assigns to any pair $u,v\in \Gamma$ the element $u+v\in \Gamma$,

Scalar Multiplication
which assigns to any scalar $a\in R$ and any $v\in \Gamma$ the element $av\in \Gamma$.

Any module and its operations must satisfy the following properties:

Zero Element
There exists $0\in \Gamma$ such that $0+v=v$ for every $v\in \Gamma$.

Additive Inverses
For every $u\in \Gamma$, there exists $v\in \Gamma$ for which $u+v=0$.

Commutative Property of Addition
For all $u,v\in \Gamma$, we have that $u+v=v+u$.

Associative Property of Addition
For all $u,v,w\in \Gamma$, we have that $(u+v)+w=u+(v+w)$.

Compatibility with Ring Multiplication
For all $a,b\in R$ and $v\in \Gamma$, we have that $(ab)v=a(bv)$.

Scalar Multiplicative Identity
For every $v\in \Gamma$, we have that $1v=v$.

First Distributive Property
For all $a,b\in R$ and $v\in \Gamma$, we have that $(a+b)v=av+bv$.

Second Distributive Property
For all $a\in R$ and $u,v\in \Gamma$, we have that $a(u+v)=au+av$.

The next new concept we need is that of the disjoint union of sets:

Definition. Given an indexing set $I$ and a collection of sets $\{A_i\}_{i\in I}$ indexed by $I$, their disjoint union is the set

$$\bigsqcup_{i\in I}A_i = \bigcup_{i\in I} A_i\times\{i\}.$$

The main point of the disjoint union is to keep an identifier for which set the elements originally came from.

Example. Let $A_1=A_2=\R$. Normally the union of $A_1$ and $A_2$ would just be $\R$ itself, since every element of $A_2$ is already in $A_1$. However, their disjoint union is the set

$$A_1\sqcup A_2 = (\R\times\{1\})\cup(\R\times\{2\}).$$

Each element of this disjoint union is either of the form $(a, 1)$ or $(a, 2)$, where $a\in\R$, so these elements look and feel just like real numbers except that we can tell whether they came from the set $A_1$ or $A_2$.

There is, of course, no reason why all the sets being unioned together need to be the same set. Later we shall see an example where each set is different, but it is still useful to keep the identifying index.

Lastly, we will need the idea of a section. This is sort of the dual concept to that of a retraction, which I introduced when discussing the Brouwer Fixed Point Theorem in my post on connectedness.

Definition. Let $f:A\to B$ be a function between sets. A function $s:B\to A$ is called a section of $f$ if $f\circ s=1_B$. That is, if $f\circ s(b)=b$ for every $b\in B$.

Informally speaking, sections provide a way to pick out representatives of level sets of the function $f$. To see what I mean, let's look at an example.

Example. Let $A$ denote a set of people and let $B$ denote a set of continents. Define $f:A\to B$ as the function which tells you which continent a person was born in. For instance, $f($me$) =$ North America.

[Figure: sections-1-flattened]

In order to define a section $s:B\to A$ for which $f\circ s$ is the identity map on $B$, all we need to do is send each continent to one person who lives there. Here are two possible sections for $f$, $s_1, s_2:B\to A$.

[Figure: sections-2-flattened]

Check for yourself that $f\circ s_1$ and $f\circ s_2$ are both the identity on $B$. And notice that my claim earlier holds true: these sections pick one representative of each continent. This concept will be very important later on.

Now we have the necessary machinery to proceed!

So far we've seen how to construct tangent vectors to a manifold at a point, and tangent spaces to manifolds at a point. But what if we wanted to do this at every point on our manifold? In multivariable calculus, we have the concept of a smooth vector field, which is an assignment of a vector to each point in $\R^n$ in such a way that the vectors vary smoothly as we move around in space. We would like to do the same on smooth manifolds, but we immediately run into some problems.

The first and most obvious problem is that defining smoothness of vector fields is going to be extremely tricky, since smoothness is a property of maps from smooth manifolds to smooth manifolds.

The second problem is even more troubling. We would like to define a vector field $X$ as a map which acts on any point $p$ in a smooth manifold and returns a tangent vector in $T_pM$. However, we don't currently have any set which can act as the codomain of such a map. This is because each point in the manifold has a different tangent space, so a vector at $p$ lives in $T_pM$ while a vector based at $q$ lives in $T_qM$.

So before we can proceed, we need to come up with a set comprised of all possible tangent vectors to all the points on our manifold, so this set can act as the codomain of a vector field. Furthermore, since we want a concept of smoothness for our vector fields, this set will have to be a smooth manifold. This smooth manifold I'm describing will be the tangent bundle.

Definition. Given a smooth manifold $M$, its tangent bundle is the set

$$TM = \bigsqcup_{p\in M} T_pM.$$

The name "bundle" is actually very appropriate since the tangent bundle, as a set, is really just all the tangent spaces bundled together. Every element of the tangent bundle $TM$ is of the form $(X, p)$, where $p\in M$ is a point in our manifold and $X\in T_pM$ is a tangent vector based at $p$. Our goal now is to introduce a smooth manifold structure on the tangent bundle.

A smooth manifold is a second countable, Hausdorff, locally Euclidean topological space. So the first thing we need to do is give $TM$ a topology. To this end, we introduce a simple but extremely important definition:

Definition. Given a smooth manifold $M$ and its tangent bundle $TM$, the bundle projection map is the function $\pi:TM\to M$ defined by $\pi\big((X,p)\big)=p$ for every $(X,p)\in TM$.

It is extremely important to note that the bundle projection map is surjective!

We will equip the tangent bundle $TM$ with the bare minimum topology required to make the bundle projection map $\pi$ continuous. This is actually quite simple. We define a set $U\subseteq TM$ as open if and only if $U=\pi^{-1}[V]$ for some open set $V\subseteq M$. In this sense, $TM$ inherits its topology from $M$, which is exactly what we want!

Let's verify that this is actually a topology, and that $\pi$ is continuous with this topology, as claimed.

Theorem. The topology on $TM$ defined above is in fact a topology, and $\pi:TM\to M$ is continuous.

Proof. To show that this is a topology on the tangent bundle, we must verify three things: that the empty set and $TM$ itself are open, that the union of any collection of open sets is open, and that the intersection of any finite collection of open sets is open.

To see that the empty set is open in $TM$, we note that $\varnothing\subseteq M$ is open because $M$ is a topological space. Thus, $\varnothing=\pi^{-1}[\varnothing]$ is open by definition. To see that $TM$ is open, we note that $M$ is open in itself, and so $\pi^{-1}[M]=TM$ is open by definition.

Next, we argue that the union of open sets in $TM$ is open. Let $I$ be an indexing set for which $\{U_i\}_{i\in I}$ is a collection of open subsets of $TM$. Then each $U_i=\pi^{-1}[V_i]$ for some open set $V_i\subseteq M$. It follows from the properties of the preimage that

$$\begin{align}
\bigcup_{i\in I}U_i &= \bigcup_{i\in I}\pi^{-1}[V_i] \\
&= \pi^{-1}\Big[\bigcup_{i\in I}V_i\Big].
\end{align}$$

Since the union of open sets in $M$ is open, it follows that $\bigcup_{i\in I}V_i$ is open in $M$ and thus $\bigcup_{i\in I}U_i$ is open in $TM$.

Lastly, we argue that the intersection of a finite collection of open sets in $TM$ is open. Let $U_1, U_2, \ldots, U_n$ be a collection of open subsets of $TM$. Then each $U_i=\pi^{-1}[V_i]$ for some open set $V_i\subseteq M$. It follows from the properties of the preimage that

$$\begin{align}
\bigcap_{i=1}^n U_i &= \bigcap_{i=1}^n \pi^{-1}[V_i] \\
&= \pi^{-1}\Big[\bigcap_{i=1}^n V_i\Big].
\end{align}$$

Since the intersection of a finite collection of open sets in $M$ is open, it follows that $\bigcap_{i=1}^n V_i$ is open in $M$ and thus $\bigcap_{i=1}^n U_i$ is open in $TM$. Thus, this is indeed a topology on $TM$.

It is almost trivial to show that $\pi:TM\to M$ is continuous. Consider any open set $V\subseteq M$. Then $\pi^{-1}[V]$ is open in $TM$ by the definition of our topology! Thus, $\pi$ is a continuous function.

So $TM$ is indeed a topological space with the topology induced by $M$ as we described. The next step is to show that it is a topological manifold. To do this, we technically need to check that $TM$ is Hausdorff and second countable. I will not do this here, though, since it is not very informative. It is vital, however, to show that it is locally Euclidean. To do this, we simply need to demonstrate that every point in $TM$ is contained in some chart domain.

To this end, choose any point $(X, p)$ in $TM$. Then $p\in M$ and $X\in T_pM$. We will use the atlas on $M$ to construct a chart whose domain contains $(X, p)$. Since $M$ is a manifold, there is at least one chart $(U, x)$ for $M$ whose domain contains $p$.

We use this information to define a chart $(U', x')$ for $TM$ whose domain contains $(X, p)$ as follows:

$$\begin{align}
U' &= \bigsqcup_{q\in U} T_q M \\
x'\big((Y, q)\big) &= \big(x^1(q), x^2(q), \ldots, x^n(q), Y^1, Y^2, \ldots, Y^n\big),
\end{align}$$

where $n=\dim M$ and $Y=\displaystyle\sum_{i=1}^n Y^i\frac{\partial}{\partial x^i}\at{q}\in T_qM$.

[Figure: tangent-bundle]

Notice that $x':U'\to\R^{2n}$, so what I am saying is that $TM$ is a manifold whose dimension is twice the dimension of $M$. We of course need to verify that the set $U'$ as defined above is open in $TM$. Technically we also need to show that $x'$ is an embedding, but I will omit this step since it is not difficult or interesting.
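As a quick sanity check, take $M=\R$ with its global identity chart $(\R, x)$. Then $U'=T\R$, and the chart map sends the element $\Big(a\dfrac{\partial}{\partial x}\at{q},\, q\Big)$ of the tangent bundle to

$$x'\Big(\Big(a\frac{\partial}{\partial x}\at{q},\, q\Big)\Big) = (q, a)\in\R^2,$$

so $T\R$ is identified with the plane: one coordinate records the base point and the other records the single component of the tangent vector.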

Theorem. The set $U'$ as defined above is open in the tangent bundle $TM$. That is, if $U$ is an open set in $M$ then $U'=\bigsqcup_{q\in U}T_qM$ is an open set in $TM$.

Proof. We need to construct an open set $V\subseteq M$ for which $U'=\pi^{-1}[V]$. It just so happens that this set is $U$ itself, since

$$\begin{align}
\pi^{-1}[U] &= \{(X, q)\in TM \mid \pi\big((X,q)\big)\in U\} \\
&= \bigsqcup_{q\in U} T_qM \\
&= U'.
\end{align}$$

Since $U'$ is the preimage under $\pi$ of an open set in $M$, it is open in the tangent bundle $TM$.

So we've shown that $TM$ is a topological manifold, and it was almost too easy! In fact, most things related to the tangent bundle are. It is defined to be nice and to make our lives easier.

Lastly, we should show that the atlas we've exhibited on $TM$ is a smooth atlas. That is, if $(U', x')$ and $(V', y')$ are charts for $TM$ with nonempty intersection, $U'\cap V'\ne\varnothing$, then we need to show that the transition map $x'\circ(y')^{-1}:\R^{2n}\to\R^{2n}$ is smooth in the sense of ordinary multivariable calculus. It is not too difficult to do so, but it is a bit ugly. To simplify the proof, we'll need a lemma which explains what happens to the components of tangent vectors under a change of chart.

Lemma. Suppose $(U, x)$ and $(V, y)$ are charts on a smooth manifold $M$ of dimension $n$ with $U\cap V\ne\varnothing$. If $p\in U\cap V$ and $Z\in T_p M$ is the tangent vector

$$Z = \sum_{i=1}^n b^i \frac{\partial}{\partial y^i}\at{p}$$

with respect to the chart map $y$, then

$$Z = \sum_{i=1}^n c^i \frac{\partial}{\partial x^i}\at{p}$$

with respect to the chart map $x$, where each component $c^j$ is given by

$$c^j = \sum_{i=1}^n b^i \frac{\partial}{\partial y^i}\at{p} x^j.$$

Proof. Since $\displaystyle\Big(\frac{\partial}{\partial x^i}\at{p}\Big)_{i=1}^n$ is a basis for $T_p M$, we know there exist scalars $c^1, c^2, \ldots, c^n\in\R$ for which

$$\sum_{i=1}^n b^i \frac{\partial}{\partial y^i}\at{p} = \sum_{i=1}^n c^i \frac{\partial}{\partial x^i}\at{p}.$$

If we apply the differential of $x^j$, the cotangent vector $\d x^j\smallat{p}:T_p M\to\R$, to both sides of the above equation, things simplify considerably. Applying it first to the right side, we obtain

$$\begin{align}
\d x^j\smallat{p}\Big(\sum_{i=1}^n c^i \frac{\partial}{\partial x^i}\at{p}\Big) &= \sum_{i=1}^n c^i \d x^j\smallat{p} \Big(\frac{\partial}{\partial x^i}\at{p}\Big) \\
&= \sum_{i=1}^n c^i\delta^j_i \\
&= c^j.
\end{align}$$

Applying $\d x^j\smallat{p}$ to the left side, we see that

$$\begin{align}
\d x^j\smallat{p}\Big(\sum_{i=1}^n b^i \frac{\partial}{\partial y^i}\at{p}\Big) &= \sum_{i=1}^n b^i \d x^j\smallat{p} \Big(\frac{\partial}{\partial y^i}\at{p}\Big) \\
&= \sum_{i=1}^n b^i \frac{\partial}{\partial y^i}\at{p} x^j
\end{align}$$

Equating the two sides, we see that

$$c^j = \sum_{i=1}^n b^i \frac{\partial}{\partial y^i}\at{p} x^j,$$

as desired.
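As a concrete illustration (with charts chosen just for this example), let $M=\R^2$, let $x=(x^1,x^2)$ be the standard Cartesian chart and let $y=(r,\theta)$ be a polar chart, so that $x^1=r\cos\theta$ and $x^2=r\sin\theta$ on the overlap. For the tangent vector $Z=\dfrac{\partial}{\partial\theta}\at{p}$ (that is, $b^1=0$ and $b^2=1$), the lemma gives

$$c^1 = \frac{\partial}{\partial\theta}\at{p}x^1 = -r(p)\sin\theta(p), \qquad c^2 = \frac{\partial}{\partial\theta}\at{p}x^2 = r(p)\cos\theta(p),$$

so that

$$\frac{\partial}{\partial\theta}\at{p} = -r(p)\sin\theta(p)\,\frac{\partial}{\partial x^1}\at{p} + r(p)\cos\theta(p)\,\frac{\partial}{\partial x^2}\at{p}.$$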

Now that we know how tangent vectors transform under a change of basis, we can prove that the chart maps we've defined on the tangent bundle are smooth maps in the sense of multivariable calculus.

Theorem. Let $M$ be a smooth manifold. If $(U', x')$ and $(V', y')$ are charts for $TM$ with nonempty intersection, then the transition map $x'\circ(y')^{-1}:\R^{2n}\to\R^{2n}$ is smooth.

Proof. Choose any point $(a^1, a^2, \ldots, a^n, b^1, b^2, \ldots, b^n)$ in the domain of the transition map, namely $y'[U'\cap V']\subseteq\R^{2n}$. Then

$$\begin{align}
x'\circ(y')^{-1}(a^1, a^2, \ldots, a^n, b^1, b^2, \ldots, b^n) &= x'\Big(\Big(\sum_{i=1}^n b^i \frac{\partial}{\partial y^i}\at{p},\; y^{-1}(a^1, a^2,\ldots,a^n)\Big)\Big) \\
&= (A^1, A^2, \ldots, A^n, B^1, B^2, \ldots, B^n),
\end{align}$$

where $p = y^{-1}(a^1, a^2, \ldots, a^n)$, each $A^j$ is given by

$$\begin{align}
A^j &= x^j\circ y^{-1}(a^1, a^2, \ldots, a^n) \\
&= (x\circ y^{-1})^j (a^1, a^2, \ldots, a^n)
\end{align}$$

and each $B^j$ is given by

$$\begin{align}
B^j &= \sum_{i=1}^n b^i \frac{\partial}{\partial y^i}\at{p} x^j \\
&= \sum_{i=1}^n b^i \partial_i(x^j\circ y^{-1})\at{y(p)} \\
&= \sum_{i=1}^n b^i \partial_i(x\circ y^{-1})^j\at{y(p)}.
\end{align}$$

These components are all smooth, since the transition map $x\circ y^{-1}$ on $M$ is smooth. It follows that the transition map $x'\circ(y')^{-1}$ on $TM$ is smooth, completing the proof.

We've arrived where we wanted to: the tangent bundle is a smooth manifold! We have all the machinery we need now to develop vector fields in my next post, as well as covector and tensor fields.

Cotangent Spaces and the Pullback

Last time, we defined the tangent space to a smooth manifold at a point. This turned out to be a vector space with the same dimension as the manifold, and tangent vectors were directional derivative operators which acted on smooth functions and spat out real numbers. We also found that choosing a chart allowed us to construct a useful basis of partial derivative operators with respect to the components of the chart map.

Now, as we so often do with vector spaces, we consider the dual space to the tangent space at a point.

Definition. If $M$ is a smooth manifold and $p\in M$ then the cotangent space to $M$ at the point $p$, written $T_p^*M$, is the dual space of $T_pM$. That is,

$$T_p^*M = (T_pM)^*.$$

We refer to elements of the cotangent space as cotangent vectors.

From my previous post on dual vector spaces, we know that the cotangent space $T_p^*M$ consists of all linear maps from $T_pM$ to $\R$. That is, a cotangent vector $\omega\in T_p^*M$ acts on tangent vectors and spits out real numbers. Furthermore, we know that $\dim T_p^*M = \dim T_pM = \dim M$.

Even so, we don't know too much about what cotangent vectors look like or how they behave. To aid us in our quest, we come to a crucial definition.

Definition. Given a smooth function $f:M\to\R$ defined on a smooth manifold $M$ and a point $p\in M$, the differential of $f$ at $p$ is the cotangent vector $\d f\smallat{p}:T_pM\to\R$ defined by

$$\d f\smallat{p} X = Xf$$

for any tangent vector $X\in T_pM$.

It is easy to check that the differential of a smooth function is a linear map, and thus truly a cotangent vector. Even so, how does this new concept help us to understand the structure of the cotangent space? The next theorem tells us how.

Theorem. Let $M$ be a smooth manifold of dimension $n$ with $p\in M$, let $(U,x)$ be a chart whose domain contains $p$ and let $x^i$ denote the $i$th component of the chart map $x$. Then $\big(\d x^i\smallat{p}\big)_{i=1}^n$ is a basis for $T_p^*M$.

Proof. Choose a tangent vector $X\in T_pM$. Then since $\Big(\dfrac{\partial}{\partial x^i}\at{p}\Big)_{i=1}^n$ is a basis for $T_pM$, we may write

$$X=\sum_{j=1}^n X^j \frac{\partial}{\partial x^j}\at{p}$$

for some collection of scalars $X^j\in\R$. Thus,

$$\begin{align}
\d x^i\smallat{p} X &= \d x^i\smallat{p} \Big(\sum_{j=1}^n X^j \frac{\partial}{\partial x^j}\at{p}\Big) \\
&= \sum_{j=1}^n X^j \frac{\partial}{\partial x^j}\at{p} x^i \\
&= \sum_{j=1}^n X^j \partial_j (x^i\circ x^{-1})\at{x(p)} \\
&= \sum_{j=1}^n X^j \delta^i_j \\
&= X^i.
\end{align}$$

That is, $\d x^i\smallat{p}$ is the projection map onto the $i$th component of the vector $X$. So not only is $\big(\d x^i\smallat{p}\big)_{i=1}^n$ a basis for $T_p^*M$, it is actually the dual basis to the basis $\Big(\dfrac{\partial}{\partial x^i}\at{p}\Big)_{i=1}^n$ for $T_pM$.
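One immediate payoff: expanding an arbitrary differential in this dual basis, the coefficients are exactly the partial derivatives of the function. That is, for any $f\in C^\infty(M)$,

$$\d f\smallat{p} = \sum_{i=1}^n \Big(\frac{\partial}{\partial x^i}\at{p}f\Big)\,\d x^i\smallat{p},$$

which can be checked by applying both sides to the basis vector $\dfrac{\partial}{\partial x^j}\at{p}$, and which recovers the familiar total differential formula from multivariable calculus.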

We shall later extend the idea of the differential of a function to that of the exterior derivative of a differential form. But as usual I'm getting ahead of myself!

We're now equipped to define the notion of the pullback of a smooth map, which is the dual notion to the pushforward.

Definition. Let $\phi:M\to N$ be a smooth map between smooth manifolds $M$ and $N$, let $p\in M$ and let $\phi_*:T_pM\to T_{\phi(p)}N$ be the pushforward of $\phi$. Then the pullback of $\phi$ is the map

$$\phi^*:T_{\phi(p)}^*N\to T_p^*M$$

defined by

$$(\phi^*\omega)X=\omega(\phi_*X)$$

for any cotangent vector $\omega\in T_{\phi(p)}^* N$ and any tangent vector $X\in T_pM$.

So the pullback takes the cotangent vector $\omega$ in the cotangent space to $N$ at $\phi(p)$ and turns it into a cotangent vector $\phi^*\omega$ in the cotangent space to $M$ at $p$. This cotangent vector acts on a vector $X$ in the tangent space to $M$ at $p$ as described and spits out a real number.

Just like the pushforward, the pullback is a map between vector spaces and so it makes sense to ask whether it is a linear map. Again, the answer is yes.

Theorem. Let $\phi:M\to N$ be a smooth map between smooth manifolds $M$ and $N$ and let $p\in M$. Then the pullback $\phi^*:T_{\phi(p)}^*N\to T_p^*M$ of the map $\phi$ is a linear map.

Proof. Suppose $X\in T_pM$ is a tangent vector and $\omega_1,\omega_2\in T_{\phi(p)}^* N$ are cotangent vectors. Then

$$\begin{align}
\phi^*(\omega_1+\omega_2)X &= (\omega_1 + \omega_2)(\phi_*X) \\
&= \omega_1(\phi_*X) + \omega_2(\phi_*X) \\
&= (\phi^*\omega_1)X + (\phi^*\omega_2)X,
\end{align}$$

so the pullback is additive. Suppose next that $X\in T_pM$ is a tangent vector, and $\omega\in T_{\phi(p)}^* N$ is a cotangent vector and $a\in\R$ is a scalar. Then

$$\begin{align}
\phi^*(a\omega)X &= (a\omega)(\phi_*X) \\
&= a\omega(\phi_*X) \\
&= a\phi^*(\omega)X,
\end{align}$$

so the pullback is homogeneous. It follows that it is a linear map.

The pushforward took directional derivatives of curves to directional derivatives of images of curves under $\phi$ in the natural way. The same type of behavior is true for differentials of functions under the pullback!

Theorem. Let $\phi:M\to N$ be a smooth map between smooth manifolds $M$ and $N$, let $p\in M$ and let $f\in C^\infty(N)$ be a smooth function. Then

$$\phi^*\big(\d f\smallat{\phi(p)}\big) = \d (f\circ\phi)\smallat{p}.$$

Proof. Let $X\in T_pM$ be a tangent vector. Then

$$\begin{align}
\phi^*\big(\d f\smallat{\phi(p)}\big)X &= \d f\smallat{\phi(p)}(\phi_*X) \\
&= (\phi_*X)f \\
&= X(f\circ\phi) \\
&= \d (f\circ\phi)\smallat{p}X.
\end{align}$$
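To see this in action (with a map and function chosen purely for illustration), let $\phi:\R\to\R^2$ be given by $\phi(t)=(\cos t,\sin t)$ and let $f:\R^2\to\R$ be the first coordinate function $f(x,y)=x$. Writing $t$ for the identity chart on $\R$, the theorem says

$$\phi^*\big(\d f\smallat{\phi(t_0)}\big) = \d(f\circ\phi)\smallat{t_0} = -\sin(t_0)\,\d t\smallat{t_0},$$

since $f\circ\phi(t)=\cos t$. So pulling the covector $\d f$ back along $\phi$ amounts to substituting the parametrization and differentiating, just as in multivariable calculus.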

Now that we've explored the tangent and cotangent spaces to a smooth manifold at a point, we've gotten the difficult stuff out of the way. In my next post, we'll define the tangent and cotangent bundles, as well as vector, covector and tensor fields on a manifold. Stay tuned!

Tangent Spaces and the Pushforward

In calculus, you learn how to construct tangent lines to differentiable curves at a point.

[Figure: tangent-lines-1]

In multivariable calculus, you learn how to construct tangent planes to differentiable surfaces at a point.

[Figure: tangent-plane]

In differential geometry, the analogous concept is the tangent space to a smooth manifold at a point, but there's some subtlety to this concept. Notice how the curves and surface in the examples above are sitting in a higher-dimensional space in order to make sense of their tangent lines/plane.

The circle is a one-dimensional manifold, yet we've constructed its tangent line as an extrinsic object which lives outside the circle, and in doing so we have embedded the one-dimensional circle into a two-dimensional space. Similarly, in order to construct the tangent plane to the sphere, we've embedded it in three-dimensional space, although the sphere is a two-dimensional manifold.

This is not a huge issue for these familiar objects. But since we want to use manifolds to describe things like higher-dimensional surfaces and spacetime in general relativity, we need some way to talk about the tangent space intrinsically, without embedding our manifold in a higher dimensional space. After all, it wouldn't really make sense to think about spacetime as sitting inside of some higher-dimensional manifold.

Our construction of tangent spaces will follow this intrinsic approach, and has the benefit of being isomorphic to the parallel extrinsic approach (as we shall see later when we discuss the Whitney Embedding Theorem). Additionally, it is advantageous because of its simplicity and ease of calculation. So let's jump right into it!

The fundamental concept in defining tangent spaces is that of a smooth curve $\gamma:\R\to M$. Of course, we haven't actually defined what it means for such a curve to be smooth yet, but we can easily borrow from our experience in defining smooth functions to accomplish this.

Definition. Given a smooth manifold $M$, a curve $\gamma:\R\to M$ is a smooth curve if for every point $p\in \gamma[\R]\subseteq M$ there is a chart $(U, x)$ whose domain contains $p$ and for which $x\circ\gamma$ is smooth in the sense of ordinary differential calculus.

[Figure: smooth-curve]

Of course, we need to justify that this is a well-defined notion, since we defined it in terms of charts. But since chart transition maps on a smooth manifold are smooth, the proof of this fact looks almost identical to the analogous arguments from the last post, so I will not give it here.

In regular multivariable calculus, we have the idea of vectors based at a point and gradients of functions, and we form directional derivatives by combining vectors and gradients using an inner product. In differential geometry, there is no separating the concepts. Vectors are directional derivatives, plain and simple.

The key idea in defining tangent vectors is the following: We can use smooth curves as measurement devices to determine the rate at which a smooth function is changing as we move along our manifold. Here's how.

Definition. Let $M$ be a smooth manifold, $\gamma:\R\to M$ a smooth curve and $(U, x)$ a chart whose domain contains the point $p\in\gamma[\R]$. The directional derivative along the curve $\gamma$ at the point $p$ is the map $$V_{\gamma, p}:C^\infty(M)\to\R$$ defined by $$V_{\gamma, p}f = (f\circ\gamma)'\at{\gamma^{-1}(p)}$$ for any smooth function $f\in C^\infty(M)$.

To clarify, this is just the regular derivative of the function $f\circ\gamma:\R\to\R$ taken at the point $\gamma^{-1}(p)$, which is of course a real number.
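For a concrete example (with a curve and function chosen just for illustration), take $M=\R^2$, the smooth curve $\gamma(t)=(t,t^2)$ and the smooth function $f(x,y)=y$. Then $f\circ\gamma(t)=t^2$, so at the point $p=\gamma(1)=(1,1)$ we have $\gamma^{-1}(p)=1$ and

$$V_{\gamma,p}f = (f\circ\gamma)'\at{1} = 2,$$

telling us that $f$ is increasing at a rate of $2$ as we pass through $p$ along $\gamma$.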

So these directional derivatives are maps which act on smooth functions and spit out real numbers. To see how these directional derivatives behave as measurement devices, as I claimed above, it is useful to look at level sets of the function $f$. A level set is just the set of points in our manifold $M$ which our function $f$ maps to the same value. To continue my analogy from last post, if $f$ describes the temperature of points on the manifold, a level set of $f$ is a set of points where the temperature is the same. Some level set of such a function might look something like this:

[Figure: level-curves]

Since the values of our function are constant along level sets, we would expect the directional derivative along these level sets of our function to be zero. To be more precise, suppose $f:M\to\R$ is a smooth function and $L=\{p\in M\mid f(p) = k\}$ is a level set of $f$ where $k\in\R$ is just some constant. Now suppose $\gamma:\R\to M$ is a smooth curve whose image lies in the level set $L$, i.e., $\gamma[\R]\subseteq L$. Then by the definition of $L$, we have that $f(\gamma(x))=k$ for every $x\in\R$. That is, $f\circ\gamma:\R\to\R$ is a constant function, and so its derivative $(f\circ\gamma)'$ is zero everywhere. Thus, the directional derivative $V_{\gamma, p}f$ is zero at every point $p\in\gamma[\R]$.

Suppose instead we have a curve $\sigma:\R\to M$ along which the value of our smooth function $f$ is increasing at a nonzero rate, so that $(f\circ\sigma)'(x)>0$ for every $x\in\R$. Then the directional derivative $V_{\sigma, p}f$ is positive at every point $p\in\sigma[\R]$.

[Figure: directional-derivatives]

A very nice property of these directional derivatives is that they are linear maps!

Theorem. Let $M$ be a smooth manifold, $\gamma:\R\to M$ a smooth curve and $(U, x)$ a chart whose domain contains the point $p\in\gamma[\R]$. Then the directional derivative $V_{\gamma,p}:C^\infty(M)\to\R$ is a linear map.

Proof. We will show first that $V_{\gamma,p}$ is additive. Suppose that $f,g:M\to\R$ are smooth functions. Then

$$\begin{align}
V_{\gamma, p}(f + g) &= ((f+g)\circ\gamma)'\at{\gamma^{-1}(p)} \\
&= (f\circ\gamma + g\circ\gamma)'\at{\gamma^{-1}(p)} \hint{distributive property}\\
&= (f\circ\gamma)'\at{\gamma^{-1}(p)} + (g\circ\gamma)'\at{\gamma^{-1}(p)} \hint{linearity of derivative} \\
&= V_{\gamma, p}f + V_{\gamma, p}g.
\end{align}$$

Next we will show that $V_{\gamma,p}$ is homogeneous. Suppose $f:M\to\R$ is a smooth function and $a\in\R$. Then

$$\begin{align}
V_{\gamma, p}(af) &= ((af)\circ\gamma)'\at{\gamma^{-1}(p)} \\
&= (a(f\circ\gamma))'\at{\gamma^{-1}(p)} \hint{distributive property}\\
&= a(f\circ\gamma)'\at{\gamma^{-1}(p)} \hint{linearity of derivative} \\
&= aV_{\gamma, p}f.
\end{align}$$

Thus, the directional derivative is a linear map.

It turns out that, in addition to being a linear map, directional derivatives are also something called derivations. This simply means they obey a "product rule"

$$
\begin{align}
V_{\gamma, p}(fg) &= (fg\circ\gamma)'\at{\gamma^{-1}(p)} \\
&= ((f\circ\gamma)(g\circ\gamma))'\at{\gamma^{-1}(p)} \\
&= (f\circ\gamma)'\at{\gamma^{-1}(p)}(g\circ\gamma)\at{\gamma^{-1}(p)} + (f\circ\gamma)\at{\gamma^{-1}(p)}(g\circ\gamma)'\at{\gamma^{-1}(p)} \\
&= (V_{\gamma, p}f)g(p) + f(p)V_{\gamma, p}(g)
\end{align}
$$

which is inherited from the product rule for ordinary derivatives. We are now equipped to define the tangent space to a manifold at a point!

Definition. Let $M$ be a smooth manifold with $p\in M$. The tangent space to $M$ at the point $p$, written $T_p M$, is the vector space of all directional derivatives along curves at $p$.

We have called the tangent space a vector space without proving that it is one. Since directional derivatives are functions $C^\infty(M)\to\R$, we can define vector addition as the usual addition of functions and scalar multiplication as multiplication of a function by a real number. While these resulting objects are definitely functions, we have to show that they are directional derivatives. Otherwise our tangent space would not be closed under addition or scalar multiplication, which is not allowed!

Theorem. The tangent space $T_p M$ of a smooth manifold at a point $p\in M$ is a vector space.

Proof. We will show first that $T_p M$ is closed under vector addition. That is, we must show that

$$\begin{align}
(V_{\gamma,p}+V_{\sigma,p})f &= V_{\gamma,p}f+V_{\sigma,p}f\\
&\in T_p M
\end{align}$$

for any smooth curves $\gamma,\sigma:\R\to M$. To do this, we must find a curve $\psi:\R\to M$ whose directional derivative $V_{\psi, p}$ is the sum of the directional derivatives along $\gamma$ and $\sigma$.

To construct such a curve, we need to choose a chart $(U, x)$ whose domain contains $p$, and look at the chart representatives $x\circ\gamma:\R\to\R^n$ and $x\circ\sigma:\R\to\R^n$, where $n=\dim M$. Addition of these representatives makes sense in our chart. However, since we are going to define our curve $\psi$ in terms of a chart, we need to make sure that our definition is independent of our choice of chart! We keep this in the back of our minds, as this independence will follow quite naturally.

[Figure: addition-of-curves]

If we add the two chart representatives $x\circ\gamma$ and $x\circ\sigma$ we will not end up with something which passes through $x(p)$ (unless $x(p)$ happens to be zero). To remedy this situation, we seek to define a new curve $\psi$ whose chart representative $x\circ\psi$ is

$$x\circ\gamma + x\circ\sigma - x(p).$$

This will guarantee that $x\circ\psi$ passes through $x(p)$. To construct this curve $\psi$, we simply take our desired chart representative and apply $x^{-1}$.

$$\psi = x^{-1}(x\circ\gamma + x\circ\sigma - x(p)).$$

Now we must show that $\psi$ really is the curve whose directional derivative is the sum of the directional derivatives of $\gamma$ and $\sigma$. That is, we need to demonstrate that $V_{\psi,p} = V_{\gamma,p} + V_{\sigma,p}$. This is quite straightforward.

$$
\begin{align}
V_{\psi,p}f &= (f\circ\psi)'\at{\psi^{-1}(p)} \hint{definition} \\
&= (f\circ(x^{-1}\circ x)\circ\psi)'\at{\psi^{-1}(p)} \hint{identity} \\
&= (x\circ\psi)'\at{\psi^{-1}(p)}(f\circ x^{-1})'\at{x\circ\psi\circ\psi^{-1}(p)} \hint{chain rule} \\
&=(x\circ\psi)'\at{\psi^{-1}(p)}(f\circ x^{-1})'\at{x(p)} \hint{identity} \\
&=(x\circ\gamma + x\circ\sigma - x(p))'\at{\psi^{-1}(p)}(f\circ x^{-1})'\at{x(p)} \hint{definition of $\psi$} \\
&=(x\circ\gamma + x\circ\sigma)'\at{\psi^{-1}(p)}(f\circ x^{-1})'\at{x(p)} \hint{derivative of constant} \\
&= (x\circ\gamma)'\at{\psi^{-1}(p)}(f\circ x^{-1})'\at{x(p)} + (x\circ\sigma)'\at{\psi^{-1}(p)}(f\circ x^{-1})'\at{x(p)} \hint{linearity of derivative} \\
&= (x\circ\gamma)'\at{\gamma^{-1}(p)}(f\circ x^{-1})'\at{x(p)} + (x\circ\sigma)'\at{\sigma^{-1}(p)}(f\circ x^{-1})'\at{x(p)} \hint{$\psi^{-1}(p) = \gamma^{-1}(p) = \sigma^{-1}(p)$} \\
&= (f\circ(x^{-1}\circ x)\circ\gamma)'\at{\gamma^{-1}(p)} + (f\circ(x^{-1}\circ x)\circ\sigma)'\at{\sigma^{-1}(p)} \hint{chain rule} \\
&= (f\circ\gamma)'\at{\gamma^{-1}(p)} + (f\circ\sigma)'\at{\sigma^{-1}(p)} \hint{identity} \\
&= V_{\gamma,p}f + V_{\sigma,p}f. \hint{definition}
\end{align}
$$

Notice how the chart maps completely canceled out, and so even though our definition of $\psi$ was chart-dependent, its directional derivative was not! We have thus shown that $T_p M$ is closed under vector addition. The proof that it is closed under scalar multiplication is much simpler (it is just a reparametrization of our curve) and so I will omit it because this proof is already long enough.

Now that we are confident that $T_p M$ is a vector space, we will normally refer to its elements as tangent vectors. Interestingly, it is usually the case that the curve along which the tangent vector is defined is not important. Let's explore why. Suppose $\gamma$ is a curve, $f$ is a smooth function and $p\in M$ is a point on our curve. Then as before,

$$\begin{align}
V_{\gamma, p}f &= (f\circ\gamma)'\at{\gamma^{-1}(p)} \\
&= (f\circ (x^{-1}\circ x)\circ\gamma)'\at{\gamma^{-1}(p)} \\
&= ((f\circ x^{-1})\circ (x\circ\gamma))'\at{\gamma^{-1}(p)}.
\end{align}$$

To make our lives easier, let's make the temporary substitutions
$$\begin{align}
F &= f\circ x^{-1}, \\
\Gamma &= x\circ\gamma.
\end{align}$$

Now $F:\R^n\to\R$ and $\Gamma:\R\to\R^n$, so we can use the multivariable chain rule to obtain

$$\begin{align}
V_{\gamma, p}f &= (F\circ\Gamma)'\at{\gamma^{-1}(p)} \\
&= \sum_{i=1}^n\Bigg(\partial_i\at{x(p)}F\Bigg)\Bigg((\Gamma^{i})'\at{\gamma^{-1}(p)}\Bigg) \\
&= \sum_{i=1}^n\Bigg(\partial_i\at{x(p)}(f\circ x^{-1})\Bigg)\Bigg((x^i\circ\gamma)'\at{\gamma^{-1}(p)}\Bigg),
\end{align}$$

where $\partial_i$ represents the regular partial derivative with respect to the $i$th component and $\Gamma^i$ and $x^i$ are the $i$th component functions of $\Gamma$ and $x$, respectively.

Now, this certainly doesn't look any nicer than $V_{\gamma, p}f$, so why have we done this? Because we have managed to separate the definition into two chunks, one of which depends on the curve $\gamma$ and one which doesn't. The part which doesn't, $\partial_i\at{x(p)}(f\circ x^{-1})$, is so special that it deserves its own name.

Definition. Let $M$ be a smooth manifold, $f\in C^\infty(M)$ a smooth function, and $(U,x)$ a chart containing $p\in M$. Then the $i$th partial derivative operator with respect to the chart map $x$ is the map: $$\frac{\partial}{\partial x^i}\at{p}:C^\infty(M)\to\R$$ defined by $$\frac{\partial}{\partial x^i}\at{p}f=\partial_i(f\circ x^{-1})\at{x(p)}.$$

With this new definition, we see that

$$V_{\gamma, p}f = \sum_{i=1}^n V^i \frac{\partial}{\partial x^i}\at{p}f$$

where the $V^i\in\R$ are the curve-dependent parts which we don't care about. This means that the partial derivative operators $\Big(\dfrac{\partial}{\partial x^i}\at{p}\Big)_{i=1}^n$ span $T_p M$. Furthermore, it is easy to show that they are linearly independent. It follows that $\Big(\dfrac{\partial}{\partial x^i}\at{p}\Big)_{i=1}^n$ is a basis for the tangent space $T_p M$. From this, we see that $\dim T_p M = \dim M$. That is, the dimension of the tangent space is the same as the dimension of the original manifold!

We now come to the pushforward, which gives us a way to take smooth maps between manifolds and translate them into maps between tangent spaces on those manifolds. Here's the definition:

Definition. Let $M$ and $N$ be smooth manifolds with $p\in M$, and let $\phi:M\to N$ be a smooth map. The pushforward of $\phi$ is the map

$$\phi_*:T_p M\to T_{\phi(p)} N$$

defined by

$$(\phi_*X)f=X(f\circ\phi)$$

for any tangent vector $X\in T_p M$ and any smooth function $f\in C^\infty(N)$.

Let's try to make sense of this definition. The pushforward turns the smooth map $\phi$ into a map $\phi_*$ between $T_p M$ and $T_{\phi(p)} N$, so it takes a tangent vector in $M$ based at the point $p$ to a tangent vector in $N$ based at the point $\phi(p)$. But tangent vectors are defined by how they act on smooth functions, so we defined the pushforward by how the tangent vector $\phi_*X\in T_{\phi(p)} N$ acts on smooth functions $f\in C^\infty(N)$.

Since the pushforward is a map between vector spaces, it makes sense to ask whether it is a linear map. The answer is yes!

Theorem. The pushforward is a linear map.

Proof. Let $M$ and $N$ be smooth manifolds with $p\in M$, and let $\phi:M\to N$ be a smooth map. We argue first that the pushforward $\phi_*$ is additive. So suppose $X$ and $Y$ are tangent vectors in $T_pM$. Then

$$\begin{align}
\phi_*(X+Y)f &= (X+Y)(f\circ\phi) \\
&= X(f\circ\phi) + Y(f\circ\phi) \\
&= (\phi_*X)f + (\phi_*Y)f
\end{align}$$

for any smooth function $f:N\to\R$. Next, we argue that $\phi_*$ is homogeneous. So suppose $X$ is a tangent vector in $T_pM$ and $a\in\R$. Then

$$\begin{align}
\phi_*(aX)f &= (aX)(f\circ\phi) \\
&= aX(f\circ\phi)\\
&= a(\phi_*X)f.
\end{align}$$

Thus, the pushforward is homogeneous, and it follows that it is a linear map.

Not only is the pushforward a linear map, it also behaves exactly as we would like it to on tangent vectors. That is, if $V_{\gamma, p}$ is a directional derivative operator (tangent vector) along a curve $\gamma$ then the pushforward $\phi_*$ takes this to the directional derivative operator along the curve $\phi\circ\gamma$ in $N$.

[Figure: pushforward-1]

The following theorem verifies that this is actually what happens.

Theorem. Let $M$ and $N$ be smooth manifolds with $p\in M$, and let $\phi:M\to N$ be a smooth map. Then

$$\phi_* V_{\gamma,p} = V_{\phi\circ\gamma, \phi(p)}.$$

Proof. The result follows from the following computation:

$$\begin{align}
(\phi_* V_{\gamma,p})f &= V_{\gamma,p}(f\circ\phi) \\
&= \big((f\circ\phi)\circ\gamma\big)'\at{\gamma^{-1}(p)} \\
&= \big(f\circ(\phi\circ\gamma)\big)'\at{\gamma^{-1}(p)} \\
&= \big(f\circ(\phi\circ\gamma)\big)'\at{(\gamma^{-1}\circ\phi^{-1})(\phi(p))} \\
&= \big(f\circ(\phi\circ\gamma)\big)'\at{(\phi\circ\gamma)^{-1}(\phi(p))} \\
&= V_{\phi\circ\gamma, \phi(p)}f.
\end{align}$$
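As a sanity check (with a map chosen just for illustration), let $\phi:\R\to\R^2$ be $\phi(t)=(t,t^2)$ and let $\gamma:\R\to\R$ be the identity curve, so that $V_{\gamma,t_0}$ is just ordinary differentiation at $t_0$. For any $f\in C^\infty(\R^2)$, writing $(x, y)$ for the standard chart on $\R^2$, the multivariable chain rule gives

$$(\phi_*V_{\gamma,t_0})f = V_{\gamma,t_0}(f\circ\phi) = \frac{\partial}{\partial x}\at{\phi(t_0)}f + 2t_0\,\frac{\partial}{\partial y}\at{\phi(t_0)}f.$$

In other words, $\phi_*V_{\gamma,t_0} = \dfrac{\partial}{\partial x}\at{\phi(t_0)} + 2t_0\,\dfrac{\partial}{\partial y}\at{\phi(t_0)}$, which is exactly the velocity vector of the image curve $\phi\circ\gamma$ at $\phi(t_0)$, as the theorem predicts.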

This post is only half of the picture. Next time I'll talk about the cotangent space and the pullback!

Smooth Manifolds

Disclaimer: This post requires a basic grasp of differential calculus. You don't really need to know how to calculate them, but you do need to be comfortable with the idea of derivatives and partial derivatives, and in particular we will need that the composition of two differentiable maps is differentiable (via the chain rule).

Smooth manifolds are essentially the most general objects that we can do calculus on. Before we can define smooth manifolds, we need to know what a general topological manifold is. And before we can do that, we need just a few new set-theoretic and topological notions.

Definition. A set $X$ is countable if it is finite or if there is a bijection $f:X\to\Z$. That is, if its elements can be put into one-to-one correspondence with the integers (or with a finite set).

So countable sets include any finite set, the set $\Z$ itself, the rational numbers $\Q$, and anything else enumerable. Uncountable sets include the real numbers $\R$, the complex numbers $\C$, and many more.

Recall that a basis for a topological space is a collection of open sets which may be used to generate the topology by taking unions of basis elements.

Definition. A topological space is second countable if it has a countable basis.

Example. The set $\R$ with the standard topology is second countable, since $\{(a, b)\subseteq\R\mid a, b\in \Q\}$, the set of open intervals with rational endpoints, is a basis for $\R$. I won't prove it, but the idea is that any real numbers $x<y$ are the limits of a strictly decreasing and a strictly increasing sequence of rational numbers, respectively, and the union of the open intervals with these rational endpoints is precisely the interval with endpoints $x$ and $y$.

The next definition is not terribly important in its own right, but it will make the rest of the notation in this post considerably cleaner.

Definition. Consider topological spaces $X$ and $Y$ and a function $f:X\to Y$. If the function $\hat{f}:X\to f[X]$ with $\hat{f}(x)=f(x)$ for every $x\in X$ is a homeomorphism, then $f$ is a topological embedding or a homeomorphism onto its image.

Next comes what we shall soon see is the most important property of a topological manifold:

Definition. A topological space $X$ is locally Euclidean if there is some $n\in\N$ for which every point $p\in X$ has a neighborhood $U\subseteq X$ which is homeomorphic to an open subset of $\R^n$.

Essentially the definition says that every point of a locally Euclidean space is contained in an open set that looks like $\R^n$. We'll talk more about this in a few minutes so I won't say too much about it yet.

We're now ready to define a topological manifold!

Definition. A topological manifold is a topological space $M$ with the following properties:

  1. $M$ is Hausdorff,
  2. $M$ is second countable,
  3. $M$ is locally Euclidean.

The number $n$ for which $M$ is locally Euclidean is called the dimension of the manifold, written $\dim M$.

The Hausdorff and second countable conditions are important, as we shall see in later posts, but the third property is by far the most important, because it means we can treat sections of the manifold as though they were our old friend $\R^n$.

Example. The space $\R^n$ with the standard topology is trivially a topological manifold. In a previous post I showed that any metric space (and thus $\R^n$ in particular) is Hausdorff. I showed earlier in this post that $\R$ is second countable, and that argument extends easily to $\R^n$. All that remains is to show that $\R^n$ is locally Euclidean. This isn't too difficult. For any point $p\in\R^n$, any neighborhood $U$ of $p$ is trivially homeomorphic to itself under the identity map $i:\R^n\to\R^n$.

Example. A less trivial example is given by the circle $S^1$ with the subspace topology from $\R^2$. The circle is not homeomorphic to $\R$, since any point of $\R$ is a cutpoint but no point of $S^1$ is a cutpoint (see my post on connectedness if this is unfamiliar). However, $S^1$ is the next best thing — it is a one-dimensional topological manifold.

I will not show that $S^1$ is Hausdorff or second countable since these properties are easily proved and not incredibly interesting. We can show that it is locally Euclidean by exhibiting two open sets which cover $S^1$ and which are both homeomorphic to open intervals in $\R$.

[Figure: charts-on-circle]

By simply choosing two overlapping open sets $U$ and $V$ in $S^1$ as above, with embeddings $x:U\to\R$ and $y:V\to\R$, we can talk about patches of the circle as though they were $\R$ itself. We think of $x$ and $y$ as providing "local coordinates" for the regions $U$ and $V$. That is to say, each point $p\in U$ corresponds to precisely one real number in $\R$ — its local coordinate $x(p)$— and likewise for points in $V$. Since $U$ and $V$ must intersect, a point in their intersection will most likely have a different local coordinate under the map $y$ than under the map $x$.

Example. The torus $S^1\times S^1$ with the product topology (which is equivalent to the subspace topology from $\R^3$) is a topological manifold. It is not difficult to exhibit open sets that cover the torus and embeddings from those sets to the real numbers, but I don't want to get bogged down with technical details so I will not bother. You can look up ways to do it if you're interested!

Example. The subset of $\R^2$ below, considered with the subspace topology, is not a topological manifold.

[Figure: not-a-manifold]

The issue is the point where the lines all intersect. All other points have neighborhoods homeomorphic to the real line, but no neighborhood of that point does. If we remove that point, we get a topological manifold, but it is no longer connected.

In the case of the circle above, we had two open sets $U$ and $V$ that were homeomorphic to $\R$. In general, a manifold will have many such open sets and so it is convenient to have a name for them (and for the homeomorphisms).

Definition. A chart $(U, x)$ of a topological manifold $M$ of dimension $n$ consists of an open set $U\subseteq M$ and a topological embedding $x:U\to\R^n$. The set $U$ is often called the chart domain and the embedding $x$ is often called the chart map.

By definition, any nonempty topological manifold must have at least one chart.

Definition. An atlas for a topological manifold $M$ is a collection of charts whose domains cover $M$.

Definition. If $(U, x)$ and $(V, y)$ are charts of a manifold $M$ of dimension $n$ with $U\cap V\ne\varnothing$ then the functions $x\circ y^{-1}:y[U\cap V]\to x[U\cap V]$ and $y\circ x^{-1}:x[U\cap V]\to y[U\cap V]$ are called transition maps between the charts.

Note that transition maps are homeomorphisms because they are compositions of restrictions/inverses of topological embeddings.
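To make this concrete, here is one possible choice of chart maps for the circle example above (the post never fixed explicit formulas for $x$ and $y$, so treat these as an illustrative choice): take $U=S^1\setminus\{(-1,0)\}$ with $x(p)$ the angle of $p$ measured in $(-\pi,\pi)$, and $V=S^1\setminus\{(1,0)\}$ with $y(p)$ the angle measured in $(0,2\pi)$. On the overlap $U\cap V$ (the circle minus the two points $(\pm 1, 0)$), the transition map is

$$y\circ x^{-1}(\theta) = \begin{cases}\theta & \text{if } \theta\in(0,\pi),\\ \theta+2\pi & \text{if } \theta\in(-\pi,0),\end{cases}$$

a homeomorphism from $x[U\cap V]=(-\pi,0)\cup(0,\pi)$ onto $y[U\cap V]=(0,\pi)\cup(\pi,2\pi)$.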

In general a manifold will have many atlases. For instance, in the above example for the circle $S^1$, the collection $\{(U, x), (V, y)\}$ was an atlas. But we could have used different open sets, as long as they covered $S^1$. We could also add more charts and the result will still be an atlas.

Definition. An atlas for a topological manifold $M$ is a maximal atlas if it is not strictly contained in any other atlas for $M$.

Every topological manifold has a maximal atlas. This isn't super important now, but it will be in a moment.

That's it. That's all there is to topological manifolds (not really). Let's move on to smooth manifolds!

Since topological manifolds are topological spaces, it makes sense to talk about continuous maps $\R\to M$ and $M\to N$, where $M$ and $N$ are topological manifolds. The endgame of this post is to be able to take this even further and talk about smooth maps $M\to \R$ and $M\to N$. First we must define what we mean by a smooth map between Euclidean spaces:

Definition. If $U\subseteq \R^n$ and $V\subseteq \R^m$ are open, a map $f:U\to V$ is called:

  • $C^0$ if $f$ is continuous,
  • $C^1$ if $f$ is continuously differentiable (that is, its derivative is continuous),
  • $C^k$ if $f$ is $k$-times continuously differentiable,
  • $C^\infty$ or smooth if $f$ is infinitely differentiable.
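
To get a feel for the difference between these classes, note that the absolute value function $\R\to\R$ is $C^0$ but not $C^1$ (it has a corner at the origin), the function $x\mapsto x|x|$ is $C^1$ but not $C^2$, and polynomials, $\sin$, $\cos$ and $e^x$ are all $C^\infty$.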

We shall be mainly interested in the smooth ($C^\infty$) case, but everything that follows has an analogue for $C^k$ functions.

Consider a function $f:M\to\R$ defined on a manifold $M$. I like to imagine such a function as assigning a temperature to each point on the manifold, but think about it however best suits you. We would like to come up with a way to extend the above concepts of differentiability and smoothness to such functions. Unlike continuity, however, differentiability is not a topological notion. To see this, note that the circle and the square are homeomorphic spaces, but the circle is intuitively smooth while the square is not (it has sharp corners where it ought not be differentiable). To define a notion of differentiability on a manifold, we have to get a little bit creative.

Luckily, we have an atlas of charts that let us talk about regions of the manifold as though they were regions of $\R^n$. It may not be immediately obvious, but this lets us treat chunks of our function $f:M\to\R$ as though they were functions $\R^n\to\R$. To see this more clearly, we make the following definition.

Definition. Let $M$ be a manifold of dimension $n$ and let $f:M\to\R$ be a function. If $(U, x)$ is a chart of $M$, we call $f\circ x^{-1}:x[U]\to\R$ the chart representative of $f$ under this chart.

Here's a visualization of what the chart representative of a function $f:M\to\R$ might look like. If we are going with my "temperature distribution" interpretation, blue might represent "colder" regions of $M$ and red might represent "warmer" regions. Since $M$ is an $n$-dimensional manifold, every point $p\in M$ is contained in the domain of some chart $(U, x)$. If we look at the image of $U$ under the chart map $x$, we obtain a region $x[U]$ in $\R^n$ to which we can assign familiar coordinates, and which gives us a pretty reasonable representation of what the function $f$ looks like on the domain $U$.

chart-representative

Remember that the chart map $x$ is necessarily an embedding and thus has an inverse $x^{-1}:x[U]\to U$ which is also a homeomorphism. The chart representative of $f$ under the chart $(U, x)$ is then the function $f\circ x^{-1}:x[U]\to\R$. It is crucial to notice that for any point $p\in U$, we have that $f(p) = (f\circ x^{-1})(x(p))$. This means that the chart representative $f\circ x^{-1}$ really contains all the same data that $f$ does on the chart domain $U$. Furthermore, since $f\circ x^{-1}$ takes an open subset of $\R^n$ to an open subset of $\R$, it is an object on which differentiability and smoothness make sense to talk about!

Here's the idea then: we will try to define a function $f:M\to\R$ as smooth if every point $p\in M$ is contained in the domain of some chart $(U, x)$ for which the chart representative $f\circ x^{-1}$ is smooth (i.e., $C^\infty$) in the sense of ordinary differential calculus. We need to be careful, though, since $p$ might be in the domain of two or more charts. It would be no good at all if the chart representative with respect to one chart was smooth but the chart representative with respect to another chart was not. In order for smoothness of $f$ to be a well-defined notion then, we need to make sure that it is not dependent on our choice of chart!

Let's see what this requirement entails exactly. Suppose a point $p\in M$ is in the domain of two charts, $(U,x)$ and $(V, y)$. Then $U\cap V\subseteq M$ is nonempty, since it certainly contains $p$, and it is an open set since it is the intersection of two open sets.

smooth-function-well-defined-1

What we need is for smoothness of the chart representative $f\circ x^{-1}$ to imply smoothness of the chart representative $f\circ y^{-1}$. So suppose the chart representative $f\circ x^{-1}$ is smooth. Then, remembering that the composition of smooth functions is smooth and that composition of functions is associative, the other chart representative
$$
\begin{align}f\circ y^{-1} &= f\circ (x^{-1}\circ x) \circ y^{-1} \\ &= (f\circ x^{-1})\circ(x\circ y^{-1})\end{align}
$$ is guaranteed to be smooth provided the transition map $x\circ y^{-1}$ is smooth, since it is then a composition of smooth maps. Without that assumption we cannot conclude anything. Thus, we have failed in defining smooth functions on topological manifolds, since there is nothing in their definition which requires the transition maps to be smooth.

In order to make all of this work, we simply define a new notion:

Definition. A topological manifold $M$, together with a chosen atlas, is a smooth manifold if for every pair of charts $(U, x)$ and $(V,y)$ in that atlas with $U\cap V\ne\varnothing$, the transition map $x\circ y^{-1}$ is smooth.

We call such an atlas a smooth atlas and we say that such charts are smoothly compatible.

Unlike a general topological manifold, which is guaranteed to have a perfectly fine atlas by definition, a smooth manifold needs to come with a specified smooth atlas. It is this special choice of atlas which makes a manifold smooth.

In restricting the types of manifolds we consider to smooth manifolds, we have solved the issue above and we can immediately define smooth functions on smooth manifolds!

Definition. Let $M$ be a smooth manifold. A function $f:M\to\R$ is a smooth function if every point $p\in M$ is contained in the domain of some chart $(U, x)$ for which the chart representative $f\circ x^{-1}$ is smooth in the sense of ordinary differential calculus.

We denote by $C^\infty(M)$ the collection of all smooth functions on $M$.

The argument above proves that this definition is well defined on smooth manifolds, because if a function's chart representative with respect to one chart is smooth, then its chart representatives with respect to all overlapping charts are guaranteed to be smooth by construction.

Let me reiterate, just to really drive the point home. There is no way to talk about smooth functions on a general topological manifold. Only if we restrict our atlas to a smooth atlas in which all charts are smoothly compatible can we define such a concept. Smooth manifolds are the primary object of study in differential geometry, and are an essential ingredient in general relativity (spacetime is assumed to be a smooth manifold) and other branches of physics.
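
As a sanity check, here is one concrete example (my own choice of function, using the angle charts on $S^1$ described earlier, which form a smooth atlas since their transition maps are smooth). Define $f:S^1\to\R$ by $f(a, b) = b$, the "height" of a point on the circle. In the chart $(U, x)$ its chart representative is

$$(f\circ x^{-1})(\theta) = \sin\theta, \qquad \theta\in(0, 2\pi),$$

and in the chart $(V, y)$ it is $\theta\mapsto\sin\theta$ on $(-\pi, \pi)$. Both are smooth in the ordinary sense, so $f$ is a smooth function on the circle.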

In addition to being able to talk about smooth maps on a manifold, we now have all the required machinery to define smooth maps between two smooth manifolds. Suppose $f:M\to N$ is such a map between smooth manifolds $M$ of dimension $m$ and $N$ of dimension $n$, and let $p$ be any point of $M$. Then there are necessarily charts $(U_M, x_M)$ for $M$ and $(U_N, x_N)$ for $N$ for which $p\in U_M$ and $f(p)\in U_N$. We don't know how to differentiate $f$, but we can follow the same basic procedure we used above to define smooth functions.

smooth-map-idea

Observe that $x_N\circ f\circ x_M^{-1}$ is a function between open subsets of $\R^m$ and $\R^n$, and thus it is an object on which it is reasonable to talk about differentiability and smoothness in the ordinary vector-calculus sense. Furthermore, for any point $p\in U_M$ we have that
$$f(p) = x_N^{-1}\circ (x_N\circ f\circ x_M^{-1})(x_M(p)).$$ Thus, all the behavior of the function $f$ on the subsets $U_M$ and $U_N$ is completely recoverable from the function $x_N\circ f\circ x_M^{-1}$, so it makes sense to talk about it as a sort of representative of $f$ with respect to these charts.

Definition. Let $M$ and $N$ be smooth manifolds. A map $f:M\to N$ is a smooth map if for every point $p\in M$ there is a chart $(U_M, x_M)$ for $M$ whose domain contains $p$ and a chart $(U_N, x_N)$ whose domain contains $f(p)$ and for which $x_N\circ f\circ x_M^{-1}$ is smooth in the sense of ordinary differential calculus.
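
For a concrete example (again my own, just to illustrate the definition), consider the antipodal map $A:S^1\to S^1$ defined by $A(a,b)=(-a,-b)$, where $S^1$ carries the angle charts from earlier. If $p$ lies in the chart domain $V$, then $A(p)$ lies in $U$, and the corresponding representative is

$$(x\circ A\circ y^{-1})(\theta) = \theta + \pi, \qquad \theta\in(-\pi,\pi),$$

which is smooth in the ordinary sense. The one remaining point, $p=(-1,0)$, is handled the same way with the roles of the two charts swapped, so $A$ is a smooth map.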

We still need to check that the concept of a smooth map is well defined. For instance, it's possible that $p$ is contained in the domains of two charts $(U_M, x_M)$ and $(V_M, y_M)$ for $M$, while $f(p)$ is contained in the domains of two charts $(U_N, x_N)$ and $(V_N, y_N)$ for $N$.

smooth-map-well-defined

Suppose $x_N\circ f\circ x_M^{-1}$ is smooth. Then
$$
\begin{align}y_N\circ f\circ y_M^{-1} &= y_N\circ (x_N^{-1}\circ x_N)\circ f\circ (x_M^{-1}\circ x_M)\circ y_M^{-1} \\
&= (y_N\circ x_N^{-1})\circ (x_N\circ f\circ x_M^{-1})\circ (x_M\circ y_M^{-1})\end{align}
$$ is smooth because the transition maps $y_N\circ x_N^{-1}$ and $x_M\circ y_M^{-1}$ are smooth by the definition of a smooth manifold. Thus, the concept of smoothness of a map $f:M\to N$ is independent of the charts that we choose, and so it is a well defined notion.

I'm going to stop here for now, but in my next post we'll see how we can define tangent spaces to points on a manifold, which are a generalization of tangent hyperplanes to hypersurfaces in $\R^n$.

]]>
<![CDATA[Permutations and Parity]]>

In my next post, I would like to introduce a very special type of tensor whose properties are invaluable in many fields, most notably differential geometry. Although it's possible to understand antisymmetric tensors without discussing permutations and their parity, these concepts are invaluable to developing the theory properly. Thus, in

]]>
https://algebrology.github.io/permutations-and-parity/5dcb71b35701dd0be918a9c8Wed, 13 Nov 2019 03:00:15 GMT

In my next post, I would like to introduce a very special type of tensor whose properties are invaluable in many fields, most notably differential geometry. Although it's possible to understand antisymmetric tensors without discussing permutations and their parity, these concepts are invaluable to developing the theory properly. Thus, in this post we will take a brief voyage into the topic of permutations.

Definition. A permutation on a finite set $X$ is a bijective function $p:X\to X$.

That seems simple enough. Basically, permutations offer us a way of rearranging the order of elements in a set. To see what I mean by this, let's look at a few examples.

Example. Let $\N_4$ denote the set $\{0, 1, 2, 3\}$ containing the first four natural numbers. The function $p_1:\N_4 \to \N_4$, defined by $p_1(x) = x + 1\hspace{-4pt}\mod 4$, is a permutation. See how the permutation $p_1$ rearranges the order of these numbers:

$$
\begin{align}
0 &\mapsto 1 \\
1 &\mapsto 2 \\
2 &\mapsto 3 \\
3 &\mapsto 0.
\end{align}
$$

Example. Not every permutation is as easy to describe with an equation. For example, let $X$ denote the set $\{a, b, c, d, e, f, g\}$. Then there exists a bijection $p_2$ which permutes these letters as follows:

$$
\begin{align}
a &\mapsto a \\
b &\mapsto c \\
c &\mapsto b \\
d &\mapsto e \\
e &\mapsto g \\
f &\mapsto d \\
g &\mapsto f.
\end{align}
$$

However, there's no obvious way to describe this permutation other than to simply write out what happens to each individual letter.

Because we often need to describe permutations that have no simple corresponding equation, it is common to adopt the following compact notation to represent permutations:

Notation. Let $p:X\to X$ denote a permutation on some finite set $X=\{x_0, x_1, x_2, \ldots, x_n\}$. Then we can write

$$
p = \begin{pmatrix}
x_0 & x_1 & x_2 & \ldots & x_n \\
p(x_0) & p(x_1) & p(x_2) & \ldots & p(x_n)
\end{pmatrix}
$$

as a shorthand to describe the images of each element under $p$.

Using this new notation, the permutation from the first example above can be written

$$
p_1 = \begin{pmatrix}
0 & 1 & 2 & 3 \\
1 & 2 & 3 & 0
\end{pmatrix},
$$

and the one from the second example becomes

$$
p_2 = \begin{pmatrix}
a & b & c & d & e & f & g \\
a & c & b & e & g & d & f
\end{pmatrix}.
$$

This second example has a few interesting properties. Note that $a$ gets sent to $a$ and is thus a fixed point of the permutation. The letters $b$ and $c$ get swapped, and $d, e, f$ and $g$ all get permuted among themselves. What this means is that the permutation's action on $a$ is entirely unrelated to its action on $b$ and $c$, which in turn is entirely unrelated to its action on $d, e, f$ and $g$.

This observation that certain subsets behave completely independently of each other suggests that there may be a way to break the permutation into more fundamental building blocks.

Before we delve into how this is done, it is important to realize that, since permutations are just functions, they can be composed with each other. The following easily-proved result tells us that the composition of permutations is always itself a permutation.

Theorem. Let $p:A\to B$ and $q:B\to C$ be bijective functions. Then $q\circ p:A\to C$ is also bijective.

Proof. Since $p$ and $q$ are bijections, they are by definition injective and surjective. To show that $q\circ p$ is bijective, we must show that it is also injective and surjective.

We argue first that $q\circ p$ is injective. To this end, suppose that $a_1, a_2\in A$ and that $a_1\ne a_2$. Then because $p$ is injective, we know that $p(a_1)\ne p(a_2)$. Furthermore, since $q$ is injective, we know that

$$
\begin{align}
(q\circ p)(a_1) &= q(p(a_1)) \\
&\ne q(p(a_2)) \\
&= (q\circ p)(a_2).
\end{align}
$$

We have shown that distinct elements of $A$ get mapped to distinct elements of $C$, so it follows that $q\circ p$ is injective.

It remains to be shown that $q\circ p$ is surjective. For any element $c\in C$, the surjectivity of $q$ tells us that there exists some element $b\in B$ for which $c = q(b)$. It follows then from the surjectivity of $p$ that there exists some $a\in A$ for which $b=p(a)$. That is,

$$
\begin{align}
c &= q(p(a)) \\
&= (q\circ p)(a).
\end{align}
$$

We have shown that every element of $C$ gets mapped to by some element of $A$, so it follows that $q\circ p$ is surjective.

Since $q\circ p$ is injective and surjective, it is bijective.

This theorem is actually much more general than what we need, since permutations are very specific bijections which take finite sets to themselves. But as a special case, it certainly tells us that the composition of permutations is a permutation.

Notation. For convenience, we frequently omit the $\circ$ symbol and treat composition of permutations similarly to multiplication. That is, we simply write $qp$ instead of $q\circ p$. In particular, this allows us to use exponential notation to write

$$
p^n = \underbrace{p\circ p\circ \cdots \circ p}_{n \text{ times}}.
$$

We also define $p^0$ to be the identity function, and $p^{-n}$ to be $(p^{-1})^n$. Note that $p^{-1}$ exists and is guaranteed to be a permutation since $p$ is bijective. With these definitions, it is easy to see that, for any integers $n$ and $m$,

$$
p^{n+m} = p^n p^m.
$$
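
For instance, for the permutation $p_1$ on $\N_4$ from the first example, $p_1^2(x) = x + 2\hspace{-4pt}\mod 4$, $p_1^{-1}(x) = x + 3\hspace{-4pt}\mod 4$, and $p_1^4 = p_1^0$ is the identity.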

Equipped with our new machinery, we can begin to see how a permutation can be broken up into more basic building blocks.

Example. Consider the permutations

$$
q_1 = \begin{pmatrix}
a & b & c & d & e & f & g \\
a & c & b & d & e & f & g
\end{pmatrix}
$$

and

$$
q_2 = \begin{pmatrix}
a & b & c & d & e & f & g \\
a & b & c & e & g & d & f
\end{pmatrix}.
$$

Then their composition is

$$
\begin{align}
q_2 \circ q_1 &= \begin{pmatrix}
a & b & c & d & e & f & g \\
a & b & c & e & g & d & f
\end{pmatrix} \begin{pmatrix}
a & b & c & d & e & f & g \\
a & c & b & d & e & f & g
\end{pmatrix} \\
&= \begin{pmatrix}
a & b & c & d & e & f & g \\
a & c & b & e & g & d & f
\end{pmatrix} \\
&= p_2.
\end{align}
$$

This gives us an explicit way to express the permutation $p_2$ as the composition of a permutation $q_1$, which affects only the elements $b$ and $c$, and the permutation $q_2$, which affects only the elements $d, e, f, g$.
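
If you like to experiment, compositions like this are easy to check by machine. Here is a minimal sketch (the dictionary representation and the names are my own, not anything from this post) which represents a permutation as a Python dictionary and composes two of them:

```python
def compose(q, p):
    """Return the permutation q∘p, i.e. apply p first and then q."""
    return {x: q[p[x]] for x in p}

# The permutations from the examples above.
p2 = {'a': 'a', 'b': 'c', 'c': 'b', 'd': 'e', 'e': 'g', 'f': 'd', 'g': 'f'}
q1 = {'a': 'a', 'b': 'c', 'c': 'b', 'd': 'd', 'e': 'e', 'f': 'f', 'g': 'g'}
q2 = {'a': 'a', 'b': 'b', 'c': 'c', 'd': 'e', 'e': 'g', 'f': 'd', 'g': 'f'}

assert compose(q2, q1) == p2  # q2 ∘ q1 = p2, as computed above
```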

It is generally possible to break up any permutation this way. We are now ready to see how it is done.

Theorem. Let $p:X\to X$ be a permutation. Then the relation $\sim$ on $X$, defined by $a\sim b$ whenever $b=p^n(a)$ for some $n\in\Z$, is an equivalence relation.

Proof. By the definition of equivalence relation, it suffices to show that $\sim$ is reflexive, symmetric and transitive.

To see that $\sim$ is reflexive, it suffices to choose $n=0$, as it is certainly true that $a = p^0(a)$ for any $a\in X$. (Recall that $p^0$ is the identity map.) Thus, $a\sim a$ for every $a\in X$.

To see that it is symmetric, suppose $a\sim b$ for some elements $a,b\in X$. Then by definition, $b=p^n(a)$ for some integer $n$. Acting on both sides with the permutation $p^{-n}$ yields

$$
\begin{align}
p^{-n}(b) &= p^{-n}(p^n(a)) \\
&= p^0(a) \\
&= a.
\end{align}
$$

It follows then that $b\sim a$ and thus $\sim$ is symmetric.

Lastly, we need to show that $\sim$ is transitive. To this end, suppose that $a\sim b$ and $b\sim c$, for some elements $a, b, c\in X$. Then there exist integers $n_1, n_2$ for which $b=p^{n_1}(a)$ and $c=p^{n_2}(b)$. Then

$$
\begin{align}
c &= p^{n_2}(b) \\
&= p^{n_2}(p^{n_1}(a)) \\
&= (p^{n_2}p^{n_1})(a) \\
&= p^{n_1 + n_2}(a).
\end{align}
$$

Since $n_1+n_2$ is an integer, it follows that $a\sim c$ and thus $\sim$ is transitive.

We have shown that $\sim$ is reflexive, symmetric and transitive. Therefore it is an equivalence relation.

The following definition makes this all easier to talk about.

Definition. Let $p:X\to X$ denote a permutation on a finite set $X$, and let $\sim$ be the equivalence relation on $X$ defined in the previous theorem. Then the equivalence classes of $\sim$ are called orbits of the permutation $p$.

Let's break this all down. Basically an orbit of a permutation is a collection of elements that are all reachable from each other under repeated application of that permutation. That is, if $x$ and $y$ are in the same orbit of some permutation, then applying the permutation to $x$ enough times will eventually get you to $y$.

In the case of the permutation $p_2$ defined earlier in the post, the orbits are the sets $\{a\}$, $\{b, c\}$ and $\{d, e, f, g\}$. Our earlier assertion that these sets behaved independently of each other under the permutation $p_2$ is now formalized by the realization that orbits must partition the entire set. But remember that we were actually able to use this fact to find permutations $q_1$ and $q_2$ which acted on these orbits and whose composition was $p_2$. Let's see if we can make this type of decomposition more rigorous as well.

Definition. A permutation $p:X\to X$ is a cycle if at most one of its orbits contains more than a single element. Two cycles are called disjoint if their non-singleton orbits are disjoint sets. The length of a cycle is the size of its largest orbit.

It is not difficult to see that $q_1$ and $q_2$ are cycles. The orbits of $q_1$ are $\{a\}$, $\{b, c\}$, $\{d\}$, $\{e\}$, $\{f\}$ and $\{g\}$. Note that only one of these orbits, namely $\{b,c\}$, has upwards of one element, and so $q_1$ is a cycle. Similarly, the orbits of $q_2$ are $\{a\}$, $\{b\}$, $\{c\}$ and $\{d, e, f, g\}$. Since only this last orbit has more than one element, $q_2$ is a cycle as well. Additionally, the cycles $q_1$ and $q_2$ are disjoint because their non-singleton orbits are disjoint.

This implies that what we are really looking for is a way to decompose any permutation into a product of disjoint cycles. The next theorem tells us that this can always be done.

Theorem. Any permutation $p:X\to X$ can be written as the composition of disjoint cycles.

Proof. Let $O_1, O_2, \ldots, O_n$ denote the orbits of $p$. (There are, of course, finitely many orbits because $X$ must be a finite set.) Then for each $1\le i \le n$ and each $x\in X$, define

$$
c_i(x) = \begin{cases}
p(x) & \text{if $x\in O_i$}, \\
x & \text{otherwise}.
\end{cases}
$$

Then each $c_i$ is a cycle because, by construction, its only non-singleton orbit is $O_i$. Since the orbits are disjoint, this in turn means that these are all disjoint cycles.

We argue that $p=c_1c_2\cdots c_n$. To this end, choose any $a\in X$. Since the orbits partition $X$, exactly one orbit of $p$, say $O_j$, contains $a$. Note that $p(a)$ lies in $O_j$ as well, since orbits are closed under applications of $p$. This means that $c_j(a)=p(a)$, while $c_i(a) = a$ and $c_i(p(a)) = p(a)$ for every $i\ne j$. It follows then that

$$
\begin{align}
(c_1 c_2 \cdots c_j \cdots c_n)(a) &= (c_1 c_2 \cdots c_j)(a) \\
&= (c_1 c_2 \cdots c_{j-1})(p(a)) \\
&= p(a),
\end{align}
$$

since the cycles $c_{j+1}, \ldots, c_n$ (which are applied first) all fix $a$, and the cycles $c_1, \ldots, c_{j-1}$ (which are applied after $c_j$) all fix $p(a)$. Since $a\in X$ was arbitrary, $p=c_1c_2\cdots c_n$ as desired.
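
The proof above is completely constructive, so it is easy to mirror in code. Here is a small sketch (my own code, reusing the dictionary representation from the earlier snippet) which finds the orbits of a permutation by following $x, p(x), p(p(x)), \ldots$ until the walk returns to its starting point:

```python
def orbits(p):
    """Return the orbits of the permutation p (given as a dict) as a list of sets."""
    seen, result = set(), []
    for x in p:
        if x in seen:
            continue
        orbit, y = {x}, p[x]
        while y != x:      # follow x, p(x), p(p(x)), ... back around to x
            orbit.add(y)
            y = p[y]
        seen |= orbit
        result.append(orbit)
    return result

p2 = {'a': 'a', 'b': 'c', 'c': 'b', 'd': 'e', 'e': 'g', 'f': 'd', 'g': 'f'}
print(orbits(p2))  # three orbits: {'a'}, {'b', 'c'} and {'d', 'e', 'f', 'g'}
```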

This type of decomposition is even unique! That is, ignoring trivial cycles which fix every element, there is only one collection of disjoint cycles whose product is our desired permutation. The order in which the cycles are multiplied is not unique, however.

Disjoint cycles have another nice property: they commute, meaning that it does not matter in which order we apply them.

Theorem. If $p:X\to X$ and $q:X\to X$ are disjoint cycles, then $pq=qp$.

Proof. Since $p$ and $q$ are disjoint, their non-singleton orbits, $O_p=\{a_0, a_1, \ldots, a_{n-1}\}$ and $O_q=\{b_0, b_1, \ldots, b_{m-1}\}$, are disjoint sets, and we may label their elements so that $p(a_i)=a_{i+1\hspace{-2pt}\mod n}$ and $q(b_j)=b_{j+1\hspace{-2pt}\mod m}$. Let $C$ be the set of elements in $X$ which are fixed points of both $p$ and $q$. Then $X=O_p \cup O_q \cup C$ and these sets are all pairwise disjoint. It thus suffices to show that $(pq)(x)=(qp)(x)$ in each of three cases: $x\in O_p$, $x\in O_q$ and $x\in C$.

Suppose first that $x\in O_p$. Then $x=a_i$ for some $0\le i< n$. Note that $p(a_i)=a_{i+1\hspace{-2pt}\mod n}$ and $q(a_i)=a_i$ since $a_i$ is by definition in a singleton orbit of $q$. Thus,

$$
\begin{align}
(pq)(x) &= (pq)(a_i) \\
&= p(q(a_i)) \\
&= p(a_i) \\
&= a_{i+1\hspace{-2pt}\mod n} \\
&= q(a_{i+1\hspace{-2pt}\mod n}) \\
&= q(p(a_i)) \\
&= (qp)(a_i).
\end{align}
$$

An entirely symmetrical argument holds for the case where $x\in O_q$, noting that $x=b_j$ for some $0\le j< m$ and that $q(b_j)=b_{j+1\hspace{-2pt}\mod m}$ and $p(b_j) = b_j$ since $b_j$ is a fixed point of $p$.

Lastly, if $x\in C$ then it is a fixed point of both $p$ and $q$. That is, $p(x) = q(x) = x$. Thus,

$$
\begin{align}
(pq)(x) &= p(q(x)) \\
&= p(x) \\
&= x \\
&= q(x) \\
&= q(p(x)) \\
&= (qp)(x).
\end{align}
$$

It follows that $(pq)(x)=(qp)(x)$ for every $x\in X$, and thus $pq=qp$ as desired.

If we do not demand that our cycles be disjoint, there are even more ways to decompose a permutation into products of cycles. As an example, consider the following definitions:

$$
\begin{align}
r_1 &= \begin{pmatrix}
a & b & c & d & e & f & g \\
a & c & b & d & e & f & g
\end{pmatrix}, \\
r_2 &= \begin{pmatrix}
a & b & c & d & e & f & g \\
a & b & c & e & d & f & g
\end{pmatrix}, \\
r_3 &= \begin{pmatrix}
a & b & c & d & e & f & g \\
a & b & c & d & g & f & e
\end{pmatrix}, \\
r_4 &= \begin{pmatrix}
a & b & c & d & e & f & g \\
a & b & c & d & e & g & f
\end{pmatrix}.
\end{align}
$$

These are all permutations which each swap only two elements. They are cycles because each has only one non-singleton orbit. Those orbits are $\{b, c\}$, $\{d, e\}$, $\{e, g\}$ and $\{f, g\}$, respectively. These orbits are not disjoint, and thus neither are the cycles. But it turns out that $r_1r_2r_3r_4=p_2$ (remember our convention: the rightmost factor in such a product is the one applied to an element first). To see this, the following flowchart might be a helpful visual aid. Reading from top to bottom, we build up the product one factor at a time, multiplying on the right by $r_1$, then $r_2$, then $r_3$ and finally $r_4$; each new row is obtained from the previous one by swapping the two entries lying in the columns of the letters exchanged by the newly appended transposition, and the final row is the bottom row of $p_2$:

$$
\begin{matrix}
a & b & c & d & e & f & g \\
&&& \downarrow \\
a & c & b & d & e & f & g \\
&&& \downarrow \\
a & c & b & e & d & f & g \\
&&& \downarrow \\
a & c & b & e & g & f & d \\
&&& \downarrow \\
a & c & b & e & g & d & f
\end{matrix}
$$

Since we've now shown two ways of decomposing $p_2$ into cycles, it's obvious that such a decomposition is not unique unless we insist that our cycles are disjoint. There is still useful information to be extracted from such decompositions, however, particularly decompositions of the form above.

Definition. A transposition is a cycle of length $2$.

In other words, a transposition is a permutation which swaps only two elements. This means, for example, that $r_1, r_2, r_3$ and $r_4$ as defined above are all transpositions. The following theorem, though perhaps obvious, is surprisingly important. Before we prove it, it is helpful to introduce a new notation for cycles:

Notation. Let $c:X\to X$ denote a cycle. If its largest orbit sends $x_1 \mapsto x_2 \mapsto \cdots \mapsto x_n$ then we write

$$
c = \begin{pmatrix}
x_1 & x_2 & \cdots & x_n
\end{pmatrix}.
$$

This allows us to simplify our representations a great deal, as long as we know the permutations we're talking about are all cycles. For instance, it allows us to write

$$
\begin{align}
q_1 &= \begin{pmatrix}b & c\end{pmatrix}, \\
q_2 &= \begin{pmatrix}d & e & g & f\end{pmatrix}, \\
r_1 &= \begin{pmatrix}b & c\end{pmatrix}, \\
r_2 &= \begin{pmatrix}d & e\end{pmatrix}, \\
r_3 &= \begin{pmatrix}e & g\end{pmatrix}, \\
r_4 &= \begin{pmatrix}f & g\end{pmatrix}.
\end{align}
$$

Thus,

$$
\begin{align}
p_2 &= \begin{pmatrix}d & e & g & f\end{pmatrix}\begin{pmatrix}b & c\end{pmatrix} \\
&= \begin{pmatrix}b & c\end{pmatrix}\begin{pmatrix}d & e\end{pmatrix}\begin{pmatrix}e & g\end{pmatrix}\begin{pmatrix}f & g\end{pmatrix}.
\end{align}
$$

Now we're ready to prove that theorem!

Theorem. Any permutation $p:X\to X$, where $X$ is a finite set containing at least two elements, can be expressed as a product of transpositions.

Proof. If $p$ is the identity permutation, then we may write $p = t\circ t$ for any transposition $t$ of two elements of $X$ (such a transposition exists because $X$ contains at least two elements). Otherwise, we can write $p$ as a finite product of cycles, $p=c_1c_2\cdots c_n$. If we can decompose each cycle $c_i$ into a product of transpositions, the result will follow.

Each cycle $c_i$ can be written $c_i=\begin{pmatrix} a_1 & a_2 & \cdots & a_{m_i} \end{pmatrix}$ for some integer $m_i$ and some $\{a_j\}_{j=1}^{m_i}$. We write this cycle as a composition of transpositions as follows

$$
\begin{align}
c_i &= \begin{pmatrix} a_1 & a_2 & \cdots & a_{m_i} \end{pmatrix} \\
&= \begin{pmatrix}a_1 & a_{m_i}\end{pmatrix}\begin{pmatrix}a_1 & a_{m_i-1}\end{pmatrix}\cdots\begin{pmatrix}a_1 & a_2\end{pmatrix}.
\end{align}
$$

Now $p$ is just the product of these cycles, each of which is the product of transpositions, and so $p$ itself is a product of transpositions, completing the proof.

There are many possible such decompositions into transpositions for any given permutation. However, it turns out that all of these decompositions share one important invariant property. Namely, for a fixed permutation, either every expression as a product of transpositions uses an odd number of transpositions, or every such expression uses an even number; no permutation admits both kinds.

For instance, there may be many ways to decompose $p_2$ into a product of transpositions. But we've seen it done using four, and so we can say with certainty that any such decomposition will always have an even number of transpositions. This property is called the parity of a permutation, and it is so important that there is a theorem named after it — the Parity Theorem. Unfortunately the proof is somewhat beyond the scope of this blog post, but maybe I will devote a full post to it at some point. In any case, you should now have a good understanding of what it means. Here is the statement:

Parity Theorem. A permutation $p$ cannot be expressed as both the product of an odd number of transpositions and as the product of an even number of transpositions.

This fact allows us to make the following definitions:

Definition. A permutation is odd if it can be expressed as the product of an odd number of transpositions. It is even if it can be expressed as the product of an even number of transpositions.

Definition. If a permutation $\sigma$ is odd, we define its sign to be $\sgn\sigma = -1$. If it is even, its sign is $\sgn\sigma = 1$.

An alternative notation for $\sgn\sigma$ is $(-1)^\sigma$, since the sign is equal to $(-1)^m$, where $m$ is the number of transpositions in any decomposition of $\sigma$ into transpositions.

Every permutation is either even or odd, but it cannot be both.
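
Since a cycle of length $m$ decomposes into $m-1$ transpositions (as in the proof above), the sign of a permutation can be read off from the sizes of its orbits. Here is a small self-contained sketch (my own code, using the dictionary representation from the earlier snippets):

```python
def sign(p):
    """Sign of a permutation p (a dict), computed from its disjoint cycle structure."""
    seen, transpositions = set(), 0
    for x in p:
        if x in seen:
            continue
        length, y = 1, p[x]            # walk the orbit of x to find its size
        while y != x:
            seen.add(y)
            length += 1
            y = p[y]
        seen.add(x)
        transpositions += length - 1   # a cycle of length m is m - 1 transpositions
    return -1 if transpositions % 2 else 1

p2 = {'a': 'a', 'b': 'c', 'c': 'b', 'd': 'e', 'e': 'g', 'f': 'd', 'g': 'f'}
print(sign(p2))  # 1, since p2 is even (we exhibited it as a product of four transpositions)
```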

Lastly, it is important to point out that the set of all permutations on a finite set $X$ forms a group called the symmetric group of $X$, written $\text{Sym}(X)$ or $S_n$, where $n$ is the number of elements in $X$. The permutations are the elements of this group, and the group operation is function composition. We showed above that the composition of two permutations is again a permutation, composition of functions is associative, the identity map acts as the identity element, and every permutation has an inverse, so this does in fact form a group.

I'll leave this here for now, and we will soon put all of this to use!

]]>
<![CDATA[Tensor Products and Multilinear Maps]]>https://algebrology.github.io/tensor-products/5d4dedb22864a0008db11bacTue, 13 Aug 2019 22:06:00 GMT

If you're the sort of person who cowers in fear whenever the word "tensor" is mentioned, this post is for you. We'll pick up right where we left off last time in our discussion of the dual space, and discover how tensor products are a natural extension of the ideas developed in that post.

As before, throughout this post we will assume that $V$ is a finite dimensional vector space over a field $\F$. Furthermore, since there are going to be tons of sums in this post, and they will all have $n=\dim{V}$ terms, I will use the shorthand $\displaystyle\sum_i$ to mean $\displaystyle\sum_{i=1}^n$.

We saw last time that the dual space $\dual{V}$ of a vector space $V$ was another vector space which consisted of all linear maps from $V$ to $\F$. That is, an element of the dual space (called a covector or linear functional) is a map which takes a vector and outputs a scalar.

We also saw that, thanks to the natural isomorphism between a vector space $V$ and its double dual space $\ddual{V}$, we can view a vector as a map which takes a covector and outputs a scalar.

Tensor products give us a way to combine vectors and covectors and build more interesting maps from them. Let's look first at the simplest type of tensor product and work upward from there.

Definition. Given two covectors $s:V\to\F$ and $t:V\to\F$ in $\dual{V}$, their tensor product is the function $s \otimes t:V\times V\to\F$ defined by

$$(s \otimes t)(u, v) = s(u)t(v)$$

for all vectors $u,v\in V$.

We haven't really done anything groundbreaking here. We took two covectors, $s$ and $t$, and combined them to make a new map $s\otimes t$ which takes two vectors, $u$ and $v$, and outputs the product of $s(u)$ and $t(v)$, which is, of course, a scalar.

Example. Suppose $V=\R^2$ and define $s,t:\R^2\to\R$ by

$$\begin{align}
s(x,y) &= x + y \\
t(x,y) &= 2x
\end{align}$$

for any vector $(x,y)$ in $\R^2$.

It is trivial to check that $s$ and $t$ are linear maps, and so they are both covectors in $\dual{V}$. That means we can construct their tensor product as defined above. This will be a map which takes two vectors in $\R^2$ and combines them to output a single real number. Let's see how this works.

We can construct $s\otimes t:\R^2\times\R^2\to\R$, defined by

$$\begin{align}
(s\otimes t)(x_1,y_1,x_2,y_2) &= s(x_1,y_1)t(x_2,y_2) \\
&= (x_1 + y_1) (2x_2) \\
&= 2x_1x_2 + 2y_1x_2,
\end{align}$$

where $(x_1,y_1)$ and $(x_2,y_2)$ are any vectors in $\R^2$.

For concreteness, let's see how it acts on two specific vectors, $(1,2)$ and $(3,4)$.

$$\begin{align}
(s\otimes t)(1,2,3,4) &= 2(1)(3) + 2(2)(3) \\
&= 6 + 12 \\
&= 18.
\end{align}$$
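
Since $s$ and $t$ are linear, each is determined by its components in the standard dual basis ($s$ has components $(1, 1)$ and $t$ has components $(2, 0)$), and evaluating a covector on a vector is just a dot product. Here is a quick numerical sanity check of the computation above (my own sketch, using NumPy):

```python
import numpy as np

s = np.array([1.0, 1.0])   # s(x, y) = x + y
t = np.array([2.0, 0.0])   # t(x, y) = 2x

def tensor_product(s, t):
    """Return the bilinear map (u, v) -> s(u) * t(v)."""
    return lambda u, v: (s @ u) * (t @ v)

st = tensor_product(s, t)
print(st(np.array([1.0, 2.0]), np.array([3.0, 4.0])))  # 18.0
```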

We'd like to understand the properties of tensor products, so the first question we might ask is whether they are linear maps. Unfortunately, this question doesn't really make sense. We've defined linear maps as maps from one vector space to another which satisfy the properties of additivity and homogeneity. The tensor product we've defined instead takes the cartesian product of vector spaces into another vector space.

We can rectify this by creating an extension of the idea of a linear map.

Definition. Given three vector spaces $U,V,W$ over the same field $\F$, a bilinear map is a function $T:U\times V\to W$ which is linear in each argument. That is,

Additivity in the First Argument
For any vectors $u_1,u_2\in U$ and $v\in V$, we have that $$T(u_1+u_2, v) = T(u_1, v) + T(u_2, v).$$

Additivity in the Second Argument
For any vectors $u\in U$ and $v_1, v_2\in V$, we have that $$T(u, v_1 + v_2) = T(u, v_1) + T(u, v_2).$$

Homogeneity in Each Argument
For any vectors $u\in U$, $v\in V$ and any scalar $a\in\F$, we have that $$T(au, v)=T(u, av) = aT(u, v).$$

Just as the set of linear maps from one vector space to another is itself a vector space, it turns out that the set of all bilinear maps also forms a vector space once we make the following natural prescriptions:

  • The zero vector is the zero map ${\bf 0}:U\times V\to W$, which sends every pair of vectors to the zero vector of $W$.
  • Vector addition is just function addition.
  • Scalar multiplication is inherited directly.
  • Additive inverses are given by scalar multiplication by $-1$.

You'll notice that this is exactly what we did to turn the dual space into a vector space, and that's because it's the natural thing to do.

Happily enough, it turns out that the tensor product of two covectors will always be a bilinear map!

Theorem. Given two covectors $s,t\in\dual{V}$, their tensor product $s\otimes t$ is bilinear.

Proof. We need only verify that the properties of a bilinear map hold. To see that $s\otimes t$ is linear in the first argument, note that, from the linearity of $s$,

$$\begin{align}
(s\otimes t)(u_1+u_2, v) &= s(u_1+u_2)t(v) \\
&= \big(s(u_1) + s(u_2)\big)t(v) \\
&= s(u_1)t(v) + s(u_2)t(v) \\
&= (s\otimes t)(u_1, v) + (s\otimes t)(u_2, v)
\end{align}$$

for any vectors $u_1, u_2, v\in V$. Additivity in the second argument holds by a completely analogous argument.

To see that $s\otimes t$ is homogeneous in each argument, we note that

$$\begin{align}
(s\otimes t)(au, v) &= s(au)t(v) \\
&= as(u)t(v) \\
&= a(s\otimes t)(u,v) \\
&= s(u)at(v) \\
&= s(u)t(av) \\
&= (s\otimes t)(u, av).
\end{align}$$

So tensor products are actually very well behaved maps! Even more is true though:

Theorem. Let $B$ denote the space of all bilinear maps from $V\times V\to \F$. If $(e_i)_{i=1}^n$ is a basis for $V$ with dual basis $(e^i)_{i=1}^n$ for $\dual{V}$, then $(e^i\otimes e^j)_{i,j=1}^n$ is a basis for $B$.

Proof. We will show first that $(e^i\otimes e^j)_{i,j=1}^n$ is linearly independent. Suppose

$$\sum_i\sum_j a_{ij} e^i \otimes e^j = 0$$

for some scalars $(a_{ij})_{i,j=1}^n$ in $\F$. Then for any two vectors $u=\displaystyle\sum_k u^k e_k$ and $v=\displaystyle\sum_l v^l e_l$ in $V$,

$$\begin{align}
\left(\sum_i\sum_j a_{ij} e^i \otimes e^j\right)\left(\sum_k u^k e_k, \sum_l v^l e_l\right) &= \sum_i\sum_j a_{ij} e^i \left(\sum_k u^k e_k\right) e^j \left(\sum_l v^l e_l\right) \\
&= \sum_i\sum_j a_{ij} \sum_k u^k e^i(e_k) \sum_l v^l e^j(e_l) \\
&= \sum_i\sum_j a_{ij} \sum_k u^k \delta^i_k \sum_l v^l \delta^j_l \\
&= \sum_i\sum_j a_{ij} u^i v^j \\
&= 0.
\end{align}$$

But since $u$ and $v$ were arbitrary, the components $u^i$ and $v^j$ are arbitrary, and thus the above can hold only if $a_{ij}=0$ for all $i$ and $j$. Thus, $(e^i\otimes e^j)_{i,j=1}^n$ is linearly independent.

Next we will show that $B = \span (e^i\otimes e^j)_{i,j=1}^n$. Choose any bilinear map $T:V\times V\to\F$ in $B$, and define

$$T_{ij} = T(e_i, e_j)$$

for all $i$ and $j$. Then for any two vectors $u=\displaystyle\sum_k u^k e_k$ and $v=\displaystyle\sum_l v^l e_l$ in $V$,

$$\begin{align}
T(u,v) &= T\left(\sum_i u^i e_i, \sum_j v^j e_j\right) \\
&= \sum_i\sum_j u^i v^j T(e_i, e_j) \\
&= \sum_i\sum_j T_{ij} u^i v^j \\
&= \sum_i\sum_j T_{ij} \sum_k u^k \delta^i_k \sum_l v^l \delta^j_l \\
&= \sum_i\sum_j T_{ij} \sum_k u^k e^i(e_k) \sum_l v^l e^j(e_l) \\
&= \sum_i\sum_j T_{ij} e^i \left(\sum_k u^k e_k\right) e^j \left(\sum_l v^l e_l\right) \\
&= \left(\sum_i\sum_j T_{ij}e^i\otimes e^j\right)\left(\sum_k u^k e_k, \sum_l v^l e_l\right).
\end{align}$$

Since we can write any $T\in B$ as a linear combination of tensor products $e^i\otimes e^j$, it follows that $(e^i\otimes e^j)_{i,j=1}^n$ spans $B$, completing the proof.

So not only is every tensor product bilinear, it also happens that every bilinear map can be written as a linear combination of tensor products! Furthermore, we've exhibited a nice basis for the space of bilinear maps which is inherited from the dual basis. Lastly, the argument above shows that if $n = \dim V$, then the dimension of $B$ is $n^2$.

Since the space of bilinear maps is the same as the space of tensor products of two covectors, we usually write $\dual{V}\otimes\dual{V}$ to denote $B$, and call it a tensor product space.
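
As a quick illustration, take the covectors $s$ and $t$ on $\R^2$ from earlier in this post. With respect to the standard basis $(e_1, e_2)$ and its dual basis $(e^1, e^2)$, the components of their tensor product are $(s\otimes t)(e_i, e_j) = s(e_i)t(e_j)$, namely

$$\begin{align}
(s\otimes t)(e_1, e_1) &= s(1,0)\,t(1,0) = 1\cdot 2 = 2, \\
(s\otimes t)(e_1, e_2) &= s(1,0)\,t(0,1) = 1\cdot 0 = 0, \\
(s\otimes t)(e_2, e_1) &= s(0,1)\,t(1,0) = 1\cdot 2 = 2, \\
(s\otimes t)(e_2, e_2) &= s(0,1)\,t(0,1) = 1\cdot 0 = 0,
\end{align}$$

so that $s\otimes t = 2\,e^1\otimes e^1 + 2\,e^2\otimes e^1$. Evaluating this combination on the vectors $(1,2)$ and $(3,4)$ gives $2(1)(3) + 2(2)(3) = 18$, matching the computation from the beginning of the post.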

Now, as ugly as the goop above was, we've still only dealt with the simplest form of tensor product. The following definitions extend what we have done so far.

Definition. Given vector spaces $V_1, V_2, \ldots, V_k, W$ over the same field $\F$, a multilinear map is a function $T:V_1\times V_2\times \cdots \times V_k\to W$ which is linear in each argument.

Definition. Given $j$ vectors $v_1, \ldots, v_j \in V$ and $k$ covectors $s_1, \ldots, s_k \in \dual{V}$, their tensor product is the function

$$\bigotimes_{i=1}^j v_i \bigotimes_{i=1}^k s_i: \prod_{i=1}^j \dual{V} \times \prod_{i=1}^k V\to\F$$

defined by

$$\left(\bigotimes_{i=1}^j v_i \bigotimes_{i=1}^k s_i\right) (t_1, \ldots, t_j, u_1, \ldots, u_k) = \prod_{i=1}^j v_i(t_i) \prod_{i=1}^k s_i(u_i)$$

for any covectors $t_1, \ldots, t_j$ and any vectors $u_1, \ldots, u_k$.

We say that the rank of this tensor is $(j, k)$.

Just like with bilinear maps, multilinear maps $\prod_{i=1}^j \dual{V} \times \prod_{i=1}^k V\to\F$ form a vector space $M$ in the obvious way. Furthermore, this vector space is generated by the obvious choice of basis vectors. That is, if $(e_i)_{i=1}^n$ is a basis for $V$ with dual basis $(e^i)_{i=1}^n$ for $\dual{V}$, then

$$\bigotimes_{l=1}^j {e_{i_l}} \bigotimes_{l=1}^k {e^{i_l}}$$

is a basis for $M$, where each of the indices $i_l$ ranges from $1$ to $n$. We thus write

$$M = \bigotimes_{i=1}^j V \bigotimes_{i=1}^k \dual{V}$$

and refer to it as a rank (j, k) tensor product space. I won't prove any of this, though, because the proof for a rank $(0, 2)$ tensor was already ugly enough! The proofs aren't difficult, but they are time and space-consuming.

There are three important things to remember:

  1. A rank $(j, k)$ tensor takes $j$ covectors and $k$ vectors and outputs a scalar.
  2. A rank $(j, k)$ tensor product space has dimension $n^{j+k}$, where $n=\dim V$.
  3. The double dual isomorphism allows us to think of a vector as a linear map which takes a covector and outputs a scalar. This is how we interpret $v_i(t_i)$ in the definition above.

I'll end this post here, but we'll be talking more about tensors very shortly!

]]>
<![CDATA[Dual Spaces]]>https://algebrology.github.io/dual-spaces/5d4cec2a2864a0008db11a72Fri, 09 Aug 2019 03:55:00 GMT
  1. Introduction
  2. The Dual Space
  3. The Double Dual Space

Introduction

Since I haven't posted for a while, I decided to break up my rants about homology with some posts on linear (and multilinear) algebra. In this post, we will (as usual) deal only with finite dimensional vector spaces. Since we care only about abstract properties of vector spaces and not about any specific vector space, I will talk generally about a vector space $V$ of dimension $n$ over a field $\F$ for the remainder of this post.

As we discovered previously, every finite dimensional vector space has a basis. That is, there exists a linearly independent collection $(e_1,e_2,\ldots,e_n)$ of vectors in $V$ for which any vector $v$ in $V$ can be expressed as a linear combination of these basis vectors. That is, for any $v\in V$ there exist scalars $(v^i)_{i=1}^n$ in $\F$ for which

$$v=\sum_{i=1}^n v^i e_i.$$

Note that in the expression above, I have moved the index on $v^i$ into the upper position, whereas in previous posts I would have written the same scalars as $v_i$. There is a good reason for this, and it is commonly seen in physics and differential geometry. The reason will become apparent shortly, but for now just realize that using superscripts for index placement is really no different than using subscripts, and it does not represent exponentiation. For instance, $v^2$ represents the second scalar in a list and not $v\cdot v$.

Having a basis for our vector space is nice for two main reasons:

  1. Any vector can be expressed in terms of the basis because the basis vectors span our vector space.
  2. There is no redundancy in this expression because the basis vectors are linearly independent.

Recall also that a linear map $T$ between vector spaces $U$ and $V$ is a function $T:U\to V$ for which

  1. $T(u_1+u_2) = T(u_1) + T(u_2)$ for any $u_1, u_2 \in U$.
  2. $T(au) = aT(u)$ for any $a\in F$ and any $u\in U$.

We learned that linear maps are completely determined by the way that they act on basis vectors. In fact, we can specify the images of the basis vectors and extend by linearity to obtain a linear map on the whole vector space.

Now let's turn everything on its head.

The Dual Space

Let's define the most important concept of this post:

Definition. Given a vector space $V$ over a field $\F$, its dual space, written $V^*$, is the set of all linear maps from $V$ to $\F$.

Of course, we are now talking about $\F$ as a vector space over itself, or else the idea of a linear map would make no sense.

This definition may seem intimidating at first, but it's really not that complicated. An element of the dual space is just a linear function which eats a vector and returns a scalar. Elements of the dual space are often called covectors or linear functionals.

Now, the fact that the dual space literally has the word "space" in its name is hopefully suggestive that it is itself a vector space. I suppose there might technically be multiple ways to turn this set into a vector space, but the canonical way is as follows:

  • The zero vector ${\bf 0}\in V^*$ is the zero map ${\bf 0}:V\to\F$ which maps every vector to the zero element of $\F$. That is, ${\bf 0}(v)=0$ for every $v\in V$.
  • Vector addition is just function addition. That is, if $s$ and $t$ are maps in the dual space, then $s+t$ is another map defined by $(s+t)(v) = s(v) + t(v)$.
  • Scalar multiplication is inherited directly. That is, if $a$ is a scalar in $\F$ and $t$ is a map in the dual space, then $at$ is another map defined by $(at)(v) = a\cdot t(v)$.
  • Additive inverses are given by scalar multiplication by $-1$. That is, if $t$ is a map in the dual space then $-t=(-1)\cdot t$.

It is hopefully evident that all of the above maps are linear, and that with these definitions, the dual space satisfies the axioms of a vector space. I will not check these properties here because it is not difficult or instructive to do so.

Now, the next natural question to ask is how the dual space $V^*$ is related to the original space $V$. The answer is not immediately obvious. There is no canonical mapping which takes any vector and picks out a specific covector which it is related to. That's not to say we can't form a bijection from $V$ to $V^*$, it just wouldn't have much meaning, and there is not an obvious candidate for a favored bijection.

In fact, creating such a bijection would require first choosing a basis for $V$. Mathematicians do not consider this to be a natural correspondence, since it relies on picking some arbitrary basis, so we say there is no natural or canonical correspondence between $V$ and $V^*$.

However, if we try to figure out the dimension of the dual space, the picture begins to become a little clearer. Before we proceed, we'll need the following definition:

Definition. Given $n\in\N$, the Kronecker delta is the function $\delta:\Z_n\times \Z_n\to\R$ defined by

$$\delta^i_j =
\begin{cases}
0 & \text{if } i\ne j, \\
1 & \text{if } i = j.
\end{cases}$$

The weird upper and lower index notation in place of a more traditional notation for function arguments, such as $\delta(i, j)$, does have a purpose which will soon become apparent.

Alright, we're now well armed to make a very bold claim:

Theorem. If $(e_i)_{i=1}^n$ is a basis for a finite dimensional vector space $V$, then $(e^i)_{i=1}^n$ is a basis for $V^*$, where each basis covector $e^i:V\to\F$ is defined on basis vectors by $$e^i(e_j)=\delta^i_j$$ and defined on all of $V$ by extending by linearity.

Aside. Before we try to prove this, let's take a look at what these basis covectors really are. Since $(e_i)_{i=1}^n$ is a basis for $V$, we can write any vector $v\in V$ as $v=\sum_{j=1}^n v^j e_j$. Applying the $i$th basis covector to $v$ and using its linearity, we get

$$\begin{align}
e^i(v) &= e^i\left(\sum_{j=1}^n v^j e_j\right) \\
&= \sum_{j=1}^n v^j e^i(e_j) \\
&= \sum_{j=1}^n v^j \delta^i_j \\
&= v^i.
\end{align}$$

That is, $e^i$ is the linear map which picks out only the $i$th component of $v$ and discards the rest. It is in some sense a projection map onto the $i$th coordinate.

In order to understand why the complicated-looking sum above broke down into such a simple expression, recall the defining property of the Kronecker delta function. For almost every $j$, the Kronecker delta was identically zero and so those $v^j$ terms do not contribute to the sum. The only term for which it wasn't zero was the $i$th term, and so we were left with only $v^i$. This ability of the Kronecker delta function to simplify ugly sums is almost magical, and we will see it over and over again.

Proof. In order to show that $(e^i)_{i=1}^n$ is a basis for $V^*$, we need to show that it is linearly independent and that it spans $V^*$.

We argue first that it is linearly independent. Suppose $\sum_{i=1}^n a_i e^i=0$ for some scalars $(a_i)_{i=1}^n$. Then for any vector $v=\sum_{j=1}^n v^j e_j$ in $V$,

\begin{align}
\left(\sum_{i=1}^n a_i e^i\right)(v) &= \left(\sum_{i=1}^n a_i e^i\right)\left(\sum_{j=1}^n v^j e_j\right) \\
&= \sum_{i=1}^n\sum_{j=1}^n a_i v^j e^i(e_j) \\
&= \sum_{i=1}^n\sum_{j=1}^n a_i v^j \delta^i_j \\
&= \sum_{i=1}^n a_i v^i \\
&= 0.
\end{align}

Since the $v^i$ were arbitrary, the only way this can be true is if the scalars $(a_i)_{i=1}^n$ are identically zero. Thus, $(e^i)_{i=1}^n$ is linearly independent.

We argue next that the covectors $(e^i)_{i=1}^n$ span the dual space. To this end, suppose $s$ is any covector in $V^*$. For any vector $v=\sum_{j=1}^n v^j e_j$, we have from the linearity of $s$ that

\begin{align}
s(v) &= s\left(\sum_{j=1}^n v^j e_j\right) \\
&= \sum_{j=1}^n v^j s(e_j).
\end{align}

Define $s_j = s(e_j)$ for each $1\le j \le n$. Then

\begin{align}
\left(\sum_{j=1}^n s_j e^j\right)(v) &= \left(\sum_{j=1}^n s_j e^j\right)\left(\sum_{i=1}^n v^i e_i\right) \\
&= \sum_{j=1}^n\sum_{i=1}^n s_j v^i e^j(e_i) \\
&= \sum_{j=1}^n\sum_{i=1}^n s_j v^i \delta^j_i \\
&= \sum_{j=1}^n s_j v^j \\
&= \sum_{j=1}^n v^j s(e_j) \\
&= s(v).
\end{align}

We've thus shown that any covector can be written as a linear combination of the covectors $(e^i)_{i=1}^n$, and thus they span the dual space.

It follows that $(e^i)_{i=1}^n$ forms a basis for $V^*$, as desired.

We call the basis defined in the proof above the dual basis, and it is the basis we usually work with when talking about the dual space. Note that the dual basis depends on the basis we chose for the original vector space, so every basis for $V$ has a corresponding dual basis for $\dual{V}$. However, it is of course not the only basis we could choose. It is just particularly convenient for our purposes because of the way things tend to simplify via the Kronecker delta.

Hidden in the result above is the fact that $\dim V^*=\dim V$. That's because we've exhibited a basis for $V^*$ consisting of $n$ covectors, which is the definition of the dimension of a vector space. So the dual space always has the same dimension as the original vector space (as long as it's finite dimensional)! Which is pretty cool I guess. 😎
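
In very concrete terms, if we represent a basis of $\R^n$ by the columns of an invertible matrix, then the dual basis covectors are precisely the rows of the inverse matrix, since the condition $e^i(e_j)=\delta^i_j$ is exactly the statement that the two matrices multiply to the identity. Here is a small sketch of this (my own example, using NumPy) for a non-standard basis of $\R^2$:

```python
import numpy as np

# Columns are the basis vectors e_1 = (1, 1) and e_2 = (1, -1) of R^2.
B = np.array([[1.0,  1.0],
              [1.0, -1.0]])

dual = np.linalg.inv(B)   # row i is the dual basis covector e^i

# The defining property e^i(e_j) = delta^i_j, i.e. dual @ B is the identity.
print(dual @ B)

v = np.array([3.0, 5.0])
print(dual @ v)           # components of v in the basis (e_1, e_2): [ 4. -1.]
```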

The following result will be useful to us later.

Lemma. Suppose $V$ is a finite dimensional vector space over a field $\F$ and let $v\in V$. If $s(v) = 0$ for every covector $s\in\dual{V}$, then $v = 0$.

Proof. We proceed by contraposition, supposing that $v\ne 0$ and arguing that there exists a covector $s\in\dual{V}$ for which $s(v)\ne 0$.

Choose any basis $(e_i)_{i=1}^n$ for $V$. Then we may write $v=\sum_{i=1}^n v^i e_i$ for some scalars $v^1, v^2, \ldots, v^n\in\F$. Since $v\ne 0$, one of its components (with respect to our chosen basis) must be nonzero. That is, there exists $k\in \{1, 2, \ldots, n\}$ for which $v^k\ne 0$.

We choose $s=e^k$, the $k$th dual basis vector. Notice that, by linearity of $s$,

\begin{align}
s(v) &= e^k(v) \\
&= e^k\left( \sum_{i=1}^n v^i e_i \right) \\
&= \sum_{i=1}^n v^i e^k(e_i) \\
&= \sum_{i=1}^n v^i \delta^k_i \\
&= v^k \\[.5em]
&\ne 0.
\end{align}

Thus, we have demonstrated there exists a covector $s$ for which $s(v)\ne 0$, and the result follows by contraposition.

The Double Dual Space

The following definition is exactly as you might expect.

Definition. Given a vector space $V$ over a field $\F$ and its dual space $\dual{V}$, its double dual space, written $\ddual{V}$, is the dual space of $\dual{V}$.

By now, things may seem hopelessly abstract. If the dual space was the space of all linear functions from $V$ to $\F$, that would make the double dual the space of all linear functions from the space of linear functions from $V$ to $\F$ into $\F$. As if that wasn't complicated enough, there's no end in sight. Am I ever going to stop? Or am I going to next construct the triple dual space, the quadruple dual space, ad infinitum?

It turns out we don't need to keep going, because as we will soon see, $\ddual{V}$ is essentially just $V$. We will need the above lemma to prove it.

Theorem. Every finite dimensional vector space $V$ over a field $\F$ is canonically isomorphic to its double dual space $\ddual{V}$.

Proof. Recall first that canonically isomorphic means we can find a isomorphism which does not depend on a choice of basis. So proving the "canonical" part consists only of not choosing a basis to arrive at the result.

Since the dual space of any finite dimensional vector space shares its dimension, it follows that

$$\dim \ddual{V} = \dim \dual{V} = \dim V.$$

Thus, the rank-nullity theorem tells us that a linear map $T:V\to \ddual{V}$ is an isomorphism if it is injective, which greatly simplifies the proof.

Define a map $T:V\to \ddual{V}$ by

$$\big(T(v)\big)(s) = s(v)$$

for any vector $v\in V$ and any covector $s\in \dual{V}$.

Let's pause to make some sense of this. Since $T$ takes $V$ into its double dual, the image $T(v)$ of any vector $v\in V$ will be a linear map in $\ddual{V}$, which itself takes a covector $s\in\dual{V}$. That's why we're defining $T(v)$ by how it acts on a covector $s$.

We will argue that $T$ is an isomorphism. Let's show first that $T$ is a linear map. To this end, suppose $v, v_1, v_2 \in V$ and $a \in \F$. Then because of the linearity of $s$,

$$\begin{align}
\big(T(v_1 + v_2)\big)(s) &= s(v_1 + v_2) \\
&= s(v_1) + s(v_2) \\
&= \big(T(v_1)\big)(s) + \big(T(v_2)\big)(s),
\end{align}$$

and

$$\begin{align}
\big(T(av)\big)(s) &= s(av) \\
&= as(v) \\
&= a\big(T(v)\big)(s).
\end{align}$$

We'll show next that $T$ is injective (and thus bijective by our earlier dimension argument). We will do so by showing that its kernel is trivial. So suppose $v\in\ker T$. Then by definition,

$$\begin{align}
\big(T(v)\big)(s) &= s(v) \\
&= 0
\end{align}$$

for all covectors $s\in\dual{V}$. It follows then from the above lemma that $v=0$, since the above holds for any choice of covector $s$. Therefore, $\ker T = \{0\}$ and so $T$ is injective (and thus bijective).

We have shown that $T$ is a bijective linear map, and we have done so without explicitly choosing a basis for any of the vector spaces involved, so it follows that $T$ is a canonical isomorphism, as desired.

So it turns out that $V$ and $\ddual{V}$ can be used almost interchangeably. But using the double dual space gives us a nice kind of duality (pardon the pun), in that we can think of covectors as maps which act on vectors, and we can think of vectors as maps which act on covectors. Physicists often do this without realizing that they are technically working with cocovectors instead of vectors, but that's fine because the isomorphism makes it work.

I'll leave this here for now. Next time I'll talk about multilinear maps and tensor products!

]]>
<![CDATA[Simplicial Homology]]>https://algebrology.github.io/simplicial-homology/5cb3f9f47f36fe003eea102bMon, 15 Apr 2019 04:04:40 GMT

Homology groups are topological invariants which, informally, give information about the types of holes in a topological space. They are not the only such invariant in algebraic topology, but they are particularly nice to work with since they are always abelian and easy to compute.

For now, we will restrict our discussion to homology of simplicial complexes. Since many topological spaces of interest can be triangulated into a simplicial complex, this is robust enough for a first treatment of homology. I will talk about the more general theory of singular homology in a later post.

I actually gave the definition of simplicial homology groups in my last post, but it was not motivated by anything in particular. Because the motivation for the construction of homology groups can be difficult to understand, we'll work through the concept by looking at a rather in-depth example. As we will see, the natural and intuitive steps we take in this example generalize to a broader theory of homology.

Let's work with the same simplicial complex we looked at in my last post:

simplicial-complex-4

We'll call the complex $X$, to match our notation from last time. The $v_i$ are $0$-simplices, the $e_i$ are $1$-simplices and $f$ is a $2$-simplex. We will be working with the chain groups of $X$ as in the previous post. We define $C_0(X)$ as the free abelian group generated by the $v_i$, $C_1(X)$ as the free abelian group generated by the $e_i$, and $C_2(X)$ as the free abelian group generated by $f$.

Notice that elements of $C_0(X)$ are integral linear combinations of the vertices $v_i$. A typical element $\sigma_0 \in C_0(X)$ can be expressed as

$$\sigma_0 = \sum_{i=0}^3 a_iv_i = a_0v_0 + a_1v_1 + a_2v_2 + a_3v_3,$$

where the $a_i$ are integers. We refer to the elements of $C_0(X)$ as $0$-chains.

Similarly, the elements of $C_1(X)$, which we call $1$-chains, are integral linear combinations of the edges $e_i$. A typical element $\sigma_1 \in C_1(X)$ can be written

$$\sigma_1 = \sum_{i=0}^4 b_ie_i = b_0e_0 + b_1e_1 + b_2e_2 + b_3e_3 + b_4e_4,$$

where the $b_i$ are integers. Since the edges are oriented, remember that we think of the negative of an edge as that same edge oriented in the opposite direction. There are only two possible orientations for each edge, so this works out nicely.

Lastly, the elements of $C_2(X)$, the $2$-chains, are integral multiples of the face $f$. A typical element $\sigma_2 \in C_2(X)$ is of the form

$$\sigma_2 = cf,$$

where $c$ is an integer. We will assume that $f=[v_0, v_1, v_2]$ is oriented clockwise.

Last time, we defined the boundary maps between chain groups for any simplicial complex. Let's look at the boundaries of the simplices in $X$. These are determined by the homomorphisms

$$C_2(X)\xrightarrow{\partial_2} C_1(X)\xrightarrow{\partial_1}C_0(X)\xrightarrow{\partial_0}0.$$

Recall the definition of the boundary of an $n$-simplex:

$$\partial_n([v_0, v_1,\ldots, v_n])=\sum_{i=0}^n(-1)^i[v_0,\ldots,v_{i-1},v_{i+1},\ldots,v_n].$$

We see immediately that

$$\partial_0([v_i]) = 0$$

for every $0$-simplex $v_i$. This should make intuitive sense, since a vertex shouldn't have any boundary. This means that $\partial_0$ is the zero map (as it must be, since its codomain is the trivial group).

Next we see that

$$\begin{align}
\partial_1(e_0) &= \partial_1([v_0, v_1]) = v_1 - v_0 \\[.5em]
\partial_1(e_1) &= \partial_1([v_1, v_2]) = v_2 - v_1 \\[.5em]
\partial_1(e_2) &= \partial_1([v_2, v_0]) = v_0 - v_2 \\[.5em]
\partial_1(e_3) &= \partial_1([v_3, v_2]) = v_2 - v_3 \\[.5em]
\partial_1(e_4) &= \partial_1([v_0, v_3]) = v_3 - v_0.
\end{align}$$

Since homomorphisms between free abelian groups are uniquely determined by the way they act on generators, we immediately know how to calculate the boundary $\partial_1$ of any $1$-chain in $X$. If

$$\sigma_1 = \sum_{i=0}^4 b_ie_i$$

then from the properties of homomorphisms we see that

$$\begin{align}
\partial_1(\sigma_1) &= \partial_1\left(\sum_{i=0}^4 b_ie_i\right) \\
&= \sum_{i=0}^4 b_i \partial_1(e_i) \\[1em]
&= b_0(v_1-v_0) + b_1(v_2-v_1) + b_2(v_0-v_2) + b_3(v_2-v_3) + b_4(v_3-v_0) \\[1em]
&= (-b_0+b_2-b_4)v_0 + (b_0-b_1)v_1 + (b_1-b_2+b_3)v_2 + (-b_3+b_4)v_3.
\end{align}$$

Of particular interest to us are cycles in $C_1(X)$. As you might expect, a cycle is a $1$-chain which ends where it began, traversing its edges in a natural order. For instance, we would like

$$\begin{align}
\sigma_1 &= e_0 + e_1 + e_2, \\[1em]
\sigma_2 &= -2e_2 -2e_4 -2e_3
\end{align}$$

to be considered cycles, since they are natural paths around the simplex and they form closed loops. However,

$$\begin{align}
\sigma_3 &= e_0 + e_1, \\[1em]
\sigma_4 &= 3e_1 -e_4
\end{align}$$

should not be, as $\sigma_3$ does not form a closed loop and $\sigma_4$ is not even a natural path.

How should we algebraically determine which chains are cycles? To answer this question, let's take a look at the boundaries of the $1$-chains we just discussed. Remember that chain groups are abelian, so we can rearrange terms. This is extremely important!

$$\begin{align}
\partial_1(\sigma_1) &= \partial(e_0 + e_1 + e_2) \\[.5em]
&= \partial(e_0) + \partial(e_1) + \partial(e_2) \\[.5em]
&= (v_1-v_0) + (v_2-v_1) + (v_0-v_2) \\[.5em]
&= (v_0 - v_0) + (v_1 - v_1) + (v_2 - v_2) \\[.5em]
&= 0, \\[2em]
\partial_1(\sigma_2) &= \partial(-2e_2 -2e_4 -2e_3) \\[.5em]
&= -2\partial(e_2) -2\partial(e_4) -2\partial(e_3) \\[.5em]
&= -2(v_0-v_2) -2(v_3-v_0) -2(v_2-v_3) \\[.5em]
&= -2(v_0 - v_0) -2(v_2 - v_2) -2(v_3 - v_3) \\[.5em]
&= 0, \\[2em]
\partial_1(\sigma_3) &= \partial(e_0 + e_1) \\[.5em]
&= \partial(e_0) + \partial(e_1)\\[.5em]
&= (v_1-v_0) + (v_2-v_1)\\[.5em]
&= -v_0 + (v_1-v_1) + v_2 \\[.5em]
&= -v_0 + v_2, \\[2em]
\partial_1(\sigma_4) &= \partial(3e_1 -e_4) \\[.5em]
&= 3\partial(e_1) -\partial(e_4) \\[.5em]
&= 3(v_2-v_1) -(v_3-v_0) \\[.5em]
&= v_0 -3v_1 + 3v_2 -v_3. \\[.5em]
\end{align}$$

So it looks like the $1$-chains that ought to be cycles have a boundary of $0$, whereas the others have nonzero boundary. This should make some sense: remember that the boundary of a $1$-simplex is its second vertex minus its first vertex. So if the edges of a chain follow a natural path, i.e., the chain is of the form

$$\sigma = [v_0, v_1] + [v_1, v_2] + [v_2, v_3] + \cdots + [v_{n-2}, v_{n-1}] + [v_{n-1}, v_n]$$

then the terms in its boundary will telescope and only the extreme points will remain. That is,

$$\begin{align}
\partial_1(\sigma) &= \partial_1([v_0, v_1] + [v_1, v_2] + [v_2, v_3] + \cdots +[v_{n-2}, v_{n-1}] + [v_{n-1}, v_n]) \\[.5em]
&= \partial([v_0, v_1]) + \partial([v_1, v_2]) + \partial([v_2, v_3]) + \cdots + \partial([v_{n-2}, v_{n-1}]) + \partial([v_{n-1}, v_n]) \\[.5em]
&= (v_1-v_0) + (v_2-v_1) + (v_3-v_2) + \cdots + (v_{n-1}-v_{n-2}) + (v_n-v_{n-1}) \\[.5em]
&= -v_0 + (v_1-v_1) + (v_2-v_2) + \cdots + (v_{n-1}-v_{n-1}) + v_n\\[.5em]
&= -v_0 + v_n.
\end{align}$$

If it happens that the chain is also closed, then its first and last vertices are the same. That is, $v_0=v_n$, and so $\partial_1(\sigma)=0$. That means that any cycle is in the kernel of the boundary map! It thus makes sense to make the following definition:

Definition. If $X$ is a simplicial complex with chain groups and boundary operators

$$\cdots\xrightarrow{\partial_{n+1}} C_n(X)\xrightarrow{\partial_n} C_{n-1}(X)\xrightarrow{\partial_{n-1}}\cdots\xrightarrow{\partial_2}C_1(X)\xrightarrow{\partial_1}C_0(X)\xrightarrow{\partial_0}0,$$

then for each natural number $n$, the group of $n$-cycles in $X$ is defined as

$$Z_n(X)=\ker\partial_n.$$

This subgroup consists of all $n$-chains whose boundary is zero. That is, precisely the chains which we called cycles. Since each cycle group $Z_n(X)$ is the kernel of a homomorphism, it is always a normal subgroup of $C_n(X)$.

Remember that the whole point of this endeavor is to identify the holes in our simplicial complex $X$. Intuitively, just from looking at the diagram of $X$, it seems there should only be one hole: that empty region bounded by the edges $e_2$, $e_4$ and $e_3$.

How do we instinctively know this? Because $e_2+e_3+e_4$ is a cycle in $Z_1(X)$ but it is not the boundary of any $2$-chain in $X$. Those edges could feasibly form the boundary of a $2$-simplex, but there is simply not a $2$-simplex sitting there. That tells us there's a hole in our complex.

On the other hand, $e_0+e_1+e_2$ is a cycle in $Z_1(X)$ and it is the boundary of the $2$-chain $f$, so there is no hole there.

What I'm saying is that holes in $X$ correspond to cycles which are not also the boundary of anything in $X$.

How do we know which chains are also boundaries? Well that's easy! Any boundary of a simplex in $X$ is the image of that simplex under the boundary map.

Definition. If $X$ is a simplicial complex with chain groups and boundary operators

$$\cdots\xrightarrow{\partial_{n+1}} C_n(X)\xrightarrow{\partial_n} C_{n-1}(X)\xrightarrow{\partial_{n-1}}\cdots\xrightarrow{\partial_2}C_1(X)\xrightarrow{\partial_1}C_0(X)\xrightarrow{\partial_0}0,$$

then for each natural number $n$, the group of $n$-boundaries in $X$ is defined as

$$B_n(X)=\im\partial_{n+1}.$$

This subgroup consists of all $n$-chains which are the image of an $(n+1)$-chain under the $(n+1)$th boundary map. That is, $B_n(X)$ consists of all boundaries of the $(n+1)$-chains in $X$.

We showed last time that the boundary of a boundary is always zero, i.e., $(\partial_{n}\circ \partial_{n+1})(\sigma)=0$ for all $n$. We also showed that this implies that $\im\partial_{n+1}\subseteq\ker\partial_n$. Or, in the terminology of this post, $B_n(X)\subseteq Z_n(X)$.

This should make intuitive sense, since every boundary of a simplex should be a cycle but not all cycles are necessarily boundaries. In fact, it is precisely the cycles that are not boundaries that we are interested in. Since each chain group is abelian, every subgroup is normal, and so $B_n(X)$ is a normal subgroup of $Z_n(X)$. We're finally ready to define homology groups!

Definition. If $X$ is a simplicial complex with chain groups and boundary operators

$$\cdots\xrightarrow{\partial_{n+1}} C_n(X)\xrightarrow{\partial_n} C_{n-1}(X)\xrightarrow{\partial_{n-1}}\cdots\xrightarrow{\partial_2}C_1(X)\xrightarrow{\partial_1}C_0(X)\xrightarrow{\partial_0}0,$$

then for each natural number $n$, the $n$th simplicial homology group of $X$ is defined as the quotient group

$$H_n(X)=\frac{Z_n(X)}{B_n(X)} = \frac{\ker\partial_n}{\im\partial_{n+1}}.$$

In the case of our example, $Z_1(X)$ is the free abelian group generated by the two basic cycles in $X$. That is, any cycle in $Z_1(X)$ is an integral combination of $e_0+e_1+e_2$ and $e_2+e_4+e_3$. Since it is generated by two elements, $\rank Z_1(X)=2$.

On the other hand, $B_1(X)$ is the free abelian group generated by boundaries of the $2$-simplices in $X$. But there is only one $2$-simplex, namely $f$. The boundary of $f$ is the chain $e_0+e_1+e_2$, so any boundary in $B_1(X)$ is an integral multiple of this chain. Since it is generated by a single element, $\rank B_1(X)=1$.

This implies that $\rank H_1(X)=2-1=1$. This coincides with our intuitive idea that there is one hole in the complex $X$. In general, the rank of the $n$-th homology group tells us how many $n$-dimensional holes are in a simplicial complex.
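
If you like, you can check these ranks numerically. Here is a rough Python sketch (using NumPy; the boundary matrices are written down by hand from the orientations chosen above, and the ranks are computed over the reals, which ignores possible torsion; there happens to be none in this example).

```python
import numpy as np

# partial_1 : C_1 -> C_0 in the bases (e_0, ..., e_4) and (v_0, ..., v_3)
D1 = np.array([
    [-1,  0,  1,  0, -1],
    [ 1, -1,  0,  0,  0],
    [ 0,  1, -1,  1,  0],
    [ 0,  0,  0, -1,  1],
])

# partial_2 : C_2 -> C_1; the single column is the boundary of f = [v_0, v_1, v_2],
# namely e_0 + e_1 + e_2.
D2 = np.array([[1], [1], [1], [0], [0]])

rank_Z1 = D1.shape[1] - np.linalg.matrix_rank(D1)   # dim ker(partial_1) = 5 - 3 = 2
rank_B1 = np.linalg.matrix_rank(D2)                 # dim im(partial_2) = 1
print(rank_Z1 - rank_B1)                            # rank of H_1(X), namely 1
```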

The elements of the $1$st homology group $H_1(X)$ are the cosets of $B_1(X)$. That is, any element $\sigma\in H_1(X)$ is of the form

$$\sigma = B_1(X) + \gamma$$

where $\gamma$ is any $1$-cycle in $Z_1(X)$. If $\gamma$ is a multiple of the chain $e_0+e_1+e_2$ then $\gamma\in B_1(X)$ and so we have the identity coset. Otherwise, $\gamma$ contains some nonzero multiple of the other cycle, $e_2+e_4+e_3$, and so $B_1(X) + \gamma$ is not the identity. This is why the rank of $H_1(X)$ is $1$ in this case: every coset is an integer multiple of the single generator $B_1(X) + (e_2+e_4+e_3)$, so $H_1(X)$ is the free abelian group on one generator.

I'll leave you with this for now. Next time I'll discuss an algorithm for actually computing homology groups of simplicial complexes!

]]>
<![CDATA[Simplicial Complexes and Boundary Maps]]>https://algebrology.github.io/simplicial-complexes-and-boundary-maps/5c96fb2d7f36fe003eea0f77Sun, 14 Apr 2019 03:48:00 GMT
  1. Simplicial Complexes
  2. Chain Groups
  3. Boundary Maps

Simplicial Complexes

Let's define the types of topological spaces that are of interest to us in this post. The idea behind our definitions is that lots of topological spaces can be "triangulated" in such a way that they look sort of like a bunch of "triangles" glued together.

Definition. For any $n\in\N$, the standard $n$-simplex is the subset of $\R^{n+1}$ given by

$$\Delta^n =\bset{(x_1,x_2,\ldots,x_{n+1})\in\R^{n+1}\mid x_i\ge 0 \text{ for all } i \text{ and } \sum_{i=1}^{n+1} x_i = 1}.$$

That is, $\Delta^n$ is the set of points in $\R^{n+1}$ whose components are all nonnegative and sum to $1$. Clearly in $\R$ the only such point is $1$ itself, and so

$$\Delta^0=\set{1}.$$

In $\R^2$ the points which satisfy these conditions are precisely the points which lie on the line segment connecting the point $(1,0)$ to the point $(0,1)$, including these two points. For instance, the points $\left(\frac{1}{3}, \frac{2}{3}\right)$ and $\left(\frac{1}{2}, \frac{1}{2}\right)$ certainly satisfy the requirements and sit on this line segment. It follows that $\Delta^1$ is a line segment in $\R^2$.

delta_1-1

The simplex $\Delta^2$ is a triangle in $\R^3$ which contains its interior and is bounded by the line segments connecting the points $(1,0,0)$, $(0,1,0)$ and $(0,0,1)$.

delta_2

$\Delta^3$ is a solid tetrahedron in $\R^4$ (although we can embed it in $\R^3$ to visualize it). By solid, I mean that it contains its entire interior.

delta_3

Higher-dimensional simplices cannot be visualized so easily. However, they are still relatively easy objects to manipulate formally. And they follow a very interesting pattern:

  • The $1$-simplex is a line segment with two $0$-simplices, one at each end.
  • The $2$-simplex is a triangle whose edges are three $1$-simplices and whose vertices are three $0$-simplices.
  • The $3$-simplex is a tetrahedron whose faces are four $2$-simplices, whose edges are six $1$-simplices, and whose vertices are four $0$-simplices.

Recall Pascal's triangle:

pascal

Each element in the triangle (that isn't a $1$) is the sum of the two values directly above it. Equivalently, we can find the value in the $n$th row and $k$th column using the binomial formula:

$${n \choose k} = \frac{n!}{k!(n-k)!}.$$

Pascal's triangle is intimately related to the number of different dimensional faces of a simplex. The number of $k$-faces of an $n$-simplex is the value in column $(k+1)$ and row $(n+1)$ of the triangle, which you can verify quite easily for the simplices given above. For higher dimensional simplices, you'll just have to take my word for it for now.
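
As a quick sanity check, the face counts above drop straight out of the binomial coefficient: an $n$-simplex has $n+1$ vertices, and each $k$-face is determined by choosing $k+1$ of them. A tiny Python sketch (the helper name is my own):

```python
from math import comb

def num_faces(n, k):
    """Number of k-dimensional faces of an n-simplex: choose k+1 of its n+1 vertices."""
    return comb(n + 1, k + 1)

assert num_faces(2, 0) == 3   # a triangle has 3 vertices
assert num_faces(3, 1) == 6   # a tetrahedron has 6 edges
assert num_faces(3, 2) == 4   # ... and 4 triangular faces
```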

So far we've only looked at standard simplices, but we can just as easily talk about any point as a non-standard $0$-simplex, any line as a non-standard $1$-simplex, any triangle as a non-standard $2$-simplex, any tetrahedron as a non-standard $3$-simplex, etc. The preferred way to define these arbitrary simplices would be as the images of standard simplices under bijective affine transformations. (An affine transformation is basically just a linear transformation plus a translation, so that it is not necessarily origin-preserving.) However, we will work with them intuitively as follows:

Definition. Given a set of affine-independent vertices $v_0,v_1,\ldots,v_n$ in $\R^{n+1}$, the (oriented) $n$-simplex corresponding to these vertices is the convex hull of the vertices,

$$\bset{a_0v_0 + a_1v_1 + \cdots + a_nv_n\in\R^{n+1} \mid a_i\ge 0 \text{ for all } i \text{ and } \sum_{i=0}^{n} a_i = 1},$$

and is denoted by

$$[v_0, v_1, \ldots, v_n].$$

We're now ready to define simplicial complexes. The basic idea is that they are built from simplices which have been glued together in a particularly nice way.

Definition. A simplicial complex $X$ is a finite collection of simplices for which

  1. If $s$ is a simplex in $X$ then every face of $s$ is also in $X$.
  2. If $s_1$ and $s_2$ are simplices in $X$ then either they are disjoint or their intersection $s_1 \cap s_2$ is a face of both $s_1$ and $s_2$.

This definition ensures that the gluing of simplices is done in such a way that all the vertices and faces line up nicely. And of course, we make a simplicial complex $X$ into a topological space by taking the union of all its simplices and considering it as a topological subspace of $\R^{n+1}$, where $n$ is the maximum dimension of any simplex in $X$.

Example. Here's an example of a simplicial complex which we'll be revisiting later. It should technically live in $\R^3$ since $f$ is a $2$-simplex, but this one just happens to be flat enough to embed in $\R^2$.

simplicial-complex-3

Notice that the edges have arrows. This is because they are oriented simplices. That is, $[v_0, v_1]$ is a different simplex than $[v_1, v_0]$, and the arrows are visual indicators of the orientation. Technically $f$ is oriented as well, but I have not added a visual indication of this fact. We will treat $f$ as the simplex $[v_0, v_1, v_2]$.

With all of that in mind, this complex consists of the following simplices:

0-simplices:

  • $v_0$
  • $v_1$
  • $v_2$
  • $v_3$

1-simplices:

  • $e_0 = [v_0, v_1]$
  • $e_1 = [v_1, v_2]$
  • $e_2 = [v_2, v_0]$
  • $e_3 = [v_3, v_2]$
  • $e_4 = [v_0, v_3]$

2-simplices:

  • $f=[v_0, v_1, v_2]$

Chain Groups

In an earlier post, I developed the machinery of free abelian groups. Now let's finally put that to use.

Definition. Let $X$ denote a simplicial complex. For each $n\in\N$, the group of simplicial $n$-chains, written $C_n(X)$, is the free abelian group with basis all $n$-simplices in $X$. The elements of $C_n(X)$ are called simplicial $n$-chains in $X$.

Stated more plainly, an $n$-chain in a simplicial complex $X$ is a formal integral combination of $n$-simplices in $X$. For example, in the simplicial complex pictured above, we may talk about $1$-chains such as

$$3e_0 + 2e_2-e_3.$$

Since our simplices are oriented, it makes sense to define the negative of a simplex to be that simplex with its vertices traversed in the opposite order. That is,

$$-[v_0,v_1,\ldots,v_n] = [v_n,\ldots, v_1, v_0].$$

We may thus write the chain groups for our simplicial complex as follows:

$$\begin{align}
C_0 &= \bset{\sum_{i=0}^3 a_iv_i \mid a_i\in\Z}, \\[1em]
C_1 &= \bset{\sum_{i=0}^4 b_ie_i \mid b_i\in\Z}, \\[1em]
C_2 &= \bset{cf \mid c\in\Z}, \\[1em]
C_n &= \set{0} \text{ for } n\ge 3.
\end{align}$$

It is natural to associate chains in $C_1$ to actual paths in the simplicial complex $X$. For instance, the chain $\sigma = e_0 + e_1 + e_2$ traces out a path all the way around $f$ from $v_0$ to $v_1$ to $v_2$ and back to $v_0$.

Its negative, $-\sigma = -e_2-e_1-e_0$ traces out the same path but in the opposite direction.

Its double, $2\sigma = 2e_0 + 2e_1 + 2e_2$ traces out the same path twice in a row. That is, it loops around $f$ two times instead of just once.

It might seem natural to view the chain $\sigma$ as the boundary of the simplex $f$. Let's work toward making this idea more formal.

Boundary Maps

First let's define the boundary of a $1$-simplex, since it is the easiest to visualize. Since a $1$-simplex is essentially just a closed line segment, its boundary should consist solely of its endpoints, which are $0$-simplices. However, the directedness of the edges indicates that we should add/subtract the endpoints rather than take their union.

Thus, we define the boundary of the $1$-simplex $[v_0, v_1]$ to be the $0$-chain $v_1-v_0$.

Similarly, we would like the boundary of the $2$-simplex $[v_0, v_1, v_2]$ to be the sum of its edges $[v_0, v_1]$, $[v_1, v_2]$ and $[v_2, v_0]$. So we define the boundary of $[v_0, v_1, v_2]$ as

$$[v_0, v_1] + [v_1, v_2] + [v_2, v_0] = [v_1, v_2] - [v_0, v_2] + [v_0, v_1].$$

Note that the expressions on both sides are completely equal, just with the order permuted and $[v_2, v_0]$ rewritten as $-[v_0, v_2]$. The reason for this rearrangement will become apparent shortly.

It is very important to note that the boundary of an $n$-simplex is always an $(n-1)$-chain!

In general, we will define boundary homomorphisms not just on simplices but between chain groups. First we will describe how they act on generators and then we will extend by linearity to a homomorphism on the chain groups.

Let $X$ be a simplicial complex and let $B_n(X)$ denote the set of $n$-simplices in $X$. That is, $B_n(X)$ is a basis for the free abelian group $C_n(X)$, the group of $n$-chains in $X$. Define a function $\partial'_n:B_n(X)\to C_{n-1}(X)$ as follows: For any oriented $n$-simplex $\sigma=[v_0, v_1,\ldots, v_n]$, we let

$$\partial'_n(\sigma) = \partial'_n([v_0, v_1,\ldots, v_n])=\sum_{i=0}^n(-1)^i[v_0,\ldots,v_{i-1},v_{i+1},\ldots,v_n].$$

That is, an alternating sum of $(n-1)$-simplices, where the $i$th term is missing the $i$th vertex.

Definition. The boundary map of dimension $n$ is the unique homomorphism $\partial_n:C_n(X)\to C_{n-1}(X)$ given by extending the function $\partial'_n$ by linearity.
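
To make the alternating-sum formula concrete, here is a minimal Python sketch (the representation of a simplex as a tuple of vertex labels, and the function name, are my own choices) that computes the boundary of an oriented simplex as a mapping from faces to integer coefficients.

```python
def boundary(simplex):
    """Boundary of an oriented simplex given as a tuple of vertex labels.

    Returns a dict mapping each (n-1)-face (a tuple) to its coefficient (-1)^i.
    """
    result = {}
    for i in range(len(simplex)):
        face = simplex[:i] + simplex[i + 1:]   # drop the i-th vertex
        result[face] = result.get(face, 0) + (-1) ** i
    return result

# The boundary of [v_0, v_1, v_2] is [v_1, v_2] - [v_0, v_2] + [v_0, v_1].
print(boundary(("v0", "v1", "v2")))
# {('v1', 'v2'): 1, ('v0', 'v2'): -1, ('v0', 'v1'): 1}
```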

Notice that these boundary maps form a sequence

$$\cdots\xrightarrow{\partial_{n+1}} C_n(X)\xrightarrow{\partial_n} C_{n-1}(X)\xrightarrow{\partial_{n-1}}\cdots\xrightarrow{\partial_2}C_1(X)\xrightarrow{\partial_1}C_0(X)\xrightarrow{\partial_0}0.$$

Of course, $\partial_0$ is the zero map, since $0$ is the trivial group.

It is easy to see that these boundary maps are all well defined and that $\partial_n(-\sigma)=-\partial_n(\sigma)$ for any $n$-simplex $\sigma$. Thus, boundary maps are not affected by the orientation of simplices in a chain, as long as the orientations are consistent.

Next, we will prove an extremely important and useful result concerning the structure of chain groups and boundary maps — namely, the boundary of a boundary is always zero.

Theorem. Let $X$ be a simplicial complex with $\sigma\in X$ an $n$-simplex. Then $(\partial_{n-1}\circ \partial_{n})(\sigma)=0$.

Proof. The proof is a direct computation.

$$\begin{align}
\big(\partial_{n-1}\circ\partial_n\big)\left([v_0,\dotsc,v_n]\right)&=\sum_{i=0}^n(-1)^i\partial_{n-1}[v_0,\dotsc,v_{i-1},v_{i+1},\dotsc,v_n]\\
&= \sum_{j<i}(-1)^i(-1)^j[v_0,\dotsc,v_{j-1},v_{j+1},\dotsc,v_{i-1},v_{i+1},\dotsc,v_n]\\
&\phantom{==} +\sum_{j>i}(-1)^i(-1)^{j-1}[v_0,\dotsc,v_{i-1},v_{i+1},\dotsc,v_{j-1},v_{j+1},\dotsc,v_n]\\
&=0
\end{align}$$

because the terms in the last two summations cancel in pairs: the simplex obtained by deleting both $v_i$ and $v_j$ appears exactly once in each summation, with opposite signs.
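
For the example complex from earlier, we can also watch this happen concretely: writing $\partial_1$ and $\partial_2$ as integer matrices in the bases of edges and vertices (a sketch of mine, with the matrices filled in by hand from the chosen orientations), their product is the zero matrix.

```python
import numpy as np

D1 = np.array([          # partial_1 : C_1 -> C_0
    [-1,  0,  1,  0, -1],
    [ 1, -1,  0,  0,  0],
    [ 0,  1, -1,  1,  0],
    [ 0,  0,  0, -1,  1],
])
D2 = np.array([[1], [1], [1], [0], [0]])   # partial_2(f) = e_0 + e_1 + e_2

print(D1 @ D2)   # the 4x1 zero matrix: partial_1 after partial_2 is the zero map
```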

Recall that the kernel and image of any group homomorphism are subgroups of the domain and codomain, respectively. The result above actually implies that the image of any boundary map is a subgroup of the kernel of the next boundary map!

Theorem. Let $X$ be a simplicial complex. Then $\im\partial_{n+1}\subseteq\ker\partial_n$ for any natural number $n$.

Proof. Choose any $\beta\in\im\partial_{n+1}$. Then $\beta=\partial_{n+1}(\sigma)$ for some chain $\sigma\in C_{n+1}(X)$. But then

$$\begin{align}
\partial_n(\beta) &= \partial_n(\partial_{n+1}(\sigma)) \\
&=0
\end{align}$$

from the above theorem, and so $\beta\in\ker\partial_n$. It follows that $\im\partial_{n+1}\subseteq\ker\partial_n$.

In addition to the kernel and image being subgroups, because we are working strictly with abelian groups this implies that they are normal subgroups. In particular, since $\im\partial_{n+1}\subseteq\ker\partial_n$, it follows that $\im\partial_{n+1}$ is a normal subgroup of $\ker\partial_n$. This means that we can actually form quotient groups $\frac{{\ker\partial_n}}{\im\partial_{n+1}}$ for every natural number $n$. This will become extremely important when we talk about homology in the next post. In fact, this is the definition of a homology group! But its motivation will become more clear later.

]]>
<![CDATA[Matrices for Linear Maps]]>https://algebrology.github.io/matrices-for-linear-maps/5c933214bdf47d003ee073dfThu, 21 Mar 2019 07:06:11 GMT

If you already knew some linear algebra before reading my posts, you might be wondering where the heck all the matrices are. The goal of this post is to connect the theory of linear maps and vector spaces to the theory of matrices and computation.

The key observation is that linear maps between finite-dimensional vector spaces can be completely described solely in terms of how they act on basis vectors. Let's explore this idea in a more precise setting.

Suppose $T:U\to V$ is a linear map with $\dim U = n$ and $\dim V = m$. Let $(e_1,\ldots,e_n)$ be a basis for $U$ and let $(f_1,\ldots,f_m)$ be a basis for $V$. Then for any vector $u\in U$, we may express $u$ uniquely as a linear combination of basis vectors,

$$u=\sum_{i=1}^n a_i e_i,$$

where $a_i\in\F$ for $1\le i\le n$. Since $T$ is a linear map, this means that

$$\begin{align}
Tu &= T\left(\sum_{i=1}^n a_i e_i\right) \\
&= \sum_{i=1}^n a_i Te_i. \tag{1}
\end{align}$$

We can go further and decompose each $Te_i\in V$ as a linear combination of basis vectors in $V$,

$$Te_i = \sum_{j=1}^m b_{j,i} f_j,$$

where $b_{j,i}\in\F$ for $1\le j\le m$. Thus, we may rewrite equation $(1)$ as

$$Tu = \sum_{i=1}^n \sum_{j=1}^m a_i b_{j,i} f_j.$$

Since the scalars $a_i$ depend only on the input vector $u$ and the basis vectors $f_j$ are fixed, this means that the way $T$ acts on $u$ is determined entirely by the scalars $b_{j,i}$.

To put it another way, say we've chosen bases for $U$ and $V$. Then everything about the linear map is encoded in a collection of $m\times n$ scalars, which describe how the map acts on basis vectors in $U$ in terms of basis vectors in $V$.

If we were, for instance, to write the scalars $b_{j,i}$ in the following form:

$$\begin{bmatrix}
b
\end{bmatrix}_{j,i} = \begin{bmatrix}
b_{1,1} & b_{1,2} & \cdots & b_{1,n} \\
b_{2,1} & b_{2,2} & \cdots & b_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
b_{m,1} & b_{m,2} & \cdots & b_{m,n}
\end{bmatrix},$$

then we would have a compact and easy to read way of describing the linear map $T$.

This is what we call an $\mathbf{m\times n}$ matrix. In this case, $[b_{j,i}]$ is the matrix corresponding to the linear transformation $T$ with respect to the bases $(e_1,\ldots,e_n)$ for $U$ and $(f_1,\ldots,f_m)$ for $V$. We sometimes write $\M(T)$ to denote the matrix for $T$, assuming we have already established which bases we are using for our domain and codomain.

It can be a bit confusing at first trying to remember what the shape of a linear map's matrix should be. The number of columns corresponds to the dimension of the domain, and the number of rows corresponds to the dimension of the codomain. The $i$th column of the matrix is really just the components of $Te_i$. That is, we may think of the matrix $\M(T)$ as

$$\M(T) = \begin{bmatrix}
\mid & \mid & & \mid \\
Te_1 & Te_2 & \cdots & Te_n \\
\mid & \mid & & \mid
\end{bmatrix}.$$

The scalar $b_{j,i}$ in the $j$th row and $i$th column of $\M(T)$ is thus the coefficient of the $j$th basis vector $f_j$ for $V$ in the representation of $Te_i$.

If $T:\F^n\to\F^m$ is a linear map, where $\F$ is a field, then unless otherwise specified it should be assumed that its matrix $\M(T)$ is with respect to the standard bases on $\F^n$ and $\F^m$. Otherwise, it does not make sense to talk about a matrix for a linear map without first explicitly choosing bases for the vector spaces involved. This is very important.
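
To see the column-by-column picture in action, here is a short Python sketch (using NumPy; the particular map $T$ is an arbitrary example of mine, not one from the text) that builds the matrix of a linear map $T:\R^2\to\R^3$ with respect to the standard bases by recording what $T$ does to each standard basis vector.

```python
import numpy as np

def T(v):
    """An example linear map T : R^2 -> R^3, namely T(x, y) = (x + y, x, y - x)."""
    x, y = v
    return np.array([x + y, x, y - x])

# The i-th column of M(T) is T applied to the i-th standard basis vector.
e = np.eye(2)
M = np.column_stack([T(e[:, i]) for i in range(2)])
print(M)
# [[ 1.  1.]
#  [ 1.  0.]
#  [-1.  1.]]
```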

Example. Recall the zero map we defined last time, $0:U\to V$ where $0(u)=0$ for every $u\in U$. Suppose $(e_1,\ldots,e_n)$ is any basis for $U$ and $(f_1,\ldots,f_m)$ is any basis for $V$.

To compute the matrix for the zero map with respect to these bases, all we need to do is figure out how it acts on basis vectors. Choose any $i$ for which $1\le i\le n$. Then

$$\begin{align}
0(e_i) &= \sum_{j=1}^m b_{j,i} f_j \\
&= 0
\end{align}$$

for some scalars $b_{j,i}\in\F$. But since $(f_1,\ldots,f_m)$ is a basis for $V$, it is linearly independent and so this is only possible if $b_{j,i}=0$ for $1\le j\le m$. That means the matrix for the zero map looks like this:

$$\M(0) = \begin{bmatrix}
0 & 0 & \cdots & 0 \\
0 & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 0
\end{bmatrix}.$$

And it looks like this for any choice of bases for $U$ and $V$, since there's only one way to write the zero vector in terms of any basis.

We call this the zero matrix of dimension $m\times n$.

Example. Recall the identity map we defined last time, $I:V\to V$ where $Iv=v$ for every $v\in V$. Suppose $(e_1,\ldots,e_n)$ is any basis for $V$.

How does $I$ act on basis vectors? Choose any $i$ with $1\le i\le n$. Then

$$\begin{align}
Ie_i &= \sum_{j=1}^n b_{j,i} e_j \\
&= e_i.
\end{align}$$

Since $(e_1,\ldots,e_n)$ is linearly independent, this is only possible if $b_{j,i} = 0$ for every $j\ne i$ and $b_{i,i} = 1$. That is,

$$b_{j,i} = \begin{cases}
0 & \text{if } j\ne i, \\
1 & \text{if } j=i.
\end{cases} \tag{2}$$

This means that the matrix for the identity map is a square matrix which looks like this:

$$\M(I) = \begin{bmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{bmatrix},$$

with $1$s along the main diagonal and $0$s everywhere else. And it looks like this for any choice of basis for $V$, as long as we use the same basis for both copies of $V$ (the domain and the codomain). Otherwise, all bets are off.

We call this the identity matrix of dimension $n\times n$.

We also give a name to formula $(2)$ for the scalars $b_{j,i}$. The Kronecker delta is a function of two variables defined by

$$\delta_{j,i} = \begin{cases}
0 & \text{if } j\ne i, \\
1 & \text{if } j=i.
\end{cases}$$

We can rewrite the identity matrix in terms of the Kronecker delta, if we want:

$$\M(I) = \begin{bmatrix}
\delta_{1,1} & \delta_{1,2} & \cdots & \delta_{1,n} \\
\delta_{2,1} & \delta_{2,2} & \cdots & \delta_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
\delta_{n,1} & \delta_{n,2} & \cdots & \delta_{n,n}
\end{bmatrix}.$$

There are a number of important operations we can define on matrices.

Definition. If $\begin{bmatrix} a \end{bmatrix}_{j,i}$ and $\begin{bmatrix} b \end{bmatrix}_{j,i}$ are $m\times n$ matrices, their sum is defined component-wise:

$$\begin{align}
\begin{bmatrix} a \end{bmatrix}_{j,i} + \begin{bmatrix} b \end{bmatrix}_{j,i} &= \begin{bmatrix}
a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\
a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m,1} & a_{m,2} & \cdots & a_{m,n}
\end{bmatrix} + \begin{bmatrix}
b_{1,1} & b_{1,2} & \cdots & b_{1,n} \\
b_{2,1} & b_{2,2} & \cdots & b_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
b_{m,1} & b_{m,2} & \cdots & b_{m,n}
\end{bmatrix} \\[1em]
&= \begin{bmatrix}
a_{1,1} + b_{1,1} & a_{1,2} + b_{1,2} & \cdots & a_{1,n} + b_{1,n} \\
a_{2,1} + b_{2,1} & a_{2,2} + b_{2,2} & \cdots & a_{2,n} + b_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m,1} + b_{m,1} & a_{m,2} + b_{m,2} & \cdots & a_{m,n} + b_{m,n}
\end{bmatrix}.
\end{align}$$

It is easy to show that with this definition and the normal operation of addition of functions, $\M(T_1 + T_2) = \M(T_1) + \M(T_2)$ for any linear maps $T_1,T_2:U\to V$.

Addition of matrices is not defined for matrices of different sizes, since this would be like trying to add functions with different domains and codomains.

Definition. If $\begin{bmatrix} a \end{bmatrix}_{j,i}$ is an $m\times n$ matrix, its scalar product with $k\in\F$ is defined component-wise:

$$\begin{align}
k\begin{bmatrix} a \end{bmatrix}_{j,i} &= k\begin{bmatrix}
a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\
a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m,1} & a_{m,2} & \cdots & a_{m,n}
\end{bmatrix} \\[1em]
&= \phantom{k}\begin{bmatrix}
ka_{1,1} & ka_{1,2} & \cdots & ka_{1,n} \\
ka_{2,1} & ka_{2,2} & \cdots & ka_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
ka_{m,1} & ka_{m,2} & \cdots & ka_{m,n}
\end{bmatrix}.
\end{align}$$

So far we have seen a lot of matrices but I have not shown you what they are useful for. The rest of this post will address their uses.

Suppose we have a linear map $T:U\to V$ and bases $(e_1,\ldots,e_n)$ for $U$ and $(f_1,\ldots,f_m)$ for $V$. Then for any $u\in U$, we may write

$$u = \sum_{i=1}^n u_i e_i,$$

where the scalars $u_i\in\F$ are unique. We can define a new sort of matrix — the matrix of the vector $u$, as follows:

$$\M(u) = \begin{bmatrix}
u_1 \\
u_2 \\
\vdots \\
u_n
\end{bmatrix}.$$

The natural question to ask is this: given the matrix $\M(T)$ for the linear map $T$ and the matrix $\M(u)$ for the vector $u$, what does the matrix $\M(Tu)$ for the vector $Tu\in V$ look like? All we need to do is calculate its components in terms of the basis $(f_1,\ldots,f_m)$,

$$Tu = \sum_{i=1}^n \sum_{j=1}^m u_i b_{j,i} f_j.$$

So the matrix for $Tu$ is

$$\begin{align}
\M(Tu) &= \begin{bmatrix}
(Tu)_1 \\
(Tu)_2 \\
\vdots \\
(Tu)_m
\end{bmatrix} \\[1em]
&= \begin{bmatrix}
\sum_{i=1}^n u_i b_{1,i} \\
\sum_{i=1}^n u_i b_{2,i} \\
\vdots \\
\sum_{i=1}^n u_i b_{m,i}
\end{bmatrix}.
\end{align}$$

Observe that all of the information about $\M(Tu)$ is encoded in the matrices $\M(T)$ and $\M(u)$. So we make the following definition:

Definition. If $\begin{bmatrix} b \end{bmatrix}_{j,i}$ is an $m\times n$ matrix for a linear map and $\begin{bmatrix} u \end{bmatrix}_{i}$ is an $n\times 1$ matrix for a vector, we define matrix-vector multiplication as follows:

$$\begin{align}
\begin{bmatrix} b \end{bmatrix}_{j,i} \begin{bmatrix} u \end{bmatrix}_{i} &= \begin{bmatrix}
b_{1,1} & b_{1,2} & \cdots & b_{1,n} \\
b_{2,1} & b_{2,2} & \cdots & b_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
b_{m,1} & b_{m,2} & \cdots & b_{m,n}
\end{bmatrix} \begin{bmatrix}
u_1 \\
u_2 \\
\vdots \\
u_n
\end{bmatrix} \\[1em]
&= \begin{bmatrix}
\sum_{i=1}^n u_i b_{1,i} \\
\sum_{i=1}^n u_i b_{2,i} \\
\vdots \\
\sum_{i=1}^n u_i b_{m,i}
\end{bmatrix} \\[1em]
&= \begin{bmatrix} \sum_{i=1}^n u_i b_{j,i} \end{bmatrix}_j.
\end{align}$$

By definition then, we have that $\M(Tu)=\M(T)\M(u)$. This means that if we have a matrix representation of a linear map and a matrix representation of a vector, both in terms of some bases, then we can work entirely in terms of matrices. Incidentally, this is why it is common to write $Tu$ to mean $T(u)$ — because the action of the linear map $T$ on the vector $u$ can be calculated via multiplication of their matrices.
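
Here is a quick numerical sanity check of the identity $\M(Tu)=\M(T)\M(u)$, using NumPy and an example map of my own choosing (the same sort of sketch as above, not anything from the text):

```python
import numpy as np

def T(v):
    """Example linear map T : R^2 -> R^3, T(x, y) = (x + y, x, y - x)."""
    x, y = v
    return np.array([x + y, x, y - x])

M_T = np.array([[1, 1], [1, 0], [-1, 1]])   # matrix of T in the standard bases
u = np.array([3, -2])

# Multiplying the matrix of T by the matrix of u gives the matrix of Tu.
assert np.array_equal(M_T @ u, T(u))
```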

Now, suppose we have three vector spaces $U$, $V$ and $W$, of dimensions $n$, $m$ and $l$, respectively. Let $T:U\to V$ and $S:V\to W$ be linear maps, and let $(e_1,\ldots,e_n)$ be a basis for $U$, $(f_1,\ldots,f_m)$ be a basis for $V$, and $(g_1,\ldots,g_l)$ be a basis for $W$. We have already seen that if

$$u=\sum_{i=1}^n u_i e_i$$

and

$$Te_i = \sum_{j=1}^m b_{j,i}f_j$$

then

$$Tu = \sum_{i=1}^n \sum_{j=1}^m u_i b_{j,i} f_j.$$

Let's take this one step further, and compute $S(Tu)$. Suppose $S$ acts on each basis vector $f_j$ as follows:

$$Sf_j = \sum_{k=1}^l c_{k,j}g_k.$$

Then

$$\begin{align}
S(Tu) &= S\left(\sum_{i=1}^n \sum_{j=1}^m u_i b_{j,i} f_j\right) \\
&= \sum_{i=1}^n \sum_{j=1}^m u_i b_{j,i} Sf_j \\
&= \sum_{i=1}^n \sum_{j=1}^m \sum_{k=1}^l u_i b_{j,i} c_{k,j} g_k.
\end{align}$$

We thus make the following definition:

Definition. If $\begin{bmatrix} b \end{bmatrix}_{j,i}$ is an $m\times n$ matrix and $\begin{bmatrix} c \end{bmatrix}_{k,j}$ is an $l\times m$ matrix, we define matrix-matrix multiplication as follows:

$$\begin{align}
\begin{bmatrix} c \end{bmatrix}_{k,j} \begin{bmatrix} b \end{bmatrix}_{j,i} &= \begin{bmatrix}
c_{1,1} & c_{1,2} & \cdots & c_{1,m} \\
c_{2,1} & c_{2,2} & \cdots & c_{2,m} \\
\vdots & \vdots & \ddots & \vdots \\
c_{l,1} & c_{l,2} & \cdots & c_{l,m}
\end{bmatrix}\begin{bmatrix}
b_{1,1} & b_{1,2} & \cdots & b_{1,n} \\
b_{2,1} & b_{2,2} & \cdots & b_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
b_{m,1} & b_{m,2} & \cdots & b_{m,n}
\end{bmatrix} \\[1em]
&= \begin{bmatrix}
\sum_{j=1}^m b_{j,1}c_{1,j} & \sum_{j=1}^m b_{j,2}c_{1,j} & \cdots & \sum_{j=1}^m b_{j,n}c_{1,j} \\
\sum_{j=1}^m b_{j,1}c_{2,j} & \sum_{j=1}^m b_{j,2}c_{2,j} & \cdots & \sum_{j=1}^m b_{j,n}c_{2,j} \\
\vdots & \vdots & \ddots & \vdots \\
\sum_{j=1}^m b_{j,1}c_{l,j} & \sum_{j=1}^m b_{j,2}c_{l,j} & \cdots & \sum_{j=1}^m b_{j,n}c_{l,j}
\end{bmatrix} \\[1em]
&= \begin{bmatrix} \sum_{j=1}^m b_{j,i} c_{k,j} \end{bmatrix}_{k,i}.
\end{align}$$

Note that the product is an $l\times n$ matrix.

By definition then, we have that $\M(S)\M(T)=\M(S\circ T)$. Which is super awesome! Matrix multiplication corresponds to function composition of linear maps. And of course the dimensions of the product should be $l\times n$, since $S\circ T:U\to W$. We often write $S\circ T$ as simply $ST$, since this is reminiscent of multiplication.
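
And a similarly minimal check that composition corresponds to matrix multiplication, with two randomly generated matrices standing in for $\M(T)$ and $\M(S)$ (an illustration of mine, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
M_T = rng.integers(-5, 5, size=(3, 2))   # a map T : R^2 -> R^3
M_S = rng.integers(-5, 5, size=(4, 3))   # a map S : R^3 -> R^4
u = rng.integers(-5, 5, size=2)

# Applying S after T agrees with multiplying by the product matrix M(S)M(T).
assert np.array_equal(M_S @ (M_T @ u), (M_S @ M_T) @ u)
print((M_S @ M_T).shape)                 # (4, 2), an l x n matrix as expected
```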

I'm going to end here for now because all of these indices are making my head spin. But we'll see matrices again all over the place.

]]>
<![CDATA[Linear Maps]]>https://algebrology.github.io/linear-maps/5c918ffabdf47d003ee07357Wed, 20 Mar 2019 08:49:00 GMT
  1. Linear Maps
  2. Basic Properties
  3. Rank and Nullity

Sooo somewhere along the line I dropped the arrow notation for vectors. I was using $\v$ instead of $v$ to denote vectors to keep things clean and avoid confusion, but it was inevitable that this would eventually stop happening because it was a lot of work on my end.

Linear Maps

Just as we decided to study continuous functions between topological spaces and homomorphisms between groups, much of linear algebra is dedicated to the study of linear maps between vector spaces.

If you've been paying close attention, you may have noticed that vector spaces are really just groups to which we have added scalar multiplication. In the same vein, linear maps are really just homomorphisms with an additional property which makes them compatible with scalar multiplication.

Definition. A linear map (or linear transformation) between two vector spaces $U$ and $V$ is a function $T:U\to V$ for which the following properties hold:

Additivity
For any vectors $u_1,u_2\in U$, we have that $T(u_1+u_2) = T(u_1) + T(u_2)$.

Homogeneity
For any vector $u\in U$ and any scalar $a\in\F$, we have that $T(au)=aT(u)$.

It is conventional to write $Tu$ to mean $T(u)$, when no confusion arises from this shorthand. The reason for this notation will become apparent shortly.

It is important to note that on the left side of each equation in this definition, the vector addition and scalar multiplication are being done in $U$. That is, we are applying $T$ to the sum $u_1+u_2\in U$ and to the scalar product $au\in U$. On the right, the addition and multiplication are being done in $V$. That is, we are first applying $T$ to $u_1$ and $u_2$ separately and adding the result in $V$, or applying it to $u$ and multiplying the result in $V$. So linear maps are precisely those functions for which it does not matter whether we add/multiply before or after applying the function.

Example. We define a "zero map" $0:U\to V$ between any two vector spaces by $0(u)=0$ for all vectors $u\in U$. This is obviously a linear map because

$$\begin{align}
0(u_1 + u_2) &= 0 \\
&= 0 + 0 \\
&= 0(u_1) + 0(u_2),
\end{align}$$

which shows that it satisfies additivity, and

$$\begin{align}
0(au) &= 0 \\
&= a0 \\
&= a0(u),
\end{align}$$

which shows that it satisfies homogeneity.

Note that I am making no distinction between the zero vector in the domain and the zero vector in the codomain. Nonetheless, it is important to realize that they may be different objects. For example, in the case of a linear map from $\R^n$ to $\P(\F)$, the zero vector in $\R^n$ is the $n$-tuple $(0,0,\ldots,0)$ and the zero vector in $\P(\F)$ is the zero polynomial $p(x)=0$.

What's even worse though is that I'm also using the symbol $0$ to represent the zero map. Maybe I just worship chaos.

Example. The identity map $I:U\to V$ between any two vector spaces, defined as usual by $Iu=u$ for all vectors $u\in U$, is a linear map. To see that it is additive, note that

$$\begin{align}
I(u_1+u_2) &= u_1+u_2 \\
&= Iu_1 + Iu_2.
\end{align}$$

To see that it is homogeneous, note that

$$\begin{align}
I(au) &= au \\
&= aIu.
\end{align}$$

The identity map is super important in linear algebra.

Example. The map $T:\R^2\to\R$ defined by $T(x,y)=x+y$ for any vector $(x,y)\in\R^2$ is a linear map.

This map is additive because

$$\begin{align}
T((x_1,y_1)+(x_2,y_2)) &= T(x_1+x_2,y_1+y_2) \\
&= x_1+x_2+y_1+y_2 \\
&= x_1+y_1+x_2+y_2 \\
&= T(x_1,y_1) + T(x_2,y_2).
\end{align}$$

It is homogeneous because

$$\begin{align}
T(a(x,y)) &= T(ax,ay) \\
&= ax+ay \\
&= a(x+y) \\
&= aT(x,y).
\end{align}$$

Example. The map $T:\R^2\to\P_2(\R)$ defined by $T(a,b)=a+bx+(a-b)x^2$ is a linear map.

It's additive because

$$\begin{align}
T((a_1,b_1)+(a_2,b_2)) &= T(a_1+a_2,b_1+b_2) \\
&= a_1+a_2 + (b_1+b_2)x + (a_1+a_2 - b_1-b_2)x^2\\
&= a_1 + b_1x + (a_1-b_1)x^2 + a_2 + b_2x + (a_2-b_2)x^2\\
&= T(a_1,b_1) + T(a_2,b_2).
\end{align}$$

And it's homogeneous because

$$\begin{align}
T(k(a,b)) &= T(ka,kb) \\
&= ka + kbx + (ka-kb)x^2 \\
&= k(a+bx+(a-b)x^2) \\
&= kT(a,b).
\end{align}$$

Basic Properties

What follows are some basic properties of linear maps.

The first result says that linear maps take the zero vector in the domain to the zero vector in the codomain.

Theorem. If $T:U\to V$ is a linear map, then $T(0)=0$.

Proof. We proceed directly via the following computation:

$$\begin{align}
T(0) &= T(0) + 0 \\
&= T(0) + T(0) - T(0) \\
&= T(0 + 0) - T(0) \\
&= T(0) - T(0) \\
&= 0.
\end{align}$$

Linear maps do a lot more than that though. They also map subspaces to subspaces!

Theorem. If $T:U\to V$ is a linear map and $W$ is a subspace of $U$, then $T[W]$ is a subspace of $V$.

Proof. We just need to show that $T[W]$ contains the zero vector and is closed under vector addition and scalar multiplication.

Since $W$ is a subspace, it contains the zero vector in $U$. Thus, since $T(0)=0$, it follows that $T[W]$ contains the zero vector in $V$.

Next, suppose $v_1,v_2\in T[W]$. Then $v_1=Tw_1$ and $v_2=Tw_2$ for some vectors $w_1,w_2\in W$. But since $W$ is a subspace, it contains $w_1+w_2$. Thus,

$$\begin{align}
v_1 + v_2 &= Tw_1 + Tw_2 \\
&= T(w_1 + w_2) \\
&\in T[W].
\end{align}$$

Lastly, suppose $v\in T[W]$ and $a\in\F$. Then $v=Tw$ for some $w\in W$. But since $W$ is a subspace, it contains $aw$. Thus,

$$\begin{align}
av &= aTw \\
&= T(aw) \\
&\in T[W].
\end{align}$$

It follows that $T[W]$ is a subspace of $V$, as desired.

The kernel of any linear transformation is a subspace of the domain.

Theorem. If $T:U\to V$ is a linear map, then $\ker T$ is a subspace of $U$.

Proof. Again, we need to show that $\ker T$ contains the zero vector and is closed under vector addition and scalar multiplication.

Since $T(0)=0$, certainly $0\in\ker T$ by definition.

Next, suppose $u_1,u_2\in\ker T$. Then $Tu_1=0$ and $Tu_2=0$. Thus,

$$\begin{align}
T(u_1 + u_2) &= Tu_1 + Tu_2 \\
&= 0 + 0 \\
&= 0,
\end{align}$$

and so $u_1+u_2\in\ker T$.

Lastly, suppose that $u\in\ker T$ and $a\in\F$. Then $Tu=0$, and therefore

$$\begin{align}
T(au) &= aTu \\
&= a0 \\
&= 0,
\end{align}$$

so $au\in\ker T$. It follows that $\ker T$ is a subspace of $U$.

For some reason in linear algebra it is common to refer to the kernel of a linear transformation as its null space, and to write $\ker T$ as $\null T$. I personally find this annoying since in every other context we refer to fiber of $0$ as the kernel. But you should at least be aware that this convention exists and is prevalent in the literature.

In the same vein, the image of any linear transformation is a subspace of the codomain. Actually we've already done all the work to show this.

Theorem. If $T:U\to V$ is a linear map, then $\im T$ is a subspace of $V$.

Proof. Since linear maps take subspaces to subspaces, and $U$ is a subspace of itself, it follows that $\im T$ is a subspace of $V$.

Rank and Nullity

Now we will explore the interesting relation between the dimensions of the kernel and image of a linear map. It will take some time to build up to the "big result" of this section, but it is well worth the wait.

To understand this connection, we begin by considering what injective and surjective linear maps look like.

Theorem. A linear map $T:U\to V$ is injective if and only if $\ker T = \set{0}$.

Proof. Suppose first that $\ker T=\set{0}$, and suppose further that $Tu_1=Tu_2$. Then

$$\begin{align}
0 &= Tu_1 - Tu_2 \\
&= T(u_1-u_2),
\end{align}$$

and so $u_1-u_2\in\ker T$. But $\ker T$ contains only one element — the zero vector. So $u_1-u_2=0$, meaning $u_1=u_2$. It follows that $T$ is injective.

Suppose conversely that $T$ is injective. We know that $T(0)=0$. Furthermore, since $T$ is injective, at most one vector in the domain can be mapped to any vector in the codomain. This implies that $0$ is the only vector which gets mapped to zero. Hence, $\ker T=\set{0}$.

Injective linear maps have an extremely special property. Namely, they preserve linear independence!

Theorem. Let $T:U\to V$ denote an injective linear map, and suppose $(u_1,\ldots,u_n)$ is a linearly independent list of vectors in $U$. Then $(Tu_1,\ldots,Tu_n)$ is a linearly independent list of vectors in $V$.

Proof. Suppose we have scalars $a_i\in\F$ for $1\le i\le n$ such that

$$\begin{align}
\sum_{i=1}^n a_iTu_i &= T\left(\sum_{i=1}^n a_i u_i\right) \\
&= 0.
\end{align}$$

Since $T$ is injective, we know that $\ker T=\set{0}$ and therefore

$$\sum_{i=1}^n a_i u_i = 0.$$

Since $(u_1,\ldots,u_n)$ is linearly independent, this implies $a_i=0$ for $1\le i\le n$. Thus, $(Tu_1,\ldots,Tu_n)$ is linearly independent.

The next theorem is probably pretty obvious. It just says that if the domain of a linear map is finite-dimensional then so is its image.

Theorem. Let $T:U\to V$ denote a linear map. If $U$ is finite-dimensional, then $\im T$ is a finite-dimensional subspace of $V$.

Proof. We have already shown that $\im T$ is a subspace of $V$. To show that it is finite-dimensional, we must demonstrate that it is spanned by a finite list of vectors in $V$. Let $(e_1,\ldots,e_n)$ be any basis for $U$. We claim that

$$\im T = \span(Te_1,\ldots,Te_n).$$

To see this, choose any vector $v\in\im T$. Then $v=Tu$ for some $u\in U$. We may write $u$ as a linear combination of basis vectors

$$u = \sum_{i=1}^n a_i e_i,$$

where $a_i\in\F$ for $1\le i\le n$. Thus,

$$\begin{align}
v &= Tu \\
&= T\left(\sum_{i=1}^n a_i e_i\right) \\
&= \sum_{i=1}^n a_i Te_i.
\end{align}$$

We have expressed $v$ as a linear combination of the vectors $Te_1,\ldots,Te_n$. It follows that

$$\im T = \span(Te_1,\ldots,Te_n).$$

Since $\im T$ is spanned by a finite list of vectors, it is finite-dimensional.

We now come to the main theorem of this post, which states that the dimension of the domain of a linear map is always equal to the sum of the dimensions of its kernel and image.

Rank-Nullity Theorem. Suppose $U$ is a finite-dimensional vector space, $V$ is any vector space, and $T:U\to V$ is a linear map. Then

$$\dim U = \dim\ker T + \dim\im T.$$

Proof. Let $n=\dim\ker T$, and choose any basis $(e_1,\ldots,e_n)$ for $\ker T$. Then we can extend this list to a basis $(e_1,\ldots,e_n,f_1,\ldots,f_m)$ for $U$ by adding $m=\dim U - n$ linearly independent vectors. We argue that $\dim\im T=m$ by showing that $(Tf_1,\ldots,Tf_m)$ is a basis for $\im T$.

Certainly $\im T = \span(Tf_1,\ldots,Tf_m)$ since $\span(Te_1,\ldots,Te_n)=\set{0}$ as all the $e_i$ are in the kernel of $T$. It suffices to show then that $(Tf_1,\ldots,Tf_m)$ is linearly independent.

To this end, suppose we are given $a_i$ for $1\le i\le m$ such that

$$\sum_{i=1}^m a_i Tf_i = 0.$$

Then since $T$ is a linear map, we have that

$$T\left(\sum_{i=1}^m a_i f_i\right) = 0.$$

It follows that

$$\sum_{i=1}^m a_i f_i\in\ker T,$$

and so we can rewrite this uniquely in terms of basis vectors for $\ker T$:

$$\sum_{i=1}^m a_i f_i = \sum_{i=1}^n b_i e_i, \tag{1}$$

where $b_i\in\F$ for $1\le i\le n$. Since $(e_1,\ldots,e_n,f_1,\ldots,f_m)$ is a basis for $U$, it is linearly independent. Thus, equation $(1)$ implies that $a_i=0$ for $1\le i\le m$ and $b_i=0$ for $1\le i\le n$. Thus, $(Tf_1,\ldots,Tf_m)$ is linearly independent and so it is a basis for $\im T$. It follows that $\dim\im T = m$, and thus

$$\dim U = \dim\ker T + \dim\im T.$$

Essentially what this means is that with any linear map, every dimension of the domain is accounted for. By this, I really mean that every one-dimensional subspace is either squashed down to zero and thus contained in the kernel, or its linear independence is preserved and it gets mapped to a one-dimensional subspace of the codomain.

The reason this is called the rank-nullity theorem is because we make the following definitions:

Definition. Let $T:U\to V$ denote a linear map. The rank of $T$ is

$$\text{rank }T = \dim\im T.$$

Similarly, the nullity of $T$ is

$$\text{nullity }T = \dim\ker T.$$
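
As a quick numerical illustration (a sketch of mine using NumPy and SciPy, treating a matrix as a linear map with respect to the standard bases), the rank plus the nullity always equals the dimension of the domain, i.e. the number of columns:

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(1)
A = rng.integers(-3, 3, size=(4, 6)).astype(float)   # a linear map R^6 -> R^4

rank = np.linalg.matrix_rank(A)       # dim im A
nullity = null_space(A).shape[1]      # dim ker A (columns form a basis of ker A)
assert rank + nullity == A.shape[1]   # = dim of the domain, here 6
```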

The rank-nullity theorem has an interesting implication.

Theorem. Suppose $U$ and $V$ are finite-dimensional vector spaces with $\dim U=\dim V$, and let $T:U\to V$ be a linear map. Then $T$ is injective if and only if it is surjective.

Proof. Suppose first that $T$ is injective, so that $\ker T=\set{0}$. Then $\dim\ker T = 0$, so from the rank-nullity theorem it follows that

$$\begin{align}
\dim\im T &= \dim U \\
&= \dim V,
\end{align}$$

and thus $\im T = V$ because the only subspace of a finite-dimensional vector space whose dimension is the same as the parent space is the parent space itself. Thus, $T$ is surjective.

Suppose next that $T$ is surjective. Then

$$\begin{align}
\dim\im T &= \dim V \\
&= \dim U.
\end{align}$$

From the rank-nullity theorem, it follows that $\dim\ker T = 0$. The only subspace with dimension zero is the trivial subspace, and so $\ker T = \set{0}$. Thus, $T$ is injective.

So for two finite-dimensional vector spaces of the same dimension, injective and surjective linear maps are actually the same thing. Remember that a function that is both injective and surjective is called bijective and has an inverse function. As you shall see, all kinds of interesting results follow from this observation.

]]>
<![CDATA[Bases and Dimension of Vector Spaces]]>https://algebrology.github.io/bases-and-dimension-of-vector-spaces/5c914d58bdf47d003ee072b9Tue, 19 Mar 2019 20:45:57 GMT
  1. Bases
  2. Dimension
  3. Some Useful Theorems

Bases

Previously, I discussed span and linear independence. Essentially, the span of a list of vectors is the set of all possible linear combinations of those vectors, and a list of vectors is linearly independent if none of them can be expressed as a linear combination of the others.

Now I'll use these ideas to introduce one of the most important concepts in linear algebra:

Definition. A basis for a vector space $V$ is a linearly independent list of vectors which spans $V$.

This definition, combined with what we have already discussed, tells us three things:

  1. Every vector in $V$ can be expressed as a linear combination of basis vectors, since a basis must span $V$.
  2. This representation is unique, since the basis elements are linearly independent.
  3. $V$ is the direct sum of the spans of the basis vectors.

Stated more explictly, if $(e_1,\ldots,e_n)$ is a basis for a vector space $V$, then for every vector $v\in V$ there is a unique set of scalars $a_i\in\F$ for which

$$v=\sum_{i=1}^n a_ie_i.$$

Bases for vector spaces are similar to bases for topological spaces. The idea is that a basis is a small, easy to understand subset of vectors from which it is possible to extrapolate pretty much everything about the vector space as a whole.

Here are some examples and non-examples.

Example. The list $\big((1,0), (0,1)\big)$ is a basis for the vector space $\R^2$, since the list is linearly independent and for any vector $(x,y)\in\R^2$, we may write

$$(x,y) = x(1,0)+y(0,1).$$

Example. The list $\big((1,0), (1,1)\big)$ is also a basis for $\R^2$. To see that it is linearly independent, we must show that

$$a(1,0) + b(1,1) = (0,0) \tag{1}$$

only if $a=b=0$. Since in this vector space addition and scalar multiplication are done component-wise, equation $(1)$ reduces to the system of equations:

$$\begin{align}
a + b &= 0 \tag{2}\\
\phantom{a +} b &= 0. \tag{3}
\end{align}$$

Equation $(2)$ implies that $a=-b$, from which equation $(3)$ implies that $a=b=0$, as desired.

To see that our list spans $\R^2$, we note that any vector $(x,y)\in\R^2$ can be written

$$(x,y) = (x-y)(1,0) + y(1,1).$$

Example. The list $\big((1,0),(-1,0)\big)$ is not a basis for $\R^2$ because it is linearly dependent and it does not span $\R^2$. To see that it is linearly dependent, note that

$$(1,0)+(-1,0)=(0,0).$$

That is, we have written the zero vector as a linear combination of vectors in the list with nonzero coefficients.

To see that it does not span $\R^2$, note that no vector in $\R^2$ with a nonzero second component can ever be written as a linear combination of these vectors, since both have a second component of zero.

Example. The list $\big((1,0),(0,1),(1,1)\big)$ is not a basis for $\R^2$ because it is linearly dependent. We can see this immediately, since it has length three and we have already exhibited a list of length two which spans $\R^2$, and no linearly independent list can be longer than a spanning list.

Example. The list $\big(1,x,x^2,x^3\big)$ is a basis for $\P_3(\F)$, the vector space of polynomials over a field $\F$ with degree at most three. To justify this claim, note that this list is certainly linearly independent, and any polynomial of degree less than or equal to three can be written

$$p(x)=a+bx+cx^2+dx^3$$

for some choice of $a,b,c,d\in\F$.

The next theorem is fairly obvious, but without it we would be pretty much lost. Recall that a vector space is finite-dimensional if it is spanned by a finite list of vectors. For this next proof we will use this definition, as well the linear dependence lemma.

Theorem. Every finite-dimensional vector space has a basis.

Proof. Let $V$ denote a finite-dimensional vector space. Then $V$ is spanned by some finite list of vectors, not all zero, $(v_1,\ldots,v_n)$. If this list is linearly independent, then we are done. Otherwise, we can use the linear dependence lemma to remove a vector from this list and produce a new list of length $n-1$ which still spans $V$. We continue this process until we are left with a linearly independent list, which will take at most $n-1$ steps, since any list consisting of a single nonzero vector is automatically linearly independent. This resulting list is, by definition, a basis for $V$.

This argument only works for finite-dimensional vector spaces, since it relies on the fact that we only have to apply the linear dependence lemma a finite number of times. Infinite-dimensional vector spaces, as I've mentioned before, are a lot grosser. It is possible to show that infinite-dimensional vector spaces are guaranteed to have bases if we accept the axiom of choice. Since most mathematicians much prefer to have bases for their vector spaces, this is just one more point in favor of accepting the axiom of choice. Luckily for us, we aren't currently interested in infinite-dimensional vector spaces and so we can simply ignore this icky business.

Dimension

Next comes a super important result which will finally settle the question I posed last time about how to define the dimension of a vector space. Its proof will employ the theorem I proved at the end of my last post, that the length of a list of spanning vectors is never shorter than any linearly independent list.

Theorem. All bases of a finite-dimensional vector space have the same length.

Proof. Let $V$ denote a finite dimensional vector space, and suppose $(e_1,\ldots,e_n)$ and $(f_1,\ldots,f_m)$ are both bases for $V$. Since $(e_1,\ldots,e_n)$ is linearly independent and $(f_1,\ldots,f_m)$ spans $V$, we have that $n\le m$. Similarly, since $(f_1,\ldots,f_m)$ is linearly independent and $(e_1,\ldots,e_n)$ spans $V$, we have that $m\le n$. It follows that $m=n$, as desired.

What we have really shown is that the length of a basis for a finite-dimensional vector space is an invariant of that space! And it's a particularly special invariant:

Definition. The dimension of a finite-dimensional vector space is the length of any basis for that space.

If the dimension of a vector space $V$ is $n$, we write

$$\dim V=n.$$

As a special case, recall that we defined $\span()=\set{0}$. That means that $\dim\set{0}=0$.

We've already shown that dimension is well-defined, since all bases for a vector space have the same length. But does this definition coincide with what we expect? For instance, we would certainly expect that $\dim\R^n=n$ for any positive integer $n$. And it turns out that this always the case, as we can see by the following definition.

Definition. Given a positive integer $n$, the standard basis for $\F^n$ is the list containing the vectors

$$\begin{eqnarray}
e_1 &=& (1,0,\ldots,0), \\
e_2 &=& (0,1,\ldots,0), \\
&\;\vdots& \\
e_n &=& (0,0,\ldots,1).
\end{eqnarray}$$

That is, $e_i$ is the vector whose $i$th component is $1$ and every other component is zero.

It should be fairly obvious that for any $n$, the standard basis for $\F^n$ is, in fact, a basis. It is certainly linearly independent and it is easy to write any vector in $\F^n$ as a linear combination of basis vectors, implying it also spans $\F^n$. Since there are $n$ basis vectors in the standard basis, it follows that $\dim\F^n=n$, just like we would expect.

Some Useful Theorems

This first result tells use how to calculate the dimension of a sum of two subspaces. Recall that we previously showed that the intersection of two subspaces is itself a subspace.

Theorem. If $V_1$ and $V_2$ are subspaces of a finite-dimensional vector space, then

$$\dim(V_1+V_2) = \dim V_1 + \dim V_2 - \dim(V_1\cap V_2).$$

Proof. Since $V_1\cap V_2$ is a subspace of a finite-dimensional vector space, it is also finite-dimensional. Thus it has a basis, which we will denote $(e_1,\ldots,e_n)$.

This basis for $V_1\cap V_2$ is linearly independent, and thus we may extend it to a basis for $V_1$ by adding $m_1 = \dim V_1 - n$ new linearly independent vectors. That is, there is some basis $(e_1,\ldots,e_n,f_1,\ldots,f_{m_1})$ for $V_1$, and certainly $\dim V_1 = m_1+n$.

We may similarly extend $(e_1,\ldots,e_n)$ to a basis for $V_2$ by adding $m_2 = \dim V_2 - n$ new linearly independent vectors. That is, there is some basis $(e_1,\ldots,e_n,g_1,\ldots,g_{m_2})$ for $V_2$, and certainly $\dim V_2 = m_2+n$.

We argue that $(e_1,\ldots,e_n,f_1,\ldots,f_{m_1},g_1,\ldots,g_{m_2})$ is a basis for $V_1+V_2$. Certainly

$$V_1,V_2\subseteq\span(e_1,\ldots,e_n,f_1,\ldots,f_{m_1},g_1,\ldots,g_{m_2})$$

and thus

$$V_1+V_2=\span(e_1,\ldots,e_n,f_1,\ldots,f_{m_1},g_1,\ldots,g_{m_2}).$$

To see that this list is linearly independent, suppose that

$$\sum_{i=1}^n a_i e_i + \sum_{i=1}^{m_1} b_i f_i + \sum_{i=1}^{m_2} c_i g_i = 0 \tag{4}$$

for some scalars $\set{a_i}_{i=1}^n$, $\set{b_i}_{i=1}^{m_1}$ and $\set{c_i}_{i=1}^{m_2}$. We can rewrite $(4)$ as

$$\sum_{i=1}^{m_2} c_i g_i = -\sum_{i=1}^n a_i e_i - \sum_{i=1}^{m_1} b_i f_i,$$

from which we see that $\sum_{i=1}^{m_2} c_i g_i\in V_1\cap V_2$. Since $(e_1,\ldots,e_n)$ is a basis for $V_1\cap V_2$, it follows that we may write it uniquely as a linear combination of basis vectors,

$$\sum_{i=1}^{m_2} c_i g_i = \sum_{i=1}^n d_i e_i, \tag{5}$$

for some scalars $\set{d_i}_{i=1}^n$. But $(e_1,\ldots,e_n,g_1,\ldots,g_{m_2})$ is linearly independent, so it follows from equation $(5)$ that $c_i=0$ for all $i$. This means we may rewrite equation $(4)$ as

$$\sum_{i=1}^n a_i e_i + \sum_{i=1}^{m_1} b_i f_i = 0.$$

Since $(e_1,\ldots,e_n,f_1,\ldots,f_{m_1})$ is linearly independent, it follows that $a_i=0$ for all $1\le i\le n$ and $b_i=0$ for all $1\le i\le m_1$. Since all coefficients in equation $(4)$ have been shown to be zero, it follows that $(e_1,\ldots,e_n,f_1,\ldots,f_{m_1},g_1,\ldots,g_{m_2})$ is a basis for $V_1+V_2$.

It follows then that

$$\begin{align}
\dim(V_1+V_2) &= n + m_1 + m_2 \\
&= (n+m_1) + (n+m_2) - n \\
&= \dim V_1 + \dim V_2 - \dim(V_1\cap V_2),
\end{align}$$

completing the proof.

If you think that proof was gross, try doing it for sums of three or more subspaces. Or don't, because as far as I know there is no general formula for such a thing.
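
The formula itself is easy to experiment with numerically. Here is a rough NumPy sketch (the particular subspaces are my own example) in which each subspace of $\R^4$ is given by a matrix whose columns span it; the dimension of the sum is the rank of the two matrices placed side by side, and the formula then yields the dimension of the intersection.

```python
import numpy as np

# Subspaces of R^4, each spanned by the columns of a matrix.
V1 = np.array([[1, 0], [0, 1], [0, 0], [0, 0]], dtype=float)   # span{e_1, e_2}
V2 = np.array([[0, 0], [1, 0], [0, 1], [0, 0]], dtype=float)   # span{e_2, e_3}

dim_V1 = np.linalg.matrix_rank(V1)                      # 2
dim_V2 = np.linalg.matrix_rank(V2)                      # 2
dim_sum = np.linalg.matrix_rank(np.hstack([V1, V2]))    # dim(V_1 + V_2) = 3

dim_intersection = dim_V1 + dim_V2 - dim_sum            # = 1, namely span{e_2}
print(dim_intersection)
```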

The next result ties together our work from last post regarding direct sums, and follows immediately from the theorem we just proved. Recall that if a sum of two subspaces is direct, their intersection is trivial.

Theorem. If $U_1,\ldots,U_n$ are subspaces of a finite-dimensional vector space $V$ for which

$$V=\bigoplus_{i=1}^n U_i,$$

then

$$\dim V = \sum_{i=1}^n \dim U_i.$$

Proof. We first rewrite the sum as

$$V = (\cdots((U_1\oplus U_2) \oplus U_3) \oplus \cdots \oplus U_n).$$

We work outward, first considering the subspace $W_2 = U_1\oplus U_2$ and then the subspace $W_3 = W_2 \oplus U_3$, then the subspace $W_4 = W_3 \oplus U_4$, etc. This takes $n-1$ iterations, but eventually we reach $W_n = V$.

But $U_1\cap U_2=\set{0}$ since the sum is direct, and so by the previous theorem we see that

$$\begin{align}
\dim W_2 &= \dim (U_1 + U_2) \\
&= \dim U_1 + \dim U_2 - \dim (U_1\cap U_2) \\
&= \dim U_1 + \dim U_2 - \dim\set{0} \\
&= \dim U_1 + \dim U_2 - 0 \\
&= \dim U_1 + \dim U_2.
\end{align}$$

And in general, since $W_j = W_{j-1} \oplus U_j$,

$$\begin{align}
\dim W_j &= \dim (W_{j-1} + U_j) \\[1em]
&= \dim W_{j-1} + \dim U_j \\
&= \sum_{i=1}^{j-1} \dim U_i + \dim U_j.
\end{align}$$

Setting $j=n$, the result follows.

Next time I will introduce linear maps, which are (besides vector spaces themselves) the primary objects of study in linear algebra.

]]>
<![CDATA[Vector Spaces (2) - Direct Sums, Span and Linear Independence]]>https://algebrology.github.io/vector-spaces-2/5c8d3e3bbdf47d003ee07244Sat, 16 Mar 2019 18:24:28 GMT
  1. Direct Sums
  2. Span and Linear Combinations
  3. Linear Independence

Direct Sums

I left off last time with an example of a sum of subspaces with a rather important property. Namely, every vector in the sum had a unique representation as a sum of vectors in the component subspaces. Not all sums have this property, but it is so important that it deserves special consideration.

Definition. Let $U$ and $W$ be subspaces of a vector space $V$ for which $V=U+W$. This sum is a direct sum if for every $\v\in V$, the representation

$$\v=\u+\w,$$

where $\u\in U$ and $\w\in W$, is unique. If a sum is direct, it is expressed symbolically as

$$ V=U\oplus W.$$

Similarly, if $U_1,U_2,\ldots,U_n$ are subspaces of $V$ for which

$$\begin{align}
V &= \sum_{i=1}^n U_i \\
&= U_1 + U_2 + \cdots + U_n,
\end{align}$$

then this sum is direct if for every $\v\in V$, the representation

$$\begin{align}
\v &= \sum_{i=1}^n \u_i \\
&= \u_1 + \u_2 + \cdots + \u_n,
\end{align}$$

where $\u_i\in U_i$ for every $i$, is unique. Such a direct sum is expressed symbolically as

$$\begin{align}
V &= \bigoplus_{i=1}^n U_i \\
&= U_1 \oplus U_2 \oplus \cdots \oplus U_n.
\end{align}$$

Notice that a direct sum of linear subspaces is not really its own thing. It is a normal sum which happens to also have the property of being direct. You do not start with two subspaces and take their direct sum. You take the sum of subspaces, and that sum may happen to be direct.

We have already seen an example of a sum which is direct. You may find it more enlightening if we exhibit an example of a sum which is not direct.

Example. Consider the subspaces

$$\begin{align}
U &= \set{(x,y,0)\in\R^3\mid x,y\in\R}, \\
W &= \set{(0,y,z)\in\R^3\mid y,z\in\R}
\end{align}$$

of the vector space $\R^3$. Clearly $\R^3=U+W$. The vector $(1,1,1)\in\R^3$ can be expressed as

$$(1,1,1)=(1,1,0)+(0,0,1),$$

where $(1,1,0)\in U$ and $(0,0,1)\in W$. However, it can also be expressed as

$$(1,1,1)=(1,2,0)+(0,-1,1),$$

where $(1,2,0)\in U$ and $(0,-1,1)\in W$. In fact, there are infinitely many ways to decompose this into such a sum — the numbers in the second components just need to add up to $1$. Since the representation of $(1,1,1)$ as a sum of a vector in $U$ and a vector in $W$ is not unique, it follows that the sum $\R^3=U+W$ is not direct.
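To make this non-uniqueness concrete, here is a tiny illustrative snippet (plain Python, the names are my own) that produces as many different decompositions of $(1,1,1)$ as you like.

```python
# Split (1, 1, 1) as u + w with u in U (the xy-plane) and w in W (the yz-plane).
# The parameter t is free, so there are infinitely many valid decompositions.
def decompose(t):
    u = (1, t, 0)        # second component is arbitrary...
    w = (0, 1 - t, 1)    # ...as long as the middle entries sum to 1
    return u, w

for t in (1.0, 2.0, -3.5):
    u, w = decompose(t)
    total = tuple(a + b for a, b in zip(u, w))
    assert total == (1.0, 1.0, 1.0)
    print(t, u, w)
```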

It is sometimes difficult to determine whether a sum is direct using only the definition. This would require some sort of insight as to why it must be direct, or the demonstration of a counterexample. This next proposition gives us a somewhat simplified method for checking whether a sum of subspaces is direct.

Proposition. Suppose $U_1,U_2,\ldots,U_n$ are subspaces of a vector space $V$ for which

$$V=\sum_{i=1}^n U_i.$$

Then

$$V=\bigoplus_{i=1}^n U_i$$

if and only if the only choices of $\u_i\in U_i$ which satisfy

$$\0=\u_1+\u_2+\cdots+\u_n$$

are the vectors $\u_1=\u_2=\cdots=\u_n=\0$.

Proof. First, suppose that

$$V=\bigoplus_{i=1}^n U_i,$$

and for each $i$ choose $\u_i\in U_i$ so that

$$\0=\u_1+\u_2+\cdots+\u_n.$$

Because the above sum is direct, this representation of $\0$ must be unique. It must be the case then that $\u_1=\u_2=\cdots=\u_n=\0$.

Suppose next that the only way to satisfy

$$\0=\u_1+\u_2+\cdots+\u_n$$

where $\u_i\in U_i$ for each $i$, is by taking $\u_1=\u_2=\cdots=\u_n=\0$. For any $\v\in V$, we may write

$$\v=\v_1+\v_2+\cdots+\v_n, \tag{1}$$

where $\v_i\in U_i$ for each $i$. Suppose that we can also write

$$\v=\w_1+\w_2+\cdots+\w_n, \tag{2}$$

where $\w_i\in U_i$ for each $i$. Subtracting $(2)$ from $(1)$ yields

$$\0=(\v_1-\w_1)+(\v_2-\w_2)+\cdots+(\v_n-\w_n).$$

For each $i$, we have that $\v_i-\w_i\in U_i$, since each $U_i$ is a subspace and thus closed under vector addition and scalar multiplication. Thus, since we have assumed $\0$ to have a unique expression of this form, it follows that $\v_i-\w_i=\0$ for all $i$. That is, $\v_i=\w_i$ for all $i$. We have shown that the expression for $\v$ is unique, and thus

$$V=\bigoplus_{i=1}^n U_i.$$

This result tells us that all we need to do in order to check that a sum of subspaces is direct is to make sure that there is only one way to express the zero vector as a sum of vectors from each subspace. This is certainly simpler than checking that every vector has a unique representation of this form. However, in the case of two subspaces, we can do even better.
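If you want to run this criterion on a computer, one hedged sketch is the following: represent each $U_i$ by a matrix $B_i$ whose columns form a basis. A choice of $\u_i\in U_i$ summing to $\0$ then corresponds to a null vector of the concatenated matrix $[B_1 \mid B_2 \mid \cdots \mid B_n]$, so the sum is direct exactly when that matrix has full column rank. The helper name `sum_is_direct` is mine, not anything standard.

```python
import numpy as np

def sum_is_direct(*basis_matrices):
    # Assumes the columns of each matrix are linearly independent (a basis).
    stacked = np.hstack(basis_matrices)
    return np.linalg.matrix_rank(stacked) == stacked.shape[1]

U1 = np.array([[1.0], [0.0], [0.0]])                  # x-axis in R^3
U2 = np.array([[0.0], [1.0], [0.0]])                  # y-axis in R^3
W  = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # yz-plane in R^3

print(sum_is_direct(U1, U2))   # True: the two axes only meet at the origin
print(sum_is_direct(U2, W))    # False: the y-axis lies inside the yz-plane
```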

Corollary. Suppose $U$ and $W$ are subspaces of a vector space $V$ and that $V=U+W$. Then $V=U\oplus W$ if and only if $U\cap W=\set{\0}$.

Proof. Suppose first that $V=U\oplus W$ and choose $\v\in U\cap W$. Then $\v\in U$ and $\v\in W$, so we also have that $-\v\in W$ because $W$ is a subspace of $V$ and so it is closed under scalar multiplication (in this case by $-1$). We can thus write

$$\0=\v+(-\v),$$

and by the above proposition the only such representation of $\0$ is the one in which both summands are $\0$. It follows that $\v=\0$, and thus $U\cap W =\set{\0}$.

Next, suppose that $U\cap W=\set{\0}$ and let

$$ \0=\u+\w,$$

where $\u\in U$ and $\w\in W$. Then $\u=-\w$ and so $-\w\in U$, which implies that $\w\in U$. Since $\w\in U$ and $w\in W$, it follows that $\w\in U\cap W$. By hypothesis, this means that $\w=\0$ and thus $\u=\0$. We have shown that the only way to decompose $\0$ in the form above is as the sum of zero vectors. Thus, it follows from the above proposition that $V=U\oplus W$, completing the proof.

This result tells us that the sum of two subspaces is direct whenever their intersection is trivial. Unfortunately, this result does not generalize to sums of three or more subspaces. In fact, in the case of three or more subspaces, even pairwise intersections being trivial is not enough to guarantee that a sum is direct.
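Here is the standard counterexample, run through the `sum_is_direct` sketch from above: three distinct lines through the origin in $\R^2$ intersect pairwise only in $\0$, yet their sum is not direct.

```python
# Reuses sum_is_direct (and the numpy import) from the previous sketch.
L1 = np.array([[1.0], [0.0]])
L2 = np.array([[0.0], [1.0]])
L3 = np.array([[1.0], [1.0]])

print(sum_is_direct(L1, L2))      # True
print(sum_is_direct(L1, L3))      # True
print(sum_is_direct(L2, L3))      # True
print(sum_is_direct(L1, L2, L3))  # False: (1,0) + (0,1) - (1,1) = (0,0)
```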

Span and Linear Combinations

Given a vector space $V$ and a vector $\v\in V$, what is the smallest subspace of $V$ containing $\v$? It's not a trick question — the answer is actually somewhat obvious. It's the set $U=\set{a\v\in V\mid a\in\F}$ of all scalar multiples of $\v$. It is certainly clear that $U$ satisfies the criteria of a subspace. Furthermore, if $\v\ne\0$ then the only proper subspace of $U$ is $\set{\0}$, so it is certainly the smallest subspace of $V$ which contains $\v$. This is a very important idea, and it leads us to our next definition.

Definition. If $V$ is a vector space with $\v\in V$, then the span of $\v$ is the subspace

$$\span\v = \set{a\v\in V\mid a\in\F}.$$

Furthermore, if $(\v_1,\ldots,\v_n)$ is a list of vectors in $V$, then the span of $(\v_1,\ldots,\v_n)$ is the subspace

$$\begin{align}
\span(\v_1,\ldots,\v_n) &= \sum_{i=1}^n \span\v_i \\
&= \sum_{i=1}^n \set{a\v_i\in V\mid a\in\F} \\
&= \bset{\sum_{i=1}^n a_i\v_i\in V\mid a_1,\ldots,a_n\in\F}.
\end{align}$$

For consistency, we define the span of the empty list to be

$$\span()=\set{\0}.$$

If $V=\span(\v_1,\ldots,\v_n)$, we say that $V$ is spanned by the list $(\v_1,\ldots,\v_n)$, or that this list spans $V$.

It is important to distinguish between lists and sets. In a list, the order of the entries matters and entries can be repeated more than once.

In the above definition, we first defined the span of one vector to be the smallest subspace containing that vector. We then defined the span of a list of vectors to be the sum of the spans of each vector in the list.

Equivalently, the span of a list of vectors is a subspace containing every possible sum of scalar multiples of those vectors. In order to avoid having to repeat the phrase "sum of scalar multiples" over and over, we define a new term which means the same thing.

Definition. Let $V$ be a vector space with $\v_1,\v_2,\ldots,\v_n\in V$. A linear combination of the vectors $\v_1,\v_2,\ldots,\v_n$ is an expression of the form

$$\sum_{i=1}^n a_i\v_i = a_1\v_1+a_2\v_2+\cdots+a_n\v_n,$$

where $a_1,a_2,\ldots,a_n\in\F$.

Example. Consider the vectors $(1,2)$, $(3,4)$ and $(0,-1)$ in the vector space $\R^2$. Some linear combinations of these vectors include

$$\begin{align}
(9,9)&=3(1,2)+2(3,4)+5(0,-1), \\
(-1,0)&=2(1,2)-\phantom{1}(3,4)+0(0,-1), \\
(1,2)&=\phantom{1}(1,2)+0(3,4)+0(0,-1).
\end{align}$$

With this definition, we can now think of the span of a list of vectors as the set of all linear combinations of those vectors. In addition, we can now see that a vector space $V$ is spanned by a list of vectors if every vector in $V$ can be expressed as a linear combination of vectors in that list.
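Computationally, asking whether a vector lies in a span is just asking whether a linear system is solvable. The sketch below (the helper `in_span` is my own, and this is an illustration rather than a rigorous test) checks whether the least-squares residual of $Ax=b$ is numerically zero, where the columns of $A$ are the vectors in the list.

```python
import numpy as np

def in_span(vectors, b, tol=1e-10):
    A = np.column_stack(vectors)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.linalg.norm(A @ x - b) < tol

vs = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([0.0, -1.0])]
print(in_span(vs, np.array([9.0, 9.0])))                      # True: 3(1,2) + 2(3,4) + 5(0,-1)
print(in_span([np.array([1.0, 2.0])], np.array([9.0, 9.0])))  # False: (9,9) is not a multiple of (1,2)
```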

Although we are not quite ready to define the dimension of a vector space, we now have the necessary machinery to distinguish between finite-dimensional and infinite-dimensional vector spaces.

Definition. A vector space $V$ is finite-dimensional if it is spanned by a finite list of vectors in $V$.

If no finite list of vectors spans $V$, then $V$ is infinite-dimensional.

We will be primarily concerned with finite-dimensional vector spaces. These have much nicer properties and are considerably easier to work with and conceptualize. Infinite-dimensional vector spaces can be extremely poorly behaved and are dealt with in a branch of mathematics called functional analysis.

Example. For any positive integer $n$, the vector space $\R^n$ is finite-dimensional. This is because

$$\begin{align}
\R &= \span(1), \\
\R^2 &= \span\big((1,0), (0,1)\big), \\
\R^3 &= \span\big((1,0,0), (0,1,0), (0,0,1)\big), \\
\R^4 &= \span\big((1,0,0,0), (0,1,0,0), (0,0,1,0), (0,0,0,1)\big), \\
&\;\; \vdots
\end{align}$$

Of course, there are other lists of vectors that span each $\R^n$, but to show that a vector space is finite-dimensional, we need only demonstrate that one such list exists.

Example. We have already been introduced to an infinite-dimensional vector space, namely $\P(\F)$. This is the set of polynomials with coefficients in some field $\F$.

To show that it is infinite-dimensional, consider any finite list of polynomials in $\P(\F)$ and let $n$ be the highest degree of the polynomials in the list. Then the polynomial $x^{n+1}$ is not in the span of this list, so it follows that no finite list of polynomials spans $\P(\F)$.

Example. On the other hand, the vector space $\P_n(\F)$ of polynomials whose degree is at most $n$ is finite-dimensional.

It is tempting to try to define the dimension of a finite-dimensional vector space using the machinery we have already developed. We could, perhaps, attempt to define the dimension of $V$ as the length of a list of vectors which spans $V$. Unfortunately, this is not well-defined because a vector space can have many spanning lists of different lengths. For example,

$$\begin{align}
\R^2 &= \span\big((0,1),(1,0)\big) \\
&= \span\big((0,1),(1,0),(1,0)\big) \\
&= \span\big((1,2),(2,3),(3,4),(4,5)\big),
\end{align}$$

because any vector in $\R^2$ can be expressed as a linear combination of vectors in each list. In fact, we can always take a list which spans a vector space $V$ and add to it any vector in $V$, resulting in a new list which also spans $V$. This means that the length of a spanning list is certainly not unique, and so we cannot define dimension in this way.

Linear Independence

Intuition tells us that we are on the correct track, though. It should be somewhat obvious that the vector space $\R^2$ cannot possibly be spanned by any list containing fewer than two vectors. If this is truly the case, it should make sense to define the dimension of $\R^2$ to be two. We will therefore work toward figuring out the minimum length of a spanning list of vectors. The important concept of linear independence, which we will now define, will help us with this.

Definition. A list $(\v_1,\v_2,\ldots,\v_n)$ of vectors in a vector space $V$ is linearly independent if the only choices of $a_i\in\F$ which satisfy

$$\sum_{i=1}^n a_i\v_i = \0 \tag{3}$$

are the scalars $a_1=a_2=\cdots=a_n=0$.

If there exist $a_i\in\F$, not all zero, for which $(3)$ holds, then the list $(\v_1,\v_2,\ldots,\v_n)$ is linearly dependent.

In every vector space, any list containing the zero vector must be linearly dependent. This is because we can take the coefficient of $\0$ to be any nonzero scalar and every other coefficient to be zero, producing a nontrivial linear combination equal to $\0$.

Let's look at some more interesting examples.

Example. In $\R^2$, the list

$$\big((0,1),(1,0)\big)$$

is linearly independent because

$$a_1(0,1)+a_2(1,0)=(0,0)$$

only if $a_1=a_2=0$. No matter how hard we try, we cannot come up with any other linear combination of these vectors which yields the zero vector.

However, if we add $(1,0)$ to the list, the resulting list

$$\big((0,1),(1,0),(1,0)\big)$$

is linearly dependent because

$$0(0,1)+(1,0)-(1,0)=(0,0).$$

Example. As a less trivial example, the list

$$\big((1,2),(2,3),(3,4),(4,5)\big)$$

is also linearly dependent because

$$(1,2)+(2,3)-5(3,4)+3(4,5)=(0,0).$$
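Numerically, the definition translates into a rank condition: a list is linearly independent exactly when the matrix whose columns are those vectors has rank equal to the length of the list, i.e. when its null space is trivial. A quick illustrative check of the examples above (the helper name is mine):

```python
import numpy as np

def is_linearly_independent(vectors):
    A = np.column_stack(vectors)
    return np.linalg.matrix_rank(A) == len(vectors)

print(is_linearly_independent([np.array([0.0, 1.0]), np.array([1.0, 0.0])]))    # True
print(is_linearly_independent([np.array([0.0, 1.0]), np.array([1.0, 0.0]),
                               np.array([1.0, 0.0])]))                          # False
print(is_linearly_independent([np.array([1.0, 2.0]), np.array([2.0, 3.0]),
                               np.array([3.0, 4.0]), np.array([4.0, 5.0])]))    # False
```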

Up until now, we have sort of been weaving a twisted web of interconnected ideas. Our next theorem relates them all and gives an alternate characterization of linear independence in terms of spans and direct sums.

Theorem. A list $(\v_1,\v_2,\ldots,\v_n)$ of nonzero vectors in a vector space $V$ is linearly independent if and only if

$$\span(\v_1,\v_2,\ldots,\v_n) = \bigoplus_{i=1}^n \span\v_i.$$

Proof. First, suppose that the list $(\v_1,\v_2,\ldots,\v_n)$ is linearly independent. Then by the definition of span, we have that

$$\span(\v_1,\v_2,\ldots,\v_n) = \sum_{i=1}^n \span\v_i.$$

From our earlier proposition, we need to show that the only choices of $\u_i\in\span\v_i$ for which

$$\0=\u_1+\u_2+\cdots+\u_n$$

are the vectors

$$\u_1=\u_2=\cdots=\u_n=\0.$$

Since $\u_i\in\span\v_i$ for each $i$, we have that $\u_i=a_i\v_i$ for some $a_i\in\F$. Since $(\v_1,\v_2,\ldots,\v_n)$ is linearly independent, we have that

$$\begin{align}
\0 &= a_1\v_1+a_2\v_2+\cdots+a_n\v_n \\
&= \u_1+\u_2+\cdots+\u_n
\end{align}$$

only if $a_1=a_2=\cdots=a_n=0$, which implies that $\u_i=\0$ for all $i$. It follows then that the desired sum is direct.

Suppose next that

$$\span(\v_1,\v_2,\ldots,\v_n) = \bigoplus_{i=1}^n \span\v_i.$$

Again from the above proposition, we know that the only choices of $\u_i\in\span\v_i$ which satisfy

$$\0=\u_1+\u_2+\cdots+\u_n$$

are the vectors $\u_1=\u_2=\cdots=\u_n=\0$. Now suppose that $\sum_{i=1}^n a_i\v_i=\0$ for some scalars $a_i\in\F$. Taking $\u_i=a_i\v_i\in\span\v_i$ above, it follows that $a_i\v_i=\0$ for each $i$. Since each $\v_i\ne\0$, it must be that each $a_i=0$. Thus, the list $(\v_1,\v_2,\ldots,\v_n)$ is linearly independent.

This theorem establishes a somewhat different way of thinking about linear independence, and since the two are equivalent we are free to use either as a definition of linear independence.

Linearly dependent lists have an interesting, yet fairly undesirable, property which is explained in the next theorem. This result is commonly called the linear dependence lemma, and it will be crucial for many future arguments. It states that any linearly dependent list contains a vector which is in the span of the previous vectors in the list. Furthermore, it asserts that removing this vector from the list does not affect the list's span.

Linear Dependence Lemma/Steinitz Exchange Lemma. Suppose $(\v_1,\v_2,\ldots,\v_n)$ is a linearly dependent list of vectors in a vector space $V$ for which $\v_1\ne\0$. Then there exists a natural number $i\ge 2$ for which

$(a) \;\;$ $\v_i\in\span(\v_1,\v_2,\ldots,\v_{i-1})$,
$(b) \;\;$ removing $\v_i$ from $(\v_1,\v_2,\ldots,\v_n)$ results in a list with the same span.

Proof. Since $(\v_1,\v_2,\ldots,\v_n)$ is linearly dependent, there exist $a_1,a_2,\ldots,a_n\in\F$, not all zero, for which

$$a_1\v_1+a_2\v_2+\cdots+a_n\v_n = 0.$$

We have assumed that $\v_1\ne\0$, so not all of $a_2,a_3,\ldots,a_n$ can equal zero. Otherwise the relation above would reduce to $a_1\v_1=\0$ with $a_1\ne 0$, which would force $\v_1=\0$. Let $i$ be the largest index for which $a_i\ne 0$, and note that $i\ge 2$. Then

$$a_1\v_1+a_2\v_2+\cdots+a_i\v_i = 0,$$

so we may write

$$\v_i = -\frac{a_1}{a_i}\v_1-\frac{a_2}{a_i}\v_2 - \cdots - \frac{a_{i-1}}{a_i}\v_{i-1}.$$

Since we have expressed $\v_i$ as a linear combination of $\v_1,\v_2,\ldots,\v_{i-1}$, it follows that $\v_i\in\span(\v_1,\v_2,\ldots,\v_{i-1})$. This completes the proof of $(a)$.

Next, choose any $\u\in\span(\v_1,\v_2,\ldots,\v_n)$. By definition, there exist $a_1,a_2,\ldots,a_n\in\F$ for which

$$\u=a_1\v_1+a_2\v_2+\cdots+a_n\v_n.$$

Using the result of $(a)$, we can replace $\v_i$ in the above equation with a linear combination of the vectors $\v_1,\v_2,\ldots,\v_{i-1}$. Thus, $\u$ is also in the span of the list which results from removing $\v_i$. Since the reverse inclusion is immediate, the two lists have the same span, which proves $(b)$.
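The proof of $(a)$ is completely constructive, so it is easy to turn into code. The sketch below is illustrative only and assumes the input list really is linearly dependent with $\v_1\ne\0$; it extracts one dependence relation from the null space via the SVD and then solves for the last vector with a nonzero coefficient.

```python
import numpy as np

def dependence_witness(vectors, tol=1e-10):
    A = np.column_stack(vectors)
    _, _, Vt = np.linalg.svd(A)
    coeffs = Vt[-1]            # lies in the null space when the list is dependent
    i = max(k for k, c in enumerate(coeffs) if abs(c) > tol)
    # v_i = -(a_1/a_i) v_1 - ... - (a_{i-1}/a_i) v_{i-1}
    combo = [-coeffs[k] / coeffs[i] for k in range(i)]
    return i, combo

vs = [np.array([1.0, 2.0]), np.array([2.0, 3.0]), np.array([3.0, 4.0])]
i, combo = dependence_witness(vs)
reconstructed = sum(c * vs[k] for k, c in enumerate(combo))
print(i, np.allclose(reconstructed, vs[i]))   # the removed vector is recovered from the earlier ones
```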

Now we will establish that linearly independent lists possess the property I mentioned earlier. Namely, we will show that no linearly independent list can be longer than a spanning list.

Theorem. In any finite-dimensional vector space $V$, the length of a list of vectors which spans $V$ is never less than the length of any linearly independent list.

Proof. Suppose $(\u_1,\u_2,\ldots,\u_m)$ is a list of vectors which spans $V$, and that $(\v_1,\v_2,\ldots,\v_n)$ is a linearly independent list. This implies $\v_i\ne\0$ for every $i$. We will show that $m\ge n$ through the following algorithmic construction.

Step 1.
Because $(\u_1,\u_2,\ldots,\u_m)$ spans $V$, adding any vector to this list must result in a list which is linearly dependent. Namely, the list $L_{1,1}=(\v_1,\u_1,\u_2,\ldots,\u_m)$ is linearly dependent. It follows from the linear dependence lemma that there exists a positive integer $i\le m$ for which the removal of $\u_i$ from $L_{1,1}$ results in a new list $L_{1,2}$ which still spans $V$. Notice that $L_{1,2}$ is a list of length $m$. To simplify matters, we replace the index of every $\u_j$ for which $j>i$ with $j-1$, so that we can write

$$L_{1,2}=(\v_1,\u_1,\u_2,\ldots,\u_{m-1}).$$

Step k.
Because the list $L_{k-1,2}$ from the previous step spans $V$, adding any vector to this list must result in a list which is linearly dependent. Namely, the list $L_{k,1}$ formed by adding $\v_k$ to the beginning of $L_{k-1,2}$ is linearly dependent. It follows from the linear dependence lemma that we can remove some vector from $L_{k,1}$ without affecting its span. Since

$$L_{k,1}=(\v_k,\v_{k-1},\ldots,\v_1,\u_1,\u_2,\ldots,\u_{m-k+1})$$

and $(\v_k,\v_{k-1},\ldots,\v_1)$ is linearly independent, there must exist a positive integer $i\le m-k+1$ for which the removal of $\u_i$ from $L_{k,1}$ results in a new list $L_{k,2}$ which still spans $V$. Notice that $L_{k,2}$ is a list of length $m$. Next, we replace the index of every $\u_j$ for which $j>i$ with $j-1$, so that we can write

$$L_{k,2}=(\v_k,\v_{k-1},\ldots,\v_1,\u_1,\u_2,\ldots,\u_{m-k}).$$

At the end of step $n$, we have used up every $\v_i$ and so the algorithm terminates. It cannot terminate before this because each list $L_{k,1}$ is linearly dependent, and since the $\v_i$'s are linearly independent, the linear dependence lemma always removes one of the remaining $\u_i$'s. There must therefore be at least $n$ of the $\u_i$'s available to remove, and so $m-n\ge 0$. We conclude that $m\ge n$, as desired.

Algorithmic proofs like the above are common in linear algebra. Linear algebra has applications in almost every field, partially because it lends itself so nicely to computation.
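In that spirit, here is a rough numerical simulation of the exchange argument. It is not the proof itself, just an illustration under convenient assumptions: at each step it prepends the next $\v_k$ and then searches for one of the remaining $\u$'s whose removal leaves the rank of the column matrix unchanged.

```python
import numpy as np

def exchange(spanning, independent):
    current = list(spanning)      # starts out as the u's
    kept_us = len(spanning)
    for v in independent:
        current = [v] + current
        target_rank = np.linalg.matrix_rank(np.column_stack(current))
        # the remaining u's sit at the tail of the list
        for idx in range(len(current) - 1, len(current) - 1 - kept_us, -1):
            trial = current[:idx] + current[idx + 1:]
            if np.linalg.matrix_rank(np.column_stack(trial)) == target_rank:
                current = trial
                kept_us -= 1
                break
        else:
            raise ValueError("ran out of u's to remove; the spanning list was too short")
    return current

us = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0]),
      np.array([0.0, 0.0, 1.0]), np.array([1.0, 1.0, 1.0])]   # spans R^3, length 4
vs = [np.array([1.0, 2.0, 3.0]), np.array([0.0, 1.0, 1.0])]   # linearly independent, length 2
result = exchange(us, vs)
print(len(result) == len(us))   # True: each step trades one u for one v, so m >= n
```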

I will end here for now, and next time I will talk about bases for vector spaces.

]]>
<![CDATA[Vector Spaces (1) - Subspaces and Sums]]>https://algebrology.github.io/vector-spaces-1/5c8c6f0bfdce06003e4aeff0Sat, 16 Mar 2019 04:04:40 GMT
  1. Introduction
  2. Vector Spaces
  3. Basic Properties
  4. Examples
  5. Subspaces and Sums

Introduction

It's almost ridiculous that I would expose you to free abelian groups before talking about vector spaces and linear algebra. Vector spaces and free abelian groups have a lot in common, but vector spaces are more familiar, more ubiquitous and easier to compute with.

At their core, vector spaces are very simple and their definition will closely mimic that of groups and topological spaces. Recall that a group is a set with a binary operation, an identity and inverses for all its elements. A topological space is a set and a collection of "open sets" which include the set itself, the empty set, finite intersections and arbitrary unions of open sets. Vector spaces are defined in a similar manner.

A vector space is a special kind of set containing elements called vectors, which can be added together and scaled in all the ways one would generally expect.

You have likely encountered the idea of a vector before as some sort of arrow, anchored to the origin in euclidean space with some well-defined magnitude and direction.

vector-arrow

This is the sort of vector encountered in introductory physics classes. However, such arrows are not the only mathematical objects that can be added and scaled, so it would be silly to restrict our attention only to them. We will make a more abstract and inclusive definition of vector spaces, which are the main objects of study in linear algebra.

We would like our definition to include some way to scale vectors so that we can expand or shrink them in magnitude while preserving their direction. Since in general there is no concept of vector multiplication, we will need to bring in additional elements by which we are allowed to multiply our vectors to achieve a scaling effect. These scalars will be the elements of a field, which we have encountered before in my posts on constructing the rational numbers, but I will give the definition again because it is so important.

Definition. A field is a set $\F$ of elements called scalars, together with two binary operations:

Addition
which assigns to any pair of scalars $a,b\in\F$ the scalar $a+b\in\F$,

Multiplication
which assigns to any pair of scalars $a,b\in\F$ the scalar $ab\in\F$.

Any field and its operations must satisfy the following properties:

Additive Identity
There exists $0\in\F$ such that $0+a=a$ for every $a\in\F$.

Additive Inverses
For every $a\in\F$, there exists $b\in\F$ for which $a+b=0$.

Commutative Property of Addition
For all $a,b\in\F$, we have that $a+b=b+a$.

Associative Property of Addition
For all $a,b,c\in\F$, we have that $(a+b)+c=a+(b+c)$.

Multiplicative Identity
There exists $1\in\F$, with $1\ne 0$, such that $1a=a$ for every $a\in\F$.

Multiplicative Inverses
For every $a\in\F$ with $a\ne 0$, there exists $b\in\F$ for which $ab=1$.

Commutative Property of Multiplication
For all $a,b\in\F$, we have that $ab=ba$.

Associative Property of Multiplication
For all $a,b,c\in\F$, we have that $(ab)c=a(bc)$.

Distributive Property
For all $a,b,c\in\F$, we have that $a(b+c)=ab+ac$.

It is not too difficult to verify that the set $\R$ of real numbers is a field when considered with the usual addition and multiplication. Similarly, the sets $\C$ of complex numbers and $\Q$ of rational numbers form fields when equipped with their usual arithmetic.

There are other examples of fields besides these familiar ones. For example, the set $\Z_3$ of integers modulo $3$ is a field when considered with addition and multiplication modulo $3$. The tables below describe addition and multiplication in $\Z_3$, so you can check for yourself that the field axioms hold.

Z_3-tables

In fact, there are many more examples of finite fields. For any prime number $p$, the set $\Z_p$ of integers modulo $p$ forms a field. However, for our purposes we will usually only be interested in the fields $\R$ and $\C$.
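If you would rather not check the tables by hand, here is a small illustrative script that verifies a few of the axioms for arithmetic modulo $3$; extending it to the remaining axioms, or to other primes $p$, is routine.

```python
p = 3
elements = range(p)

def add(a, b):
    return (a + b) % p

def mul(a, b):
    return (a * b) % p

# commutativity of addition and the distributive property
assert all(add(a, b) == add(b, a) for a in elements for b in elements)
assert all(mul(a, add(b, c)) == add(mul(a, b), mul(a, c))
           for a in elements for b in elements for c in elements)

# every nonzero element has a multiplicative inverse
assert all(any(mul(a, b) == 1 for b in elements) for a in elements if a != 0)
```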

Vector Spaces

With all of this in mind, we can move forward with defining the concept of a vector space over a field. This definition ensures that vectors interact with scalars and other vectors in a reasonable way.

Definition. A vector space over a field $\F$ is a set $V$ of elements called vectors, together with two operations:

Vector Addition
which assigns to any pair of vectors $\vec{u},\vec{v}\in V$ the vector $\vec{u}+\vec{v}\in V$,

Scalar Multiplication
which assigns to any scalar $a\in\F$ and any vector $\vec{v}\in V$ the vector $a\vec{v}\in V$.

Any vector space and its operations must satisfy the following properties:

Zero Vector
There exists $\vec{0}\in V$ such that $\vec{0}+\vec{v}=\vec{v}$ for every $\vec{v}\in V$.

Additive Inverses
For every $\vec{u}\in V$, there exists $\vec{v}\in V$ for which $\vec{u}+\vec{v}=\vec{0}$.

Commutative Property of Addition
For all $\vec{u},\vec{v}\in V$, we have that $\vec{u}+\vec{v}=\vec{v}+\vec{u}$.

Associative Property of Addition
For all $\vec{u},\vec{v}, \vec{w}\in V$, we have that $(\vec{u}+\vec{v})+\vec{w}=\vec{u}+(\vec{v}+\vec{w})$.

Compatibility with Field Multiplication
For all $a,b\in\F$ and $\vec{v}\in V$, we have that $(ab)\vec{v}=a(b\vec{v})$.

Scalar Multiplicative Identity
For every $\vec{v}\in V$, we have that $1\vec{v}=\vec{v}$.

First Distributive Property
For all $a,b\in\F$ and $\vec{v}\in V$, we have that $(a+b)\vec{v}=a\vec{v}+b\vec{v}$.

Second Distributive Property
For all $a\in\F$ and $\vec{u},\vec{v}\in V$, we have that $a(\vec{u}+\vec{v})=a\vec{u}+a\vec{v}$.

Although the choice of field is important when defining a particular vector space, it is often ignored when talking generally about vector spaces. This is because many results hold true for all vector spaces, regardless of the field over which they are defined. Whenever this information is important, it will be specified. Otherwise, we will often refer simply to a vector space $V$, with the understanding that some field $\F$ is lurking in the background.

Basic Properties

Now we will verify five basic facts that are true of all vector spaces. They may seem obvious, and many of them closely mimic analogous results we've already seen in group theory. Nonetheless, without proof we could not in good faith use them in our later arguments.

The first important property is that a vector space only has one zero vector.

Theorem. In any vector space $V$, the zero vector is unique.

Proof. Suppose there exist two zero vectors, $\vec{0}_1,\vec{0}_2\in V$. Then, using the definition of a zero vector and the commutative property, we have that

\begin{align}
\vec{v} &= \vec{0}_1+\vec{v}, \\
\vec{v} &= \vec{v}+\vec{0}_2,
\end{align}

for every $\vec{v}\in V$. Taking $\vec{v}=\vec{0}_2$ in the first equation and $\vec{v}=\vec{0}_1$ in the second, we see that

\begin{align}
\vec{0}_1 &= \vec{0}_1 + \vec{0}_2 \\
&= \vec{0}_2.
\end{align}

It follows that the zero vector is unique.

The next fact that we will prove is that each vector in a vector space has only one additive inverse. Its proof is extremely similar to the above.

Theorem. In any vector space $V$, every vector $\vec{u}\in V$ has a unique additive inverse.

Proof. Suppose $\vec{u}$ has two additive inverses, $\vec{v}_1,\vec{v}_2\in V$. Then, using the definition of an additive inverse and the commutative property, we have that

\begin{align}
\vec{v}_1+\vec{u} &= \vec{0}, \\
\vec{u}+\vec{v}_2 &= \vec{0}.
\end{align}

In light of these equations,

\begin{align}
\vec{v}_1 &= \vec{v}_1 + \vec{0} & \scriptstyle\textit{zero vector}\\
&= \vec{v}_1 + (\vec{u}+\vec{v_2}) & \scriptstyle\textit{additive inverses}\\
&= (\vec{v}_1 + \vec{u}) + \vec{v}_2 & \scriptstyle\textit{associativity}\\
&= \vec{0} + \vec{v}_2 & \scriptstyle\textit{additive inverses}\\
&= \vec{v}_2. & \scriptstyle\textit{zero vector}
\end{align}

It follows that $\vec{u}$ has a unique additive inverse.

Because of this theorem, no confusion arises if we write $-\vec{v}$ to denote the additive inverse of $\vec{v}$. We will adopt this notation for the rest of time.

We will not take the time to do this, but it should be clear how to modify the above two proofs to show that in any field $\F$, additive and multiplicative identities are unique, as well as additive and multiplicative inverses.

Next, we show that the scalar product of a field's additive identity $0$ with any vector yields the zero vector.

Theorem. In any vector space $V$, we have that $0\vec{v}=\vec{0}$ for every $\vec{v}\in V$.

Proof. We proceed via the following computation:

\begin{align}
0\vec{v} + 0\vec{v} &= (0+0)\vec{v} & \scriptstyle\textit{first distributive property}\\
&= 0\vec{v} & \scriptstyle\textit{additive identity}\\
&= 0\vec{v} + \vec{0}. & \scriptstyle\textit{zero vector}
\end{align}

Adding $-0\vec{v}$ to both sides yields $0\vec{v} = \vec{0}$, the desired equality.

It is very important to realize that in the above proof, the scalar $0$ on the left is the additive identity in the underlying field $\F$, while the vector $\vec{0}$ on the right is the zero vector in the vector space $V$. Remember that we do not have any concept of multiplying vectors.

We will now prove a result of similar obviousness, whose proof is similar to the above.

Theorem. In any vector space $V$ over a field $\F$, we have that $a\vec{0}=\vec{0}$ for every $a\in\F$.

Proof. We proceed via the following computation:

\begin{align}
a\vec{0} + a\vec{0} &= a(\vec{0}+\vec{0}) & \scriptstyle\textit{second distributive property}\\
&= a\vec{0} & \scriptstyle\textit{zero vector}\\
&= a\vec{0} + \vec{0}. & \scriptstyle\textit{zero vector}
\end{align}

Adding $-a\vec{0}$ to both sides yields $a\vec{0} = \vec{0}$, the desired equality.

There is one obvious fact left to prove, namely that the scalar product of $-1$ with any vector yields the additive inverse of that vector.

Theorem. In any vector space $V$, we have that $-1\vec{v}=-\vec{v}$ for every $\vec{v}\in V$.

Proof. We proceed via the following computation:

\begin{align}
-1\vec{v}+\vec{v} &= -1\vec{v}+1\vec{v} & \scriptstyle\textit{multiplicative identity}\\
&= (-1+1)\vec{v} & \scriptstyle\textit{first distributive property}\\
&= 0\vec{v} & \scriptstyle\textit{additive inverses}\\
&= \vec{0}. & \scriptstyle\textit{shown above}
\end{align}

Adding $-\vec{v}$ to both sides yields $-1\vec{v} = -\vec{v}$, the desired equality.

Examples

We are now armed with a number of facts about abstract vector spaces and their interactions with scalars, but we have yet to exhibit a single actual example of a vector space. We will now examine some vector spaces that are important throughout the entire subject of linear algebra.

Example. It isn't too hard to see that the set $\{\0\}$, which contains only the zero vector, is a vector space over any field. We are forced to define vector addition and scalar multiplication in the only possible way, and $\0$ must act as both the zero vector and its own additive inverse. This is analogous to the trivial group in group theory, and is not a particularly interesting example.

Example. For any field $\F$, it happens that $\F$ is a vector space over itself, taking vector addition to be the same as field addition and scalar multiplication to be the same as field multiplication. This, again, is easy to check directly from the definitions. As specific examples, $\R$ and $\C$ are both vector spaces over themselves.

For the next example, recall the definition of cartesian product. In particular, recall that $\F^n$ is the set of all ordered tuples of length $n$ with components in $\F$. For instance, $(i, 3+2i, 4)$ is an element of $\C^3$ and $(0, -\pi, 8, \sqrt{2})$ is an element of $\R^4$.

Example. For any field $\F$, the set $\F^n$ is a vector space over $\F$ if we define vector addition and scalar multiplication in the following (obvious) way.

Given $\x=(x_1,x_2,\ldots,x_n)$ and $\y=(y_1,y_2,\ldots,y_n)$ in $\F^n$, we define addition component-wise in terms of field addition as follows:

$$\begin{align}
\x+\y &= (x_1,x_2,\ldots,x_n)+(y_1,y_2,\ldots,y_n) \\
&= (x_1+y_1,x_2+y_2,\ldots,x_n+y_n).
\end{align}$$

Similarly, given $a\in\F$ and $\x=(x_1,x_2,\ldots,x_n)\in\F^n$, we define scalar multiplication component-wise in terms of field multiplication as follows:

$$\begin{align}
a\x &= a(x_1,x_2,\ldots,x_n) \\
&= (ax_1,ax_2,\ldots,ax_n).
\end{align}$$

It is not terribly difficult to show that $\F^n$ does in fact constitute a vector space over $\F$ with these operations. For instance, clearly $\0=(0,0,\ldots,0)$ acts as the zero vector in $\F^n$, and $-\x=(-x_1,-x_2,\ldots,-x_n)$ acts as the additive inverse of $\x=(x_1,x_2,\ldots,x_n)$.
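For the concrete case $\F=\R$, the componentwise operations are about as simple as code gets. The snippet below is only an illustration of the definitions, using plain tuples; the helper names are mine.

```python
def vec_add(x, y):
    return tuple(a + b for a, b in zip(x, y))

def scalar_mul(a, x):
    return tuple(a * c for c in x)

x = (1.0, -2.0, 3.0)
y = (0.5, 0.5, 0.5)

print(vec_add(x, y))                    # (1.5, -1.5, 3.5)
print(scalar_mul(2.0, x))               # (2.0, -4.0, 6.0)
print(vec_add(x, scalar_mul(-1.0, x)))  # (0.0, 0.0, 0.0): additive inverses work as expected
```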

Our next example of a vector space may be slightly surprising at first, but it is both highly important and an excellent example of a vector space whose vectors are certainly not arrows of any kind. First, we will need the following definition.

Definition. Let $\F$ denote a field and let $n$ be a natural number. A polynomial with coefficients $a_0,a_1,\ldots,a_n$ in $\F$ is a function $p:\F\to\F$ of the form

$$\begin{align}
p(x) &= \sum_{i=0}^n a_ix^i \\
&= a_0 + a_1x+a_2x^2+\cdots+a_nx^n
\end{align}$$

for all $x\in\F$. If $a_n\ne 0$, we say that $n$ is the degree of $p$. We write $\mathscr{P}(\F)$ to denote the set of all polynomials with coefficients in $\F$.

As a concrete example, the function $p:\R\to\R$ defined by

$$p(x)=3+2x+8x^2$$

is a polynomial of degree two with coefficients in $\R$, namely $a_0=3$, $a_1=2$ and $a_2=8$. It is thus an element of $\mathscr{P}(\R)$.

If $p(x)=0$ for all $x\in\F$, we call $p$ the zero polynomial, whose degree is undefined. However, for the purposes of adding and multiplying polynomials, it is sometimes useful to formally treat the degree of the zero polynomial as $-\infty$.

Example. It is easy to see that for any field $\F$, the set $\mathscr{P}(\F)$ forms a vector space over $\F$ when equipped with the following definitions of vector addition and scalar multiplication.

For any two vectors $\p,\q\in\mathscr{P}(\F)$, we define their sum $\p+\q\in\mathscr{P}(\F)$ in terms of function addition. That is,

$$(\p+\q)(x)=\p(x)+\q(x)$$

for all $x\in\F$. Similarly, for any $a\in\F$ and $\p\in\mathscr{P}(\F)$, we define scalar multiplication as expected:

$$(a\p)(x)=a\p(x)$$

for all $x\in\F$. It is clear that the zero polynomial must act as the zero vector. Again, checking that $\mathscr{P}(\F)$ is actually a vector space with these definitions is easy but fairly time consuming, so I won't do it here.
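One convenient concrete model (my choice, nothing canonical) stores a polynomial as the list of its coefficients $[a_0, a_1, \ldots, a_n]$; vector addition and scalar multiplication then act coefficientwise, exactly as in the definitions above.

```python
from itertools import zip_longest

def poly_add(p, q):
    # pad the shorter coefficient list with zeros, then add componentwise
    return [a + b for a, b in zip_longest(p, q, fillvalue=0.0)]

def poly_scale(c, p):
    return [c * a for a in p]

p = [3.0, 2.0, 8.0]       # 3 + 2x + 8x^2
q = [1.0, 0.0, 0.0, 5.0]  # 1 + 5x^3

print(poly_add(p, q))      # [4.0, 2.0, 8.0, 5.0], i.e. 4 + 2x + 8x^2 + 5x^3
print(poly_scale(2.0, p))  # [6.0, 4.0, 16.0]
```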

Subspaces and Sums

It often happens that a vector space contains a subset which also acts as a vector space under the same operations of addition and scalar multiplication. For instance, the vector space $\{\0\}$ is a (fairly boring) subset of any vector space. This phenomenon is so important that we give it a name.

Definition. A subset $U$ of a vector space $V$ is a subspace of $V$ if it is itself a vector space under the same operations of vector addition and scalar multiplication.

It should go without saying that $U$ and $V$ are defined over the same field.

Note the unfortunate naming conflict with subspaces of topological spaces. Normally they are discussed in different contexts and so this causes no confusion. In cases where confusion may arise, we will sometimes refer to them as topological subspaces and linear subspaces.

Using this definition, certainly $\{\0\}$ constitutes a subspace of every vector space. At the other extreme, every vector space is a subspace of itself (because every set is a subset of itself). However, the concept of a subspace wouldn't be very interesting if these were the only possibilities. Before demonstrating some nontrivial proper subspaces, we provide a more straightforward method for determining whether a subset of a vector space constitutes a subspace.

Proposition. A subset $U$ of a vector space $V$ is a subspace of $V$ if and only if the following three conditions hold:

  1. The zero vector is in $U$.
  2. The set $U$ is closed under addition of vectors.
  3. The set $U$ is closed under scalar multiplication.

Proving this would consist largely of statements like "the associative property holds in $U$ because it holds in $V$." Therefore, I will omit the proof of this proposition.

Example. Consider the set $U=\{(x,0)\in\F^2\mid x\in\F\}$, where $\F$ is any field. This is the set of all ordered pairs with entries in $\F$ whose second component is the additive identity.

The zero vector $(0,0)$ is certainly in $U$. From the definition of addition in $\F^2$, the sum of two vectors in $U$ must be of the form

$$(x,0) + (y,0) = (x+y,0)$$

which is certainly in $U$. Similarly, the scalar product of any $a\in\F$ with any vector in $U$ must be of the form

$$a(x,0)=(ax,0)$$

which is also in $U$. We have shown that all three conditions are met, so $U$ is a subspace of $\F^2$.

Example. If $\F$ is a field and $n$ is a natural number, then the set of all polynomials whose degree is at most $n$ forms a subspace of $\P(\F)$. We denote this subspace $\P_n(\F)$.

To see that this is truly a subspace, we will verify the three conditions of the above proposition. We have already stated that the degree of the zero polynomial is $-\infty$, which is certainly less than $n$, so $\0\in\P_n(\F)$.

Next, suppose that $\p,\q\in\P_n(\F)$. Then there exist degrees $j,k\le n$ and coefficients $a_0,\ldots,a_j,b_0,\ldots,b_k\in\F$ for which

$$\begin{align}
\p(x) &= \sum_{i=0}^j a_ix^i, \\
\q(x) &= \sum_{i=0}^k b_ix^i
\end{align}$$

for every $x\in\F$. Since $j$ is the degree of $\p$ and $k$ is the degree of $\q$, we know by definition that $a_j,b_k\ne 0$. Suppose without loss of generality that $j\le k$. It follows that

$$\begin{align}
(\p+\q)(x) &= \sum_{i=0}^j a_ix^i + \sum_{i=0}^k b_ix^i \\
&= \sum_{i=0}^j (a_i+b_i)x^i + \sum_{i=j+1}^k b_ix^i
\end{align}$$

for all $x\in\F$. Note that we have combined like terms until the smaller polynomial ran out, then added terms from the higher-degree polynomial until it was also depleted.

For each $0\le i\le k$, define

$$c_i =
\begin{cases}
a_i + b_i & \text{if } i\le j, \\
b_i & \text{if } i>j.
\end{cases}$$

Then $c_0,\ldots,c_k\in\F$ are the coefficients of the polynomial $\p+\q$. That is,

$$(\p+\q)(x)=\sum_{i=0}^k c_i x^i.$$

If $c_k\ne 0$ then $\p+\q$ is a polynomial of degree $k$. If $c_k = 0$ then $\p+\q$ is a polynomial of degree less than $k$ (or the zero polynomial). Since $k\le n$, it follows either way that $\p+\q\in\P_n(\F)$.

Next, suppose that $a\in\F$ and $\p\in\P_n(\F)$. Then there exists a degree $j\le n$ and coefficients $b_0,\ldots,b_j\in\F$ for which

$$\p(x)=\sum_{i=0}^j b_i x^i$$

for all $x\in\F$. Thus,

$$\begin{align}
a\p(x) &= a\sum_{i=0}^j b_i x^i \\
&= \sum_{i=0}^j ab_i x^i
\end{align}$$

for every $x\in\F$. If $a=0$ then clearly $a\p$ is the zero polynomial and thus $a\p=\0\in\P_n(\F)$. If $a\ne 0$ then for each $0\le i\le j$, define $c_i=ab_i$. Then $c_0,\ldots,c_j$ are the coefficients of $a\p$. That is,

$$a\p(x)=\sum_{i=0}^j c_i x^i.$$

Therefore, $a\p$ is a polynomial of degree $j\le n$, so $a\p\in\P_n(\F)$. We have established (rather painstakingly) that all three conditions hold, and so we conclude that $\P_n(\F)$ is in fact a subspace of $\P(\F)$.

We will now work toward defining sums of subspaces. We will begin with the following theorem, which establishes that the intersection of subspaces is always a subspace.

Theorem. If $U$ and $W$ are subspaces of a vector space $V$, their intersection $U\cap W$ is also a subspace of $V$.

Proof. We will again show that the conditions of the proposition hold. Since $U$ and $W$ are subspaces, we know that they both contain the zero vector and are closed under vector addition and scalar multiplication.

Since $\0\in U$ and $\0\in W$, certainly $\0\in U\cap W$.

For any vectors $\u,\v\in U\cap W$, we have that $\u,\v\in U$ and $\u,\v\in W$. Since both subspaces are closed under vector addition, $\u+\v\in U$ and $\u+\v\in W$. Thus, $\u+\v\in U\cap W$.

Lastly, suppose $a\in\F$ and $\v\in U\cap W$. Then $\v\in U$ and $\v\in W$, so $a\v\in U$ and $a\v\in W$ because both subspaces are closed under scalar multiplication. It follows that $a\v\in U\cap W$, completing the proof.

Using an inductive argument, it is easy to show that this result can be extended to any finite intersection of subspaces.

Naturally, we might now consider the question of whether the union of subspaces is a subspace. A moment's thought will probably be enough to convince you that this is not true in general. The following example shows why.

Example. Consider the subspaces

$$\begin{align}
U &= \{(x,0)\in\R^2\mid x\in\R\}, \\
W &= \{(0,y)\in\R^2\mid y\in\R\}
\end{align}$$

of $\R^2$. We can picture $U$ as the space of all vectors lying on the horizontal axis of the cartesian plane, and $W$ as the space of all vectors lying on the vertical axis. The union of these subspaces, $U\cup W$, is not closed under vector addition. To illustrate this, notice that $(1,0)\in U$ and $(0,1)\in W$, and that $(1,0)+(0,1)=(1,1)$. However, $(1,1)\notin U$ and $(1,1)\notin W$, so certainly $(1,1)\notin U\cup W$. Therefore, $U\cup W$ is not a subspace of $\R^2$.
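The failure of closure is easy to see in code as well; the following illustrative membership tests mirror the example above (the lambda names are mine).

```python
in_U = lambda v: v[1] == 0          # horizontal axis
in_W = lambda v: v[0] == 0          # vertical axis
in_union = lambda v: in_U(v) or in_W(v)

u, w = (1, 0), (0, 1)
s = (u[0] + w[0], u[1] + w[1])

print(in_union(u), in_union(w))   # True True
print(in_union(s))                # False: (1, 1) lies on neither axis
```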

This is somewhat unfortunate behavior, because intersections and unions of sets are very natural and easy to work with. We would therefore like to define a way to combine subspaces in order to obtain a new subspace which is as close as possible to their union. We mean this in the sense that this new subspace should be the smallest subspace containing their union. To this end, we define the sum of subspaces in the following manner.

Definition. Let $U$ and $W$ be subspaces of a vector space $V$. The sum of $U$ and $W$ is the set

$$U+W=\{\u+\w\in V\mid\u\in U,\w\in W\}.$$

Similarly, the sum of subspaces $U_1,\ldots,U_n$ of $V$ is the set

$$\sum_{i=1}^n U_i = \{\u_1 + \cdots + \u_n\in V\mid \u_1\in U_1,\ldots,\u_n\in U_n\}.$$

This definition certainly fixes the issue we had with unions of subspaces, because it ensures by construction that $U+W$ contains the sum of any vector in $U$ and any vector in $W$. We should make sure that the sum of subspaces is actually a subspace.

Theorem. If $U$ and $W$ are subspaces of a vector space $V$, their sum $U+W$ is also a subspace of $V$.

Proof. Since $U$ and $W$ are subspaces, we know that they both contain the zero vector and are closed under vector addition and scalar multiplication.

Since $\0\in U$ and $\0\in W$, clearly $\0\in U+W$ because $\0=\0+\0$, i.e., it is the sum of a vector in $U$ and a vector in $W$.

Next, let $\v_1,\v_2\in U+W$. Then

$$\begin{align}
\v_1 &= \u_1+\w_1, \\
\v_2 &= \u_2+\w_2
\end{align}$$

for some vectors $\u_1,\u_2\in U$ and $\w_1,\w_2\in W$. Since $U$ and $W$ are closed under vector addition, we know that $\u_1+\u_2\in U$ and $\w_1+\w_2\in W$. Therefore,

$$\begin{align}
\v_1+\v_2 &= (\u_1+\w_1) + (\u_2+\w_2) \\
&= (\u_1+\u_2) + (\w_1+\w_2) \\
&\in U+W
\end{align}$$

since it is the sum of a vector in $U$ and a vector in $W$.

Finally, let $a\in\F$ and $\v\in U+W$, so that $\v=\u+\w$ for some $\u\in U$ and some $\w\in W$. Since $U$ and $W$ are closed under scalar multiplication, we know that $a\u\in U$ and $a\w\in W$. Thus,

$$\begin{align}
a\v &= a(\u+\w) \\
&= a\u + a\w \\
&\in U+W
\end{align}$$

since it is the sum of a vector in $U$ and a vector in $W$.

Now that we have established that the sum of two subspaces is a subspace, we will prove our assertion above — that the sum of two subspaces is the smallest subspace containing their union. We do this by showing that any subspace containing their union must also contain their sum.

Theorem. If $U_1$ and $U_2$ are subspaces of a vector space $V$, then every subspace containing $U_1\cup U_2$ also contains $U_1+U_2$.

Proof. Suppose $W$ is a subspace of $V$ for which $U_1\cup U_2\subseteq W$, and let $\v\in U_1+U_2$. Clearly $\v=\u_1+\u_2$ for some $\u_1\in U_1$ and some $\u_2\in U_2$. Both $\u_1$ and $\u_2$ lie in $W$ because $U_1\cup U_2\subseteq W$, and $W$ is closed under vector addition because it is a subspace of $V$, so $\v\in W$. Thus, $U_1+U_2\subseteq W$, completing the proof.

The above is equivalent to saying that $U_1+U_2$ is the intersection of all subspaces containing $U_1\cup U_2$. Again, this result extends to any finite number of subspaces via a simple inductive argument.

This verifies that the sum of subspaces acts as we originally intended, in that it is as close to their union as possible while remaining a subspace. Now that we have demonstrated this nice characterization of sums, I will provide an example before moving on.

Example. Consider the subspaces

$$\begin{align}
U_1 &= \{(x,0,0)\in\R^3\mid x\in\R\}, \\
U_2 &= \{(0,y,0)\in\R^3\mid y\in\R\}, \\
U_3 &= \{(0,0,z)\in\R^3\mid z\in\R\}
\end{align}$$

of the vector space $\R^3$. These correspond to the subspaces of vectors lying along the $x$, $y$ and $z$ axes, respectively. The sum of these three subspaces is $\R^3$ itself, because for any vector $(x,y,z)\in\R^3$, we may write

$$(x,y,z)=(x,0,0)+(0,y,0)+(0,0,z),$$

which is a sum of vectors in $U_1$, $U_2$ and $U_3$, respectively. This sum has a particularly nice property — namely that every vector in $\R^3$ has a unique representation as a sum of vectors in $U_1$, $U_2$ and $U_3$. Put another way, this sum is special because we are forced to express $(x,y,z)$ as above — there is no other way to break it down as such a sum. This is really why we choose to express points in $\R^3$ using these coordinates. In fact, this property of sums is so important that it has a name. But I will discuss direct sums next time, since it's very late and I'm a sleepyhead.

]]>