November 12, 2019

Permutations and Parity

In my next post, I would like to introduce a very special type of tensor whose properties are invaluable in many fields, most notably differential geometry. Although it's possible to understand antisymmetric tensors without discussing permutations and their parity, these concepts are invaluable to developing the theory properly. Thus, in this post we will take a brief voyage into the topic of permutations.

Definition. A permutation on a finite set $X$ is a bijective function $p:X\to X$.

That seems simple enough. Basically, permutations offer us a way of rearranging the order of elements in a set. To see what I mean by this, let's look at a few examples.

Example. Let $\N_4$ denote the set $\{0, 1, 2, 3\}$ containing the first four natural numbers. The function $p_1:\N_4 \to \N_4$, defined by $p_1(x) = x + 1\hspace{-4pt}\mod 4$, is a permutation. See how the permutation $p_1$ rearranges the order of these numbers:

$$
\begin{align}
0 &\mapsto 1 \\
1 &\mapsto 2 \\
2 &\mapsto 3 \\
3 &\mapsto 0.
\end{align}
$$

Example. Not every permutation is as easy to write an equation to describe. For example, let $X$ denote the set $\{a, b, c, d, e, f, g\}$ . Then there exists a bijection $p_2$ which permutes these letters as follows:

$$
\begin{align}
a &\mapsto a \\
b &\mapsto c \\
c &\mapsto b \\
d &\mapsto e \\
e &\mapsto g \\
f &\mapsto d \\
g &\mapsto f.
\end{align}
$$

However, there's no obvious way to describe this permutation other than to simply write out what happens to each individual letter.

Because we often need to describe permutations that have no simple corresponding equation, it is common to adopt the following compact notation to represent permutations:

Notation. Let $p:X\to X$ denote a permutation on some finite set $X=\{x_0, x_1, x_2, \ldots, x_n\}$. Then we can write

$$
p = \begin{pmatrix}
x_0 & x_1 & x_2 & \ldots & x_n \\
p(x_0) & p(x_1) & p(x_2) & \ldots & p(x_n)
\end{pmatrix}
$$

as a shorthand to describe the images of each element under $p$.

Using this new notation, the permutation from the first example above can be written

$$
p_1 = \begin{pmatrix}
0 & 1 & 2 & 3 \\
1 & 2 & 3 & 0
\end{pmatrix},
$$

and the one from the second example becomes

$$
p_2 = \begin{pmatrix}
a & b & c & d & e & f & g \\
a & c & b & e & g & d & f
\end{pmatrix}.
$$

This second example has a few interesting properties. Note that $a$ gets sent to $a$ and is thus a fixed point of the permutation. The letters $b$ and $c$ get swapped, and $d, e, f$ and $g$ all get permuted among themselves. What this means is that the action of $a$ is entirely unrelated to the actions of $b$ and $c$, which are entirely unrelated to the actions of $d, e, f$ and $g$.

This observation that certain subsets behave completely independently of each other suggests that there may be a way to break the permutation into more fundamental building blocks.

Before we delve into how this is done, it is important to realize that, since permutations are just functions, they can be composed with each other. The following easily-proved result tells us that the composition of permutations is always itself a permutation.

Theorem. Let $p:A\to B$ and $q:B\to C$ be bijective functions. Then $q\circ p:A\to C$ is also bijective.

Proof. Since $p$ and $q$ are bijections, they are by definition injective and surjective. To show that $q\circ p$ is bijective, we must show that it is also injective and surjective.

We argue first that $p\circ q$ is injective. To this end, suppose that $a_1, a_2\in A$ and that $a_1\ne a_2$. Then because $p$ is injective, we know that $p(a_1)\ne p(a_2)$. Furthermore, since $q$ is injective, we know that

$$
\begin{align}
(q\circ p)(a_1) &= q(p(a_1)) \\
&\ne q(p(a_2)) \\
&= (q\circ p)(a_2).
\end{align}
$$

We have shown that distinct elements of $A$ get mapped to distinct elements of $C$, so it follows that $q\circ p$ is injective.

It remains to be shown that $q\circ p$ is surjective. For any element $c\in C$, the surjectivity of $q$ tells us that there exists some element $b\in B$ for $c = q(b)$. It follows then from the surjectivity of $p$ that there exists some $a\in A$ for which $b=p(a)$. That is,

$$
\begin{align}
c &= q(p(a)) \\
&= (q\circ p)(a).
\end{align}
$$

We have shown that every element of $C$ gets mapped to by some element of $A$, so it follows that $q\circ p$ is surjective.

Since $q\circ p$ is injective and surjective, it is bijective.

This theorem is actually much more general than what we need, since permutations are very specific bijections which take finite sets to themselves. But as a special case, it certainly tells us that the composition of permutations is a permutation.

Notation. For convenience, we frequently omit the $\circ$ symbol and treat composition of permutations similarly to multiplication. That is, we simply write $qp$ instead of $q\circ p$. In particular, this allows us to use exponential notation to write

$$
p^n = \underbrace{p\circ p\circ \cdots \circ p}_{n \text{ times}}.
$$

We also define $p^0$ to be the identity function, and $p^{-n}$ to be $(p^{-1})^n$. Note that $p^{-1}$ exists and is guaranteed to be a permutation since $p$ is bijective. With these definitions, it is easy to see that, for any integers $n$ and $m$,

$$
p^{n+m} = p^n p^m.
$$

Equipped with our new machinery, we can begin to see how a permutation can be broken up into more basic building blocks.

Example. Consider the permutations

$$
q_1 = \begin{pmatrix}
a & b & c & d & e & f & g \\
a & c & b & d & e & f & g
\end{pmatrix}
$$

and

$$
q_2 = \begin{pmatrix}
a & b & c & d & e & f & g \\
a & b & c & e & g & d & f
\end{pmatrix}.
$$

Then their composition is

$$
\begin{align}
q_2 \circ q_1 &= \begin{pmatrix}
a & b & c & d & e & f & g \\
a & b & c & e & g & d & f
\end{pmatrix} \begin{pmatrix}
a & b & c & d & e & f & g \\
a & c & b & d & e & f & g
\end{pmatrix} \\
&= \begin{pmatrix}
a & b & c & d & e & f & g \\
a & c & b & e & g & d & f
\end{pmatrix} \\
&= p_2.
\end{align}
$$

This gives us an explicit way to express the permutation $p_2$ as the composition of a permutation $q_1$, which affects only the elements $b$ and $c$, and the permutation $q_2$, which affects only the elements $d, e, f, g$.

It is generally possible to break up any permutation this way. We are now ready to see how it is done.

Theorem. Let $p:X\to X$ be a permutation. Then the relation $\sim$ on $X$, defined by $a\sim b$ whenever $b=p^n(a)$ for some $n\in\Z$, is an equivalence relation.

Proof. By the definition of equivalence relation, it suffices to show that $\sim$ is reflexive, symmetric and transitive.

To see that $\sim$ is reflexive, it suffices to choose $n=0$, as it is certainly true that $a = p^0(a)$ for any $a\in X$. (Recall that $p^0$ is the identity map.) Thus, $a\sim a$ for every $a\in X$.

To see that it is symmetric, suppose $a\sim b$ for some elements $a,b\in X$. Then by definition, $b=p^n(a)$ for some integer $n$. Acting on both sides with the permutation $p^{-n}$ yields

$$
\begin{align}
p^{-n}(b) &= p^{-n}(p^n(a)) \\
&= p^0(a) \\
&= a.
\end{align}
$$

It follows then that $b\sim a$ and thus $\sim$ is symmetric.

Lastly, we need to show that $\sim$ is transitive. To this end, suppose that $a\sim b$ and $b\sim c$, for some elements $a, b, c\in X$. Then there exist integers $n_1, n_2$ for which $b=p^{n_1}(a)$ and $c=p^{n_2}(b)$. Then

$$
\begin{align}
c &= p^{n_2}(b) \\
&= p^{n_2}(p^{n_1}(a)) \\
&= (p^{n_2}p^{n_1})(a) \\
&= p^{n_1 + n_2}(a).
\end{align}
$$

Since $n_1+n_2$ is an integer, it follows that $c\sim a$ and thus $\sim$ is transitive.

We have shown that $\sim$ is reflexive, symmetric and transitive. Therefore it is an equivalence relation.

The following definition makes this all easier to talk about.

Definition. Let $p:X\to X$ denote a permutation on a finite set $X$, and let $\sim$ be the equivalence relation on $X$ defined in the previous theorem. Then the equivalence classes of $\sim$ are called orbits of the permutation $p$.

Let's break this all down. Basically an orbit of a permutation is a collection of elements that are all reachable from each other under repeat application of that permutation. That is, if $x$ and $y$ are in the same orbit of some permutation, then applying the permutation to $x$ enough times will eventually get you to $y$.

In the case of the permutation $p_2$ defined earlier in the post, the orbits are the sets $\{a\}$, $\{b, c\}$ and $\{d, e, f, g\}$. Our earlier assertion that these sets behaved independently of each other under the permutation $p_2$ is now formalized by the realization that orbits must partition the entire set. But remember that we were actually able to use this fact to find permutations $q_1$ and $q_2$ which acted on these orbits and whose composition was $p_2$. Let's see if we can make this type of decomposition more rigorous as well.

Definition. A permutation $p:X\to X$ is a cycle if at most one of its orbits contains more than a single element. Two cycles are called disjoint if their non-singleton orbits are disjoint sets. The length of a cycle is the size of its largest orbit.

It is not difficult to see that $q_1$ and $q_2$ are cycles. The orbits of $q_1$ are $\{a\}$, $\{b, c\}$, $\{d\}$, $\{e\}$, $\{f\}$ and $\{g\}$. Note that only one of these orbits, namely $\{b,c\}$, has upwards of one element, and so $q_1$ is a cycle. Similarly, the orbits of $q_2$ are $\{a\}$, $\{b\}$, $\{c\}$ and $\{d, e, f, g\}$. Since only this last orbit has more than one element, $q_2$ is a cycle as well. Additionally, the cycles $q_1$ and $q_2$ are disjoint because their non-singleton orbits are disjoint.

This implies that what we are really looking for is a way to decompose any permutation into a product of disjoint cycles. The next theorem tells us that this can always be done.

Theorem. Any permutation $p:X\to X$ can be written as the composition of disjoint cycles.

Proof. Let $O_1, O_2, \ldots, O_n$ denote the orbits of $p$. (There are, of course, finitely many orbits because $X$ must be a finite set.) Then for each $1\le i \le n$ and each $x\in X$, define

$$
c_i(x) = \begin{cases}
p(x) & \text{if $x\in O_i$}, \\
x & \text{otherwise}.
\end{cases}
$$

Then each $c_i$ is a cycle because, by construction, its only non-singleton orbit is $O_i$. Since the orbits are disjoint, this in turn means that these are all disjoint cycles.

We argue that $p=c_1c_2\cdots c_n$. To this end, choose any $a\in X$. Then at most one orbit of $p$, say $O_j$, contains $a$. This means that $c_j(a)=p(a)$ and $c_i(a) = a$ for every $i\ne j$. That is, the restriction $c_i\mid_{\{a\}}$is the identity map $i$. It follows then that

$$
\begin{align}
(c_1 c_2 \cdots c_j \cdots c_n)(a) &= \big(c_1\mid_{\{a\}} c_2\mid_{\{a\}} \cdots c_j \cdots c_n\mid_{\{a\}}\big)(a) \\
&= \big(i^{j-1} c_j i^{n-j}\big)(a) \\
&= c_j(a) \\
&= p(a),
\end{align}
$$

as desired.

This type of decomposition is even unique! That is, there is one such collection of disjoint cycles whose product is our desired permutation. The order in which they are multiplied is not unique, however.

Disjoint cycles have another nice property: they commute, meaning that it does not matter in which order we apply them.

Theorem. If $p:X\to X$ and $q:X\to X$ are disjoint cycles, then $pq=qp$.

Proof. Since $p$ and $q$ are disjoint, then their non-singleton orbits, $O_p=\{a_0, a_2, \ldots, a_{n-1}\}$ and $O_q=\{b_0, b_2, \ldots, b_{m-1}\}$ are disjoint. Let $C$ be the set of elements in $X$ which are fixed points of both $p$ and $q$. Then $X=O_p \cup O_q \cup C$ and these sets are all pairwise disjoint. It thus suffices to show that $(pq)(x)=(qp)(x)$ for any of these cases: $x\in O_p$, $x\in O_q$ and $x\in C$.

Suppose first that $x\in O_p$. Then $x=a_i$ for some $0\le i< n$. Note that $p(a_i)=a_{i+1\hspace{-2pt}\mod n}$ and $q(a_i)=a_i$ since $a_i$ is by definition in a singleton orbit of $q$. Thus,

$$
\begin{align}
(pq)(x) &= (pq)(a_i) \\
&= p(q(a_i)) \\
&= p(a_i) \\
&= a_{i+1\hspace{-2pt}\mod n} \\
&= q(a_{i+1\hspace{-2pt}\mod n}) \\
&= q(p(a_i)) \\
&= (qp)(a_i).
\end{align}
$$

An entirely symmetrical argument holds for the case where $x\in O_q$, noting that $x=b_j$ for some $0\le j< m$ and that $q(b_j)=b_{j+1\hspace{-2pt}\mod m}$ and $p(b_j) = b_j$ since $b_j$ is a fixed point of $p$.

Lastly, if $x\in C$ then it is a fixed point of both $p$ and $q$. That is, $p(x) = q(x) = x$. Thus,

$$
\begin{align}
(pq)(x) &= p(q(x)) \\
&= p(x) \\
&= x \\
&= q(x) \\
&= q(p(x)) \\
&= (qp)(x).
\end{align}
$$

It follows that $(pq)(x)=(qp)(x)$ for every $x\in X$, and thus $pq=qp$ as desired.

If we do not demand that our cycles be disjoint, there are even more ways to decompose a permutation into products of cycles. As an example, consider the following definitions:

$$
\begin{align}
r_1 &= \begin{pmatrix}
a & b & c & d & e & f & g \\
a & c & b & d & e & f & g
\end{pmatrix}, \\
r_2 &= \begin{pmatrix}
a & b & c & d & e & f & g \\
a & b & c & e & d & f & g
\end{pmatrix}, \\
r_3 &= \begin{pmatrix}
a & b & c & d & e & f & g \\
a & b & c & d & g & f & e
\end{pmatrix}, \\
r_4 &= \begin{pmatrix}
a & b & c & d & e & f & g \\
a & b & c & d & e & g & f
\end{pmatrix}.
\end{align}
$$

These are all permutations which each swap only two elements. They are cycles because each has only one non-singleton orbit. Those orbits are $\{b, c\}$, $\{d, e\}$, $\{e, g\}$ and $\{f, g\}$, respectively. These orbits are not disjoint, and thus neither are are the cycles. But it turns out that $r_4r_3r_2r_1=p_2$. To see this, the following flowchart might be a helpful visual aid, where we apply first $r_1$, then $r_2$, then $r_3$ and finally $r_4$ from top to bottom:

$$
\begin{matrix}
a & b & c & d & e & f & g \\
&&& \downarrow \\
a & c & b & d & e & f & g \\
&&& \downarrow \\
a & c & b & e & d & f & g \\
&&& \downarrow \\
a & c & b & e & g & f & d \\
&&& \downarrow \\
a & c & b & e & g & d & f
\end{matrix}
$$

Since we've now shown two ways of decomposing $p_2$ into cycles, it's obvious that such a decomposition is not unique unless we insist that our cycles are disjoint. There is still useful information to be extracted from such decompositions, however, particularly decompositions of the form above.

Definition. A transposition is a cycle of length $2$.

In other words, a transposition is a permutation which swaps only two elements. This means, for example, that $r_1, r_2, r_3$ and $r_4$ as defined above are all transpositions. The following theorem, though perhaps obvious, is surprisingly important. Before we prove it, it is helpful to introduce a new notation for cycles:

Notation. Let $c:X\to X$ denote a cycle. If its largest orbit sends $x_1 \mapsto x_2 \mapsto \cdots \mapsto x_n$ then we write

$$
c = \begin{pmatrix}
x_1 & x_2 & \cdots & x_n
\end{pmatrix}.
$$

This allows us to simplify our representations a great deal, as long as we know the permutations we're talking about are all cycles. For instance, it allows us to write

$$
\begin{align}
q_1 &= \begin{pmatrix}b & c\end{pmatrix}, \\
q_2 &= \begin{pmatrix}d & e & g & f\end{pmatrix}, \\
r_1 &= \begin{pmatrix}b & c\end{pmatrix}, \\
r_2 &= \begin{pmatrix}d & e\end{pmatrix}, \\
r_3 &= \begin{pmatrix}e & g\end{pmatrix}, \\
r_4 &= \begin{pmatrix}f & g\end{pmatrix}.
\end{align}
$$

Thus,

$$
\begin{align}
p_2 &= \begin{pmatrix}d & e & g & f\end{pmatrix}\begin{pmatrix}b & c\end{pmatrix} \\
&= \begin{pmatrix}f & g\end{pmatrix}\begin{pmatrix}e & g\end{pmatrix}\begin{pmatrix}d & e\end{pmatrix}\begin{pmatrix}b & c\end{pmatrix}.
\end{align}
$$

Now we're ready to prove that theorem!

Theorem. Any permutation $p:X\to X$, where $X$ is a finite set containing at least two elements, can be expressed as a product of transpositions.

Proof. If $p$ is the identity permutation, then the empty product yields p. Otherwise, we can write $p$ as a finite product of cycles, $p=c_1c_2\cdots c_n$. If we can decompose each cycle $c_i$ into a product of transpositions, the result will follow.

Each cycle $c_i$ can be written $c_i=\begin{pmatrix} a_1 & a_2 & \cdots & a_{m_i} \end{pmatrix}$ for some integer $m_i$ and some $\{a_j\}_{j=1}^{m_i}$. We write this cycle as a composition of transpositions as follows

$$
\begin{align}
c_i &= \begin{pmatrix} a_1 & a_2 & \cdots & a_{m_i} \end{pmatrix} \\
&= \begin{pmatrix}a_1 & a_{m_i}\end{pmatrix}\begin{pmatrix}a_1 & a_{m_i-1}\end{pmatrix}\cdots\begin{pmatrix}a_1 & a_2\end{pmatrix}.
\end{align}
$$

Now $p$ is just the product of these cycles, each of which is the product of transpositions, and so $p$ itself is a product of transpositions, completing the proof.

There are many possible such decompositions into transpositions for any given permutation. However, it turns out that such decompositions have one important invariant property. Namely, any permutation's expression as a product of transpositions will always either have an odd number of transpositions or an even number.

For instance, there may be many ways to decompose $p_2$ into a product of transpositions. But we've seen it done using four, and so we can say with certainty that any such decomposition will always have an even number of transpositions. This property is called the parity of a permutation, and it is so important that there is a theorem named after it — the Parity Theorem. Unfortunately the proof is somewhat beyond the scope of this blog post, but maybe I will devote a full post to it at some point. In any case, you should now have a good understanding of what it means. Here is the statement:

Parity Theorem. A permutation $p$ cannot be expressed as both the product of an odd number of transpositions and as the product of an even number of transpositions.

This fact allows us make the following definitions:

Definition. A permutation is odd if it can be expressed as the product of an odd number of transpositions. It is even if it can be expressed as the product of an even number of transpositions.

Definition. If a permutation $\sigma$ is odd, we define its sign to be $\sgn\sigma = -1$. If it is even, its sign is $\sgn\sigma = 1$.

An alternative notation for $\sgn\sigma$ is $(-1)^\sigma$, since the sign is equal to $(-1)^m$ where $m$ is the number of transpositions in the decomposition of $\sigma$.

Every permutation is either even or odd, but it cannot be both.

Lastly, it is important to point out that the set of all permutations on a finite set $X$ forms a group called the symmetric group of $X$, written $\text{Sym}(X)$ or $S_n$, where $n$ is the number of elements in $X$. The permutations are the elements of this group, and the group operation is function composition. Since function composition is associative and permutations have inverses, this clearly does in fact form a group.

I'll leave this here for now, and we will soon put all of this to use!