March 28, 2017

Set Theory

Contents

  1. What are sets?
  2. Basic facts about sets
  3. Operations on sets
  4. Functions

Everything in mathematics is built from sets.[1] Even objects such as functions and arithmetic operations like addition are formally defined in terms of sets, although you would likely never expect it. A consequence of this fact is that a basic familiarity with set theory is necessary in order to understand the topics I'll be discussing here. For this reason, I've decided to include this introductory post on the topic so that adventurous readers with little mathematical background will be able to understand my future posts without having to leave the confines of this blog.

What are sets?

Already we encounter something of a problem — one which took mathematicians quite a while to resolve, and should not be underestimated. The issue is that, because we wish to use set theory as the foundation of all future mathematical exploration, we don't actually have a place to start when defining sets themselves.

It takes quite a bit of logical sophistication to arrive at a truly satisfactory definition of a set. If you're interested in such things, I encourage you to look into any resource you can find on Zermelo-Fraenkel Set Theory with the Axiom of Choice (usually abbreviated ZFC), which is a more rigorous set of axioms which govern the behavior of sets.

For our purposes, the standard "naïve" set theory will do just fine. It is largely intuition-based, because it relies on the primitive notion that sets contain elements and elements are what make up sets. However, as long as we take certain precautions when dealing with sets, we should never encounter any problems stemming from this treatment:

Definition. A set is a collection of elements.

Basic facts about sets

Traditionally, capital letters like $A$, $B$ or $X$ are used to denote sets. If $x$ is an element of a set $X$, we write $x\in X$ and we say that "$x$ is in $X$."

Sets are determined solely by the elements they contain. For instance, if the numbers $1$, $2$ and $3$ are the only elements of the set $X$, we can indicate this by writing $X=\{1,2,3\}$. When we define sets in this manner, by specifying a complete list of elements they contain, we always sandwich the list of elements between curly braces, as above.

Definition. If two sets $A$ and $B$ contain precisely the same elements, we say that they are equal and we write $A=B$. If they are not equal, we write $A\neq B$.

If two sets are not equal, then by definition they do not contain the same elements. It follows then that one of them must contain at least one element that the other does not.

This definition of set equality should hopefully seem reasonable. Note that it does imply that the sets $\{x, y, z\}$ and $\{x, x, z, y, x, z\}$ are equal because they contain the same elements. Although some elements may appear more than once in the description of a set, or the elements may by written in a different order, a set either contains an element or it does not. Sets do not have any concept of multiplicity or order for their elements.

There is nothing to prevent sets from containing other sets, but be careful when dealing with sets that have other sets as elements. If $x$ is some element, then $\big\{x, \{x\}\big\} \neq \{x\}$ because the set $\{x\}$ is not the same thing as the element $x$.

It may be helpful to think of sets as bags, and their elements as objects inside these bags. If $x$ is an apple, then $\big\{x, \{x\}\big\}$ represents a bag inside of which are two things: an apple and a bag. Inside of the inner bag is also an apple. This is very clearly not the same thing as $\{x\}$, which is just a bag with only an apple inside.

Equality is not the only relationship between sets that we will be concerned with. The following definitions give us a way to talk about sets which contain some of the same elements.

Definition. Let $A$ and $B$ denote two sets. If every element of $A$ is also an element of $B$, we say that $A$ is a subset of $B$ and we write $A\subseteq B$.

Definition. If $A\subseteq B$ but $A\neq B$, we say that $A$ is a proper subset of $B$ and we write $A\subset B$.

Notice that, by these definitions, every set is a subset of itself. Also note that if $A\subset B$, it follows that $B$ contains some element that $A$ does not. Thus, if both sets are finite then $B$ is larger than $A$. Infinite sets are weird and do weird things, so we cannot necessarily make the same claim when they get involved.

Again, we must take care and realize that $A\subseteq B$ is a completely different statement than $A\in B$. To visualize this, we refer again to our analogy between sets and bags. Take as an example the case where $A$ is a bag containing only an apple, while $B$ is a bag containing an apple and an orange. It should be clear that $A$ is not actually an element of $B$, but it is a subset.

We're ready now for our first proof!

Theorem. Let $A$ and $B$ denote sets. If $A\subseteq B$ and $B\subseteq A$, then $A=B$.

Proof. Since $A$ is a subset of $B$, every element of $A$ is also an element of $B$. Likewise, since $B$ is a subset of $A$, every element of $B$ is also an element of $A$. It follows that the elements of each set are precisely the same, so $A=B$, as desired.

One important set will come up in our discussions quite often:

Definition. The empty set is the unique set containing no elements.

It is easy to see that the empty set is a subset of every set, since all of the elements it contains (none) are trivially also in every other set.

In many upcoming posts, I am going to assume familiarity with the following sets:

  • The set $\mathbb{N} = \{0, 1, 2, \dotsc\}$ of natural numbers.
  • The set $\mathbb{Z} = \{\dotsc, -1, 0, 1, \dotsc\}$ of integers.
  • The set $\mathbb{Q}$ of rational numbers whose elements are of the form $\frac{p}{q}$, where $p, q \in \mathbb{Z}$ and $q \neq 0$.
  • The set $\mathbb{R}$ of real numbers.
  • The set $\mathbb{C}$ of complex numbers whose elements are of the form $a+bi$, where $a, b \in \mathbb{R}$ and $i^2 = -1$.

It is convenient and natural to assume that $\mathbb{N}\subset\mathbb{Z}\subset\mathbb{Q}\subset\mathbb{R}\subset\mathbb{C}$, and I encourage you to think this way.[2]

It is often desirable to specify certain subsets of sets with which we are already acquainted in terms of some property that every element in the subset must possess. How would we, for instance, denote the set of positive real numbers? Or the set of even integers?

We can construct such subsets using set-builder notation. This is basically a fancy notation describing the form that elements in the set take, followed by a vertical bar which means "such that," and then a property which every element obeys — all sandwiched between the usual set brackets.

In this new notation, $\{x\in\mathbb{R} \mid x>0\}$ would be the set of positive real numbers. Similarly, $\{2n \mid n\in\mathbb{Z}\}$ is the set of even integers.

Operations of sets

Given two or more sets, we will often find the need to combine them in certain ways to create new sets. Below are four very common ways of creating new sets from old. Each of their definitions can be extended to combine any finite collection of sets, and sometimes even an infinite collection. Make certain that you are comfortable with each of these definitions, since they will show up everywhere ever.

Let $A$ and $B$ denote sets. Let's say, for illustration, that they're sets of points in the plane of your screen and they look like these blobs:

Definition. The union of $A$ and $B$ is the set of all elements in either $A$ or in $B$, or both. It is written $A\cup B$ and is defined as the set $\{x \mid x\in A \text{ or } x\in B\}$.

The union of the sets above would look like this:

Definition. The intersection of $A$ and $B$ is the set of all elements in both $A$ and $B$. It is written $A\cap B$ and is defined as the set $\{x \mid x\in A \text{ and } x\in B\}$.

Definition. Two sets are said to be disjoint if their intersection is empty.

The intersection of the above sets would look like this:

Definition. The complement (or difference) of $A$ in $B$ is the set of all elements that are in $B$ but not in $A$. It is written $B-A$ and is defined as the set $\{x\in B \mid x\notin A\}$.

For the sets above, the complement of $A$ in $B$ would look like this:

It is worth nothing that, unlike unions and intersections, complements depend on which set comes first. That is, $A-B\neq B-A$.

Definition. The Cartesian product of $A$ and $B$, written $A \times B$, is the set of all ordered pairs whose first component is in $A$ and whose second component is in $B$.

The Cartesian product of the above sets would be a bit trickier to draw since this would require four spacial dimensions. As a simpler example, suppose that $A$ and $B$ are both $\mathbb{R}$, the set of real numbers. Then $A\times B=\mathbb{R}\times\mathbb{R}$ and this set would consist of all points in the plane. This is because $\mathbb{R}\times\mathbb{R}$ by definition contains all pairs of the form $(x,y)$ where $x$ and $y$ are real numbers. This is why two-dimensional Euclidean space is often written as $\mathbb{R}^2$, as shorthand for $\mathbb{R}\times\mathbb{R}$.

It should be somewhat obvious how to take the union of an arbitrary collection of sets. For a finite collection of $n$ sets — $A_1, A_2, \dotsc, A_n$ — their union contains all of the elements that are in at least one of the sets:

$$\bigcup\limits_{i=1}^n A_i = \{x\mid x\in A_1 \text{ or } x\in A_2 \text{ or } \dotsc \text{ or } x\in A_n\}.$$

We can go even further, and take the union of any collection of sets. To do so, we first let $I$ be some indexing set such that $A_i$ is a set for every $i\in I$. Notice that there is nothing to stop $I$ from being an infinite set, in which case there would be infinitely many sets $A_i$ in our collection. We can take the union of all these sets as follows:

$$\bigcup\limits_{i\in I} A_i = \{x \mid x\in A_i \text{ for some } i\in I\}.$$

The same type of thinking allows for intersections and Cartesian products to be defined for more than two sets. In the case of Cartesian products, multiplying $n$ sets yields a set whose elements are ordered $n$-tuples, where the $i$th component is an element of the $i$th set being multiplied. Cartesian products of infinitely many sets are a bit too complicated to discuss here, although we can easily take the intersection of an arbitrary collection of sets, much as we did for their union:

$$\bigcap\limits_{i\in I} A_i = \{x \mid x\in A_i \text{ for every } i\in I\}.$$

There are two interesting ways in which the operation of set difference distributes over the operations of unions and intersections. These are called De Morgan's Laws:

$$X - (A \cup B) = (X - A) \cap (X - B),$$
$$X - (A \cap B) = (X - A) \cup (X - B).$$

These highly symmetric rules often turn out to be very useful, and I encourage you to draw some pictures to try to understand them. I will not prove them here. To illustrate the second rule, here is an example: if a ball is not both red and blue, then it is not red or it is not blue (or possibly both).

De Morgan's Laws also extend naturally to arbitrary collections of sets:

$$X - \bigcup\limits_{i\in I} A_i = \bigcap\limits_{i\in I} (X - A_i),$$
$$X - \bigcap\limits_{i\in I} A_i = \bigcup\limits_{i\in I} (X - A_i).$$

Functions

The last topic I'll talk about in this post is the concept of a function, as well as several important traits that functions may or may not possess. You've likely encountered functions before as things that eat variables and poo out numbers, and that's not a terrible way of thinking about them. We're going to make things a bit more formal though.

Definition. A function from a set $A$ to a set $B$, denoted $f:A\to B$, is a subset of $A\times B$ such that for each $x\in A$, there is exactly one $y\in B$ for which $(x, y)\in f$. In this definition, the set $A$ is called the domain of $f$ and $B$ is called the codomain.

Definition. The range (or image) of a function $f$ is the set $\text{im } f = \{y\in B \mid (x,y)\in f \text{ for some } x\in A\}$.

If you've never seen the above definition of a function before, it's natural to wonder at this point just what the crud I'm talking about. Why are we defining functions as sets? As I stated earlier, everything in mathematics is defined in terms of sets, and functions are no exception. We don't usually think of functions as sets of ordered pairs, but it is useful to define them this way so that we can make precise statements about them which stem from their definition. It also allows us to immediately infer what is meant by function equality, because it is inherited from our earlier definition of set equality.

So using this definition, a function is basically a list of $(x,y)$ values where $y$ is what you'd usually call $f(x)$. That is, the first component is the "input" to the function and the second component is the "output." The second condition in the definition is a rephrasing of what you may know as the "vertical line test." In two dimensions, a function from $\mathbb{R}$ to $\mathbb{R}$ is anything you can draw that exists above every value on the $x$-axis, but not more than once.

As I mentioned above, a more familiar way of rephrasing the last part of the definition might be this:

... for each $x\in A$, there exists exactly one $y\in B$ for which $y=f(x)$.

This notation is much more common, and I'll be using it almost exclusively from now on. Often, $f(x)$ is referred to as the image of $x$ under the function $f$. This is an unfortunate overloading of the word "image," but it will not lead to confusion if we are careful.

One of the nice things about our definition of a function it that it allows us to define functions between any two sets we want, and the values of the function don't have to obey any particular rule. We can simply check each point and its image to make sure they obey the definition.

If you're a bit confused by all this, just sit and absorb it for a minute. Realize that ordinary things like the function $f:\mathbb{R}\to\mathbb{R}$, defined by $f(x)=x^2$ for every $x\in\mathbb{R}$, still satisfy our new definition. Think everything through — maybe draw some pictures — and then continue reading.

Now, notice that something like $f:\mathbb{R}\to\mathbb{R}$ defined by $f(x)=\frac{1}{x}$ for all $x\in\mathbb{R}$ is not actually a function! This is because not every element of the domain gets mapped to a real number. In particular, there is no $y\in\mathbb{R}$ for which $y=f(0)$. We would need to restrict the domain to $\mathbb{R} - \{0\}$ to turn this thing into a function.

The next thing we'll do is introduce some common classes of functions.

Definition. A function $f:A\to B$ is injective if for every $y\in B$ there exists at most one $x\in A$ for which $y=f(x)$.

Basically, an injective function cannot have two points in its domain get mapped to the same point in its codomain. We can check that a function is injective by showing that if $x\neq y$ then $f(x)\neq f(y)$. Equivalently, we can do this by instead showing that if $f(x)=f(y)$ then $x=y$. (These two statements are contrapositives so they are logically equivalent, but sometimes one will be easier to show than the other.) Injective functions in the plane satisfy what might constitute a "horizontal line test," in that any horizontal line will intersect their graph at most once.

Definition. A function $f:A\to B$ is surjective if for every $y\in B$, there exists at least one $x\in A$ for which $y=f(x)$.

A surjective function has every point in its codomain get mapped to by some point in the domain. That is, every element of the codomain is the image of some element in the domain. In this case, the codomain of the function is equal to its range. Notice that we can turn any function into a surjective function by restricting its codomain to its range.

Definition. A function $f:A\to B$ is bijective if for every $y\in B$, there exists exactly one $x\in A$ for which $y=f(x)$. Equivalently, a function is bijective if it is both injective and surjective.

Bijective functions are particularly nice because they have inverse functions. That is, if $f:A\to B$ is bijective, we can find a function $g:B\to A$ with the property that $f\big(g(y)\big)=y$ for every $y\in B$ and $g\big(f(x)\big)=x$ for every $x\in A$.

A function's inverse essentially allows us to undo what that function did in the first place. It turns out that if a function has an inverse, then that inverse is unique and thus we can unambiguously write the inverse of $f$ as $f^{-1}$.

Functions that aren't bijective do not have inverses, but we can still define something similar:

Definition. The preimage of a set $Y\subseteq B$ under a function $f:A\to B$ is the set of all $X\in A$ for which $f(x)\in Y$. We write $f^{-1}[Y]=\{x\in A\mid f(x)\in Y\}$.

Note that $f^{-1}$ here does not denote an inverse function! We haven't even defined the preimage as a function at all (yet). The square brackets around its argument (which is always a set) help us to distinguish between whether we are talking about inverse functions or preimages. Given a subset of the codomain, the preimage tells us which elements in the domain get mapped into that set.

Definition. The power set of a set $X$ is the set of all subsets of $X$.

We often denote the power set of $X$ as $2^X$ because a simple combinatorial argument shows that if $X$ is a finite set with $n$ elements, then $X$ has $2^n$ distinct subsets.

Preimages are technically functions themselves, though perhaps not in the way you might expect. Since preimages take subsets of a function's codomain and map them to subsets of a function's domain, the preimage of a function $f:X\to Y$ is a set-valued function $f^{-1}:2^Y\to 2^X$ defined by $f^{-1}[U]=\{x\in X\mid f(x)\in U\}$ for every $U\in 2^Y$.

Now the overloading of the symbol $f^{-1}$ is even worse, because we have defined both inverses and preimages as functions! However, careful use of square brackets for the preimage and the fact that one function is set-valued while the other is usually not will help us to avoid confusion. Further, inverse functions only exist for bijections, but preimages are defined for all functions.

Definition. Given two functions $f:A\to B$ and $g:B\to C$ we can form a new function $g\circ f:A\to C$, called the composition of $f$ and $g$, by defining $(g\circ f)(x)=g\big(f(x)\big)$ for every $x\in A$.

Intuitively, when taking the composition of $f$ and $g$, we first take $f(x)$ and then feed the result into $g$. Note that the composition of functions is only defined when the first function's codomain is equal to the second function's domain.

Definition. For any set $X$, we can define an identity function $1_X:X\to X$ by $1_X(x)=x$ for every $x\in X$.

An identity function maps every element of a set to itself, so essentially it does nothing at all. If no confusion can arise about which set we're talking about, we sometimes drop the subscript and write $1$ instead of $1_X$ to denote this function.

Notice now that we can phrase the definition of an inverse function more compactly: A function $f:A\to B$ has an inverse $g:B\to A$ if $f\circ g=1_B$ and $g\circ f=1_A$. That is, the two possible compositions of $f$ and $g$ are equal to the identity functions on their respective domains.

One final remark I will make is that the composition of functions is always associative. That is, if we have three functions $f:A\to B$, $g:B\to C$ and $h:C\to D$, then $(h\circ g)\circ f=h\circ(g\circ f)$. This may look obvious, but the proof — while straightforward — might elude you if you are not accustomed to writing proofs. Nonetheless, I encourage you to try showing this fact for yourself. Here's a hint: use the definition of function composition. If you've done the proof right, it will probably feel as though you haven't really done anything at all.


  1. One sentence in and I'm already lying to you. There are alternative formulations of mathematics, most notably via topos theory or type theory. If you are a beginner, I would recommend not looking these things up or you will likely be very confused. Set theory is by far the simplest foundation to start from, which explains why it is the most common approach (and the one I will take here). ↩︎

  2. More formally it happens that each set, equipped with a certain algebraic structure, is isomorphic to a subobject of the next set, equipped with its own structure. If I ever decide to write a post (or five) detailing the constructions of each of the aforementioned sets, you'll see this idea in excruciating detail. ↩︎