A Note on Random Variables
$\newcommand{\triple}{(\Omega, \mathcal{F}, \mathbf{P})}$$\newcommand{\P}{\mathbf{P}}$ This note on random variables was prompted by the confusing notation found in several math textbooks. I’ll explain random variables (in measure-theoretic terms) as verbosely as I can, and then prove some results. This article assumes that the reader is familiar with probability triples $\triple$, as well as a basic idea of what random variables are, in non-measure-theoretic terms.
1. Random Variable Prerequisites
We start by defining measurable spaces and measurable functions.
Definition 1.1: A Measurable Space $(X,\Sigma)$ consists of a set $X$ and a $\sigma$-algebra $\Sigma$ defined on $X$.
Definition 1.2: A Generated $\sigma$-algebra is the smallest $\sigma$-algebra containing a specified collection of sets. That is, if $A$ is a collection of subsets of $X$, then $\sigma(A)$ is the smallest $\sigma$-algebra such that $A \subseteq \sigma(A)$.
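Since $\sigma(A)$ is just the closure of $A$ under the $\sigma$-algebra operations, we can compute it by brute force for small finite sets. Here is a minimal Python sketch, assuming $X$ is finite (the name `generated_sigma_algebra` is mine; when $X$ is finite, closure under complements and pairwise unions already gives closure under countable unions):

```python
from itertools import combinations

def generated_sigma_algebra(X, A):
    # sigma(A) on a *finite* set X: close A under complements and pairwise
    # unions until nothing new appears (enough for countable unions when
    # X is finite).
    sigma = {frozenset(S) for S in A} | {frozenset(), frozenset(X)}
    changed = True
    while changed:
        changed = False
        for S in list(sigma):
            if frozenset(X) - S not in sigma:
                sigma.add(frozenset(X) - S)
                changed = True
        for S, T in combinations(list(sigma), 2):
            if S | T not in sigma:
                sigma.add(S | T)
                changed = True
    return sigma

for S in sorted(generated_sigma_algebra({1, 2, 3}, [{1}]),
                key=lambda s: (len(s), sorted(s))):
    print(set(S))
```

Running it on $X = \{1,2,3\}$ with $A = \{\{1\}\}$ yields $\{\emptyset, \{1\}, \{2,3\}, \Omega\}$, a $\sigma$-algebra we’ll meet again in $\S 2$.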
Definition 1.3: A Measurable Function $f: (X,\Sigma) \to (Y,\Gamma)$ between two measurable spaces is a function such that for every $E \in \Gamma$, $\{x \in X\ |\ f(x) \in E\} \in \Sigma$.
Since the set $\{x \in X\ |\ f(x) \in E\}$ appears so often in the context of measurable functions, it gets its own notation:
Definition 1.4: $$f^{-1}(E) := \{x \in X\ |\ f(x) \in E\}$$
NOTE: The above definition is confusing, but is unfortunately the norm when dealing with measurable functions. In the context of measurable functions, $f^{-1}$ does not refer to the inverse of $f$ (which would be a function from $Y \to X$), but rather to the preimage of the set $E$: the set of all points in $X$ that $f$ maps into $E$.
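To make the notation concrete, here is a tiny sketch of the preimage operation on a finite domain (the helper name `preimage` is mine):

```python
def preimage(f, domain, E):
    # f^{-1}(E): every point of the domain that f maps into E.
    return {x for x in domain if f(x) in E}

# f(x) = x % 2 on {1,...,6}: the preimage of {0} is the even numbers.
print(preimage(lambda x: x % 2, {1, 2, 3, 4, 5, 6}, {0}))  # {2, 4, 6}
```

Note that this $f$ is not even injective, so the inverse function $f^{-1}$ doesn’t exist; the preimage is nevertheless perfectly well defined.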
Measurable functions can also be defined in terms of the $\sigma$-algebra generated by a function, rather than that generated by a collection of sets:
Definition 1.5: The $\sigma$-algebra generated by a function $f: (X,\Sigma) \to (Y, \Gamma)$ is the collection of all inverse images $f^{-1}(S),\ S \in \Gamma$. $$\sigma(f) := \{ f^{-1}(S) : S \in \Gamma \}$$
According to this definition, $f$ is a measurable function if and only if $\sigma(f) \subseteq \Sigma$.
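For finite spaces, Definition 1.5 translates directly into code. A minimal sketch (the names `sigma_of_f` and `is_measurable` are mine, and $\Gamma$, $\Sigma$ are passed as explicit collections of subsets):

```python
def sigma_of_f(f, domain, Gamma):
    # sigma(f): the collection of all preimages f^{-1}(S), S in Gamma.
    return {frozenset(x for x in domain if f(x) in S) for S in Gamma}

def is_measurable(f, domain, Sigma, Gamma):
    # f is measurable iff sigma(f) is contained in Sigma.
    return sigma_of_f(f, domain, Gamma) <= {frozenset(S) for S in Sigma}

Omega = {1, 2, 3}
powerset = [set(), {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}]
F_coarse = [set(), {1}, {2, 3}, {1, 2, 3}]

identity = lambda w: w
print(is_measurable(identity, Omega, powerset, powerset))  # True
print(is_measurable(identity, Omega, F_coarse, powerset))  # False
```

The second check fails because $\sigma(f)$ is the full power set, which the coarser $\sigma$-algebra cannot contain; this is exactly the failure we’ll see again in $\S 2$.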
2. Random Variables
Random variables are, unfortunately, neither random nor variables. This is the first of many misnomers that we encounter in their study.
Definition 2.1: A Random Variable $X$ defined on a probability triple $\triple$ is a measurable function $X : (\Omega, \mathcal{F}) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$.
In its simplest terms, a random variable is simply a function from $\Omega \to \mathbb{R}$, obeying some ’nice’ rules which allow us to use probability measures with it. These nice rules come into play a bit later, after we first see how random variables and probability measures go hand in hand.
Consider $$\begin{align}\Omega &= \{1,2,3\}\\ \mathcal{F} &= 2^{\Omega} \\ \mathbf{P}&:\mathcal{F} \to [0,1]\end{align}$$ such that $\mathbf{P}\{1\} = \mathbf{P}\{2\} = \mathbf{P}\{3\} = 1/3$ (this is the discrete uniform probability space on $\{1,2,3\}$). Let our random variable $X: \Omega \to \mathbb{R}$ be given by $X(i) = i$ for $i \in \{1,2,3\}$. Pictorially, $X$ simply maps each point $i$ of $\Omega$ to the point $i$ on the real line.
Now, suppose we had to calculate the probability that the random variable $X$ is less than or equal to $2.5$. The probability of this event occurring is given by $\P\{\omega \in \Omega\ :\ X(\omega) \le 2.5\}$. Using the preimage notation from Definition 1.4, we can also write this as $\P(X^{-1}((-\infty, 2.5]))$. Clearly, $1$ and $2$ are the only elements of $\Omega$ that land in this set, hence the answer is $\P\{1,2\} = 2/3$. This is how random variables and probability measures go hand in hand.
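As a sanity check, this calculation is easy to reproduce in Python. A minimal sketch of this specific example (the names `Omega`, `P`, and `X` are just this note’s notation transcribed; `Fraction` keeps $1/3$ exact):

```python
from fractions import Fraction

Omega = {1, 2, 3}
P = {w: Fraction(1, 3) for w in Omega}     # uniform measure on the singletons
X = lambda w: w                            # X(i) = i

event = {w for w in Omega if X(w) <= 2.5}  # X^{-1}((-inf, 2.5])
print(event, sum(P[w] for w in event))     # {1, 2} 2/3
```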
Why, then, do random variables need to be measurable functions? Note that the probability measure is only defined for sets in $\mathcal{F}$; if $X$ is not measurable, we cannot find the probability of certain events associated with $X$.
As an example, consider $$\begin{align}\Omega &= \{1,2,3\}\\ \mathcal{F}' &= \{\emptyset, \{1\}, \{2,3\}, \Omega\} \\ \mathbf{P}&:\mathcal{F}' \to [0,1]\end{align}$$ such that $\P\{1\} = 1/3$. Now consider $X': \Omega \to \mathbb{R}$ such that $X'(i) = i,\ i \in \Omega$. This is the same map as before. However, if we try to calculate the probability that $X'$ is less than or equal to $2.5$ now, we find that $\P\{1,2\}$ is undefined, as $\{1,2\} \not\in \mathcal{F}'$. Hence, $X'$ is not a random variable, as it is not measurable on $(\Omega, \mathcal{F}')$. More specifically, $\sigma(X') = 2^\Omega \not\subseteq \mathcal{F}'$, hence $X'$ is not measurable.
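Computationally, the failure shows up as a simple membership test: the preimage $X'^{-1}((-\infty, 2.5]) = \{1,2\}$ is not in the domain of $\P$. A sketch continuing the example above:

```python
F_prime = [set(), {1}, {2, 3}, {1, 2, 3}]   # the coarser sigma-algebra F'
event = {w for w in {1, 2, 3} if w <= 2.5}  # X'^{-1}((-inf, 2.5]) = {1, 2}
print(event in F_prime)                     # False: P{1,2} is undefined
```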
3. Results on Random Variables
Claim 3.1: If $X: \Omega \to \mathbb{R}$ is a random variable on $\triple$, then $X^{-1}(B) = A \implies X^{-1}(B^C) = A^C$.
A simple (maybe even obvious) claim; the proof is by definition: $$\begin{align} X^{-1}(B) &= A = \{\omega \in \Omega\ :\ X(\omega) \in B\} \\ \implies X^{-1}(B^C) &= \{\omega \in \Omega\ :\ X(\omega) \not\in B\} = A^C \end{align}$$
Claim 3.2: If $X = \mathbf{1}_A$ is the indicator of some event $A \in \mathcal{F}$, then $X$ is a random variable.
Proof: For all $B \in \mathcal{B}(\mathbb{R})$, $X^{-1}(B)$ is equal to one of $A$ (if $B$ contains $1$ and not $0$), $A^C$ (if $B$ contains $0$ and not $1$), $\emptyset$ (if $B$ contains neither $0$ nor $1$), or $\Omega$ (if $B$ contains both $0$ and $1$). All four sets are in $\mathcal{F}$, hence $X$ is a random variable.
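A quick finite check of these four cases (a sketch; `indicator_preimage` is my own helper, and $B$ ranges over a few representative sets, since only whether $B$ contains $0$ or $1$ matters):

```python
def indicator_preimage(A, Omega, B):
    # Preimage of B under the indicator 1_A; the answer depends only on
    # whether B contains 0 and/or 1.
    return {w for w in Omega if (1 if w in A else 0) in B}

Omega, A = {1, 2, 3, 4}, {1, 2}
for B in [{0, 1}, {1}, {0}, set()]:
    print(B, indicator_preimage(A, Omega, B))
# {0, 1} -> Omega, {1} -> A, {0} -> A^C, set() -> the empty set
```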
The next two claims are key to proving results about functions of random variables.
Claim 3.3: If $f: (\Omega_1, \mathcal{F}_1) \to (\Omega_2, \mathcal{F}_2)$ and $g: (\Omega_2, \mathcal{F}_2) \to (\Omega_3, \mathcal{F}_3)$ are two measurable functions, then $g \circ f : (\Omega_1, \mathcal{F}_1) \to (\Omega_3, \mathcal{F}_3)$ is also a measurable function.
Proof: For all $B \in \mathcal{F}_3$, since $g$ is measurable, $g^{-1}(B) \in \mathcal{F}_2$. Since $f$ is measurable, $(g \circ f)^{-1}(B) = f^{-1}(g^{-1}(B)) \in \mathcal{F}_1$. Hence, $g \circ f$ is measurable.
Claim 3.4: $f: (\Omega_1, \mathcal{F}_1) \to (\Omega_2, \sigma(C))$, where $C$ is a collection of subsets of $\Omega_2$, is measurable if $A \in C \implies f^{-1}(A) \in \mathcal{F}_1$.
Proof: Note that $f^{-1}(\Omega_2 \setminus A) = \Omega_1 \setminus f^{-1}(A)$, and $f^{-1}(\cup_n A_n) = \cup_n f^{-1}(A_n)$. This, along with the fact that $\mathcal{F}_1$ is a $\sigma$-algebra, proves that $\{A : f^{-1}(A) \in \mathcal{F}_1\}$ is a $\sigma$-algebra containing $C$. Since $\sigma(C)$ is the smallest $\sigma$-algebra containing $C$, $\sigma(C)$ is a subset of the above $\sigma$-algebra, hence the claim is true.
The above claim ensures that we don’t need to prove that every set of the target $\sigma$-algebra has a preimage in the source $\sigma$-algebra; proving it for a generating collection is enough. E.g., for $\mathcal{B}(\mathbb{R})$, it’s sufficient to show that the open sets have measurable preimages, something that we’ll use in the next proof.
Claim 3.5: Every continuous function $f: \mathbb{R} \to \mathbb{R}$ is measurable.
Proof: From Claim 3.4, it’s sufficient to prove that for every open set $A$, $f^{-1}(A) \in \mathcal{B}(\mathbb{R})$. This follows from the continuity of $f$: $f$ is continuous iff $f^{-1}(G)$ is open whenever $G$ is open, and every open set belongs to $\mathcal{B}(\mathbb{R})$. Hence, $f$ is measurable.
The above three claims give us the following very powerful result: every continuous function of a random variable is also a random variable. We can make a stronger claim after proving the following as well:
Claim 3.6: If $X$ and $Y$ are random variables on $\triple$, then $X+Y$ and $XY$ are random variables as well.
Proof: This cute proof comes from Rosenthal. By Claim 3.4, it’s sufficient to check measurability on the collection of sets $(-\infty, x),\ x \in \mathbb{R}$, as the $\sigma$-algebra generated by this collection is $\mathcal{B}(\mathbb{R})$. Hence, consider the set $\{\omega \in \Omega : X(\omega) + Y(\omega) < x\}$. If $X + Y < x$, then $X < x - Y$, and by the density of the rationals we can find a rational number in the interval $(X, x-Y)$ (I’ve dropped the $(\omega)$, as it’s implicit here). Hence, $$\{X + Y < x\} = \bigcup_{r \text{ rational}} (\{X < r\} \cap \{Y < x - r\})$$ Since every set in this countable union belongs to $\mathcal{F}$ and $\mathcal{F}$ is a $\sigma$-algebra, $\{X+Y < x\} \in \mathcal{F}$, and $X+Y$ is a random variable.
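On a finite sample space, the rational-union identity can even be verified directly; here is a sketch (the random variables and the grid of rationals are arbitrary choices of mine; on a finite $\Omega$ a fine enough finite grid suffices, while the proof itself needs all of $\mathbb{Q}$):

```python
from fractions import Fraction

Omega = range(6)
X = lambda w: w / 2        # two arbitrary random variables on a finite Omega
Y = lambda w: (w % 3) - 1
x = 2.5

lhs = {w for w in Omega if X(w) + Y(w) < x}   # {X + Y < x}

# Union over a finite grid of rationals r of {X < r} & {Y < x - r}.
rhs = set()
for r in (Fraction(k, 8) for k in range(-40, 41)):
    rhs |= {w for w in Omega if X(w) < r and Y(w) < x - r}

print(lhs == rhs)          # True
```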
$XY$ is also a random variable, as $XY = [(X+Y)^2 - (X^2+Y^2)]/2$, and sums and continuous functions (here, squaring) of random variables are random variables, from the previous claims.
We are now free to extend the claim that every continuous function of a random variable is a random variable to piecewise continuity: every piecewise continuous function of a random variable is also a random variable. If $f$ is piecewise continuous, then $f(X) = f_1(X) \mathbf{1}_{I_1}(X) + f_2(X) \mathbf{1}_{I_2}(X) + \ldots + f_n(X) \mathbf{1}_{I_n}(X)$, where the $f_j(X)$ are random variables as each $f_j$ is continuous, and the $I_j$ are disjoint intervals, so each $\mathbf{1}_{I_j}(X)$ is the indicator of the event $X^{-1}(I_j) \in \mathcal{F}$ and is a random variable by Claim 3.2. From Claim 3.6, $f(X)$ is a sum of products of random variables, and hence is also a random variable.
4. References
- Rosenthal, Jeffrey S. A First Look at Rigorous Probability Theory. World Scientific, 2006. http://public.ebookcentral.proquest.com/choice/publicfullrecord.aspx?p=5227675
- Lebanon, Guy. Probability: The Analysis of Data, Vol. 1. 2012. Available online at http://theanalysisofdata.com/probability/0_2.html
- Math StackExchange, Wikipedia, etc etc :)