
---
title: Probability
author: Keith A. Lewis
institution: KALX, LLC
email: [email protected]
classoption: fleqn
fleqn: true
abstract: Probability – the foundation of statistics.
...

\newcommand{\one}{⚀} \newcommand{\two}{⚁} \newcommand{\three}{⚂} \newcommand{\four}{⚃} \newcommand{\five}{⚄} \newcommand{\six}{⚅}

# Probability Theory

In order to understand statistics one must first understand probability theory.

## Chevalier de Méré

A cubical die has six faces: $\one$, $\two$, $\three$, $\four$, $\five$, and $\six$. Each time a die is rolled the top face is the result of the roll. If the die is fair each face has an equal probability of occurring. The probability of rolling a $\six$ in one roll is $1/6$.

If a die is rolled twice then a $\six$ occurs in exactly $11 = 6 + 6 - 1$ of the $36 = 6^2$ possible outcomes.¹ In $6$ cases it occurs on the first roll and in $6$ cases it occurs on the second roll, but rolling $\six$ twice is included in both the first and second cases and should only be counted once. The probability of rolling a $\six$ in two rolls is $11/36$.

If a die is rolled three times then the number of cases involving a $\six$ in the $216 = 6^3$ possible outcomes is a more difficult counting problem. It is easier to count the number of cases where a $\six$ does not occur: $125 = 5^3$. The number of times $\six$ does show up is therefore $91 = 216 - 125$. The probability of rolling a $\six$ in three rolls is $91/216$. Note the same idea solves the two roll case: $11 = 36 - 25$. It is not a coincidence that $91 = 3\times 6^2 - 3\times 6 + 1$. This is closely related to the formula $(6 - 1)^3 = 6^3 - 3\times 6^2 + 3\times 6 - 1$.

If a die is rolled $n$ times then the number of cases involving a $\six$ is $6^n - 5^n$. The probability of this happening is $(6^n - 5^n)/6^n = 1 - (5/6)^n$. Note that this tends to $1$ as $n$ tends to infinity; you will eventually roll a $\six$ if you roll long enough.
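These counts can be checked by brute-force enumeration. A minimal sketch (the function name is illustrative, not from the text):

```python
from itertools import product

def prob_six(n):
    """Count outcomes of n rolls of a fair die containing at least one six."""
    outcomes = list(product(range(1, 7), repeat=n))  # all 6^n equally likely rolls
    hits = sum(1 for roll in outcomes if 6 in roll)
    return hits, len(outcomes)

# The counts should agree with 6^n - 5^n out of 6^n.
for n in (1, 2, 3):
    hits, total = prob_six(n)
    assert hits == 6**n - 5**n
    assert total == 6**n

print(prob_six(3))  # (91, 216)
```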

Chevalier de Méré was concerned with the problem of how to divide the wagers if the game was interrupted partway through. (Vingt-deux, voilà les flics!) The initial odds are that $91$ will get you $125$ in the three roll game. If the first roll is not a $\six$ the odds of winning go down since there are only two rolls remaining to get a $\six$. If the game stops after the first roll how should the bet be fairly divided?
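One way to sketch the division numerically is to split the stakes in proportion to each side's probability of winning the remaining game (an assumption about the fair split, made explicit here; variable names are illustrative):

```python
# Probability the six-bettor wins the three-roll game, before any roll.
p_win_3 = 1 - (5 / 6) ** 3   # 91/216

# After one roll that was not a six, only two rolls remain.
p_win_2 = 1 - (5 / 6) ** 2   # 11/36

# Splitting in proportion to the probability of winning gives the
# six-bettor the fraction 11/36 of the pot and the opponent 25/36.
shares = (p_win_2, 1 - p_win_2)
```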

Antoine Gombaud (his real name) asked his salon friends Blaise Pascal and Pierre de Fermat about this puzzle. They came up with a complete solution of how to count with partial information.

Read on.

## Probability Space

A sample space is a set of outcomes. Subsets of a sample space are events. A probability measure assigns a number between 0 and 1 to events that represents a degree of belief an outcome will belong to the event. Partial information is modeled by a partition of the sample space.

### Sample Space

A sample space is a set of what can happen in a probability model. An outcome is an element of a sample space. An event is a subset of a sample space.

A sample space for flipping a coin can be modeled by the set $\{H,T\}$ where the outcome $H$ indicates heads and $T$ indicates tails. Of course any two element set could be used for this.

A sample space for flipping a coin twice can be modeled by the set $\{HH, HT, TH, TT\}$ where each outcome specifies the individual outcomes of the first and second flip. The event 'the first flip was heads' is the subset $\{HH, HT\}$. The partition $\{\{HH, HT\},\{TH, TT\}\}$ represents the partial information of knowing the outcome of the first coin flip. The first event in the partition indicates the first flip was heads. The second event in the partition indicates the first flip was tails.
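These sets are small enough to write down directly. A minimal sketch in Python (the names are illustrative, not from the text):

```python
omega = {"HH", "HT", "TH", "TT"}           # sample space for two flips
first_heads = {"HH", "HT"}                 # event: the first flip was heads
partition = [{"HH", "HT"}, {"TH", "TT"}]   # partial info: first flip only

# The atoms are disjoint and their union is the whole sample space.
assert set().union(*partition) == omega
assert partition[0] & partition[1] == set()

def atom(outcome):
    """The atom containing an outcome: all we 'know' with partial information."""
    return next(a for a in partition if outcome in a)

assert atom("HT") == first_heads
```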

The first step in any probability model is to specify the possible outcomes. The second step is to assign probabilities to the outcomes.

### Measure

A measure $\mu$ on a set $S$ assigns numbers to subsets of $S$ and satisfies $$ \mu(E\cup F) = \mu(E) + \mu(F) - \mu(E\cap F) $$ for any subsets $E,F\subseteq S$ and $\mu(\emptyset) = 0$. Measures do not count twice.

Exercise. Show if $\nu(E\cup F) = \nu(E) + \nu(F) - \nu(E\cap F)$ for $E,F\subseteq S$ then $\mu = \nu - \nu(\emptyset)$ is a measure.

Solution

By $\mu = \nu - \nu(\emptyset)$ we mean $\mu(E) = \nu(E) - \nu(\emptyset)$ for any subset $E\subseteq S$. Clearly $\mu(E\cup F) = \mu(E) + \mu(F) - \mu(E\cap F)$ for any $E,F\subseteq S$. Since $\mu(\emptyset) = \nu(\emptyset) - \nu(\emptyset) = 0$, $\mu$ is a measure.

Exercise. Show if $\mu$ is a measure then $\mu(E\cup F) = \mu(E) + \mu(F)$ for any subsets $E$ and $F$ with empty intersection $E\cap F = \emptyset$.

Solution

Since $\mu(\emptyset) = 0$, $\mu(E\cup F) = \mu(E) + \mu(F) - \mu(E\cap F) = \mu(E) + \mu(F) - \mu(\emptyset) = \mu(E) + \mu(F)$.

Exercise. Show if $\mu$ is a measure then $\mu(E) = \mu(E\cap F) + \mu(E\cap F')$ for any subsets $E$ and $F$ where $F' = S\setminus F = \{x\in S:x\not\in F\}$ is the complement of $F$ in $S$.

Solution

Note $(E\cap F)\cup(E\cap F') = E\cap(F\cup F') = E\cap S = E$ and $(E\cap F)\cap(E\cap F') = E\cap(F\cap F') = E\cap\emptyset = \emptyset$, so by the previous exercise $\mu(E\cap F) + \mu(E\cap F') = \mu((E\cap F)\cup(E\cap F')) = \mu(E)$.
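The counting measure $\mu(E) = |E|$ on a finite set is a concrete example. A quick numerical check of the identities above (a sketch, not from the text):

```python
S = {1, 2, 3, 4, 5, 6}
mu = len  # counting measure: mu(E) = number of elements of E

E = {1, 2, 3}
F = {2, 3, 4, 5}

# Measures do not count twice: mu(E ∪ F) = mu(E) + mu(F) - mu(E ∩ F).
assert mu(E | F) == mu(E) + mu(F) - mu(E & F)
assert mu(set()) == 0

# Exercise check: mu(E) = mu(E ∩ F) + mu(E ∩ F') where F' = S \ F.
Fc = S - F
assert mu(E) == mu(E & F) + mu(E & Fc)
```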

### Partition

A partition splits a sample space into disjoint subsets with union equal to the sample space. Partitions are how partial information is represented. The events in the partition are called atoms. The way they represent partial information is you only know what atom an outcome belongs to, not the actual outcome.

Partitions define an equivalence relation on outcomes. We say $\omega\sim\omega'$ if and only if they belong to the same atom.

Exercise. Show $\omega\sim\omega$, $\omega\sim\omega'$ implies $\omega'\sim\omega$, and $\omega\sim\omega'$, $\omega'\sim\omega''$ implies $\omega\sim\omega''$.

This is the definition of an equivalence relation. It is the mathematical way of saying two things are the "same" even if they are not equal.

### Probability Measure

A probability measure $P$ on the sample space $\Omega$ is a measure taking values in the interval $[0,1]$ with $P(\Omega) = 1$. The probability $P(E)$ for $E\subseteq\Omega$ represents a degree of belief that a random outcome will belong to the event $E$. This is a somewhat nebulous and controversial notion. How do "random outcomes" occur?

Probability theory originated with games of chance. One way to interpret this is "How much money would you wager on an outcome involving rolling dice or selecting cards from a deck?" Putting your money where your mouth is is a way to clarify thinking.

Exercise. Show $P(E\cup F) \le P(E) + P(F)$ for any events $E$ and $F$ when $P$ is a probability measure.

Exercise. Show $P(\cup_i E_i) \le \sum_i P(E_i)$ for any events $(E_i)$ when $P$ is a probability measure.

If $\Omega$ has a finite number of outcomes, we can define a probability measure by specifying $p_\omega = P(\{\omega\})$ for $\omega\in\Omega$. Note $p_\omega\ge 0$ and $\sum_{\omega\in\Omega} p_\omega = 1$. The probability of the event $E\subseteq\Omega$ is $P(E) = \sum_{\omega\in E} p_\omega$.

For the two coin flip model (assuming the coin is fair) we assign probability $1/4$ to each outcome. The probability of the first flip being heads is $P(\{HH,HT\}) = P(\{HH\}\cup\{HT\}) = P(\{HH\}) + P(\{HT\}) = 1/4 + 1/4 = 1/2$.
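A minimal sketch of a finite probability measure built from point masses (names are illustrative):

```python
# Fair coin flipped twice: each outcome gets point mass 1/4.
p = {"HH": 0.25, "HT": 0.25, "TH": 0.25, "TT": 0.25}

def P(event):
    """P(E) = sum of p_omega over omega in E."""
    return sum(p[w] for w in event)

assert abs(sum(p.values()) - 1.0) < 1e-12  # probabilities sum to 1
assert P({"HH", "HT"}) == 0.5              # first flip heads
```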

## Random Variable

A random variable is a symbol that can be used in place of a number when manipulating equations and inequalities, together with additional information about the probability of the values it can take on.

The mathematical definition of a random variable is a function $X\colon\Omega\to\mathbf{R}$. Its cumulative distribution function (cdf) is $F(x) = P(X\le x) = P(\{\omega\in\Omega\mid X(\omega) \le x\})$. More generally, given a subset $A\subseteq\mathbf{R}$, the probability that $X$ takes a value in $A$ is $P(X\in A) = P(\{\omega\in\Omega: X(\omega)\in A\})$.

Two random variables have the same law if they have the same cdf.

The cdf tells you everything there is to know about the probability of the values the random variable can take on. For example, $P(a < X \le b) = F(b) - F(a)$.

Exercise. Show $P(a\le X\le b) = \lim_{x\uparrow a} (F(b) - F(x))$.

Hint: $[a,b] = \cap_n (a - 1/n, b]$.

In general $P(X\in A) = \int_A dF(x)$ for sufficiently nice subsets $A\subset\mathbf{R}$ where we are using Riemann–Stieltjes integration.

Exercise. Show for any cumulative distribution function $F$ that $F(x) \le F(y)$ if $x < y$, $\lim_{x\to-\infty} F(x) = 0$, $\lim_{x\to\infty} F(x) = 1$, and $F$ is right continuous with left limits.

Hint: For right continuity use $(-\infty, x] = \cap_n (-\infty, x + 1/n]$.

The cdf $F(x) = \max\{0,\min\{1,x\}\}$ defines the uniformly distributed random variable, $U$, on the interval $[0,1]$. For $0\le a < b\le 1$, $P(a < U \le b) = P(U\in (a,b]) = b - a$ and $P(U < 0) = 0 = P(U > 1)$.
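A sketch of this cdf and the identity $P(a < U\le b) = F(b) - F(a) = b - a$:

```python
def F(x):
    """Cdf of the uniform random variable on [0,1]: F(x) = max{0, min{1, x}}."""
    return max(0.0, min(1.0, x))

# P(a < U <= b) = F(b) - F(a) = b - a for 0 <= a < b <= 1.
a, b = 0.25, 0.75
assert F(b) - F(a) == b - a

# P(U < 0) = 0 = P(U > 1): the cdf is flat outside [0,1].
assert F(-1.0) == 0.0 and F(2.0) == 1.0
```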

Exercise. If $X$ has cdf $F$, then $X$ and $F^{-1}(U)$ have the same law.

Exercise. If $X$ has cdf $F$ and $F$ is continuous, then $F(X)$ and $U$ have the same law.

This shows a uniformly distributed random variable has sufficient randomness to generate any random variable. There are no random, random variables.
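These two exercises are the basis of inverse transform sampling. A sketch using the exponential distribution with cdf $F(x) = 1 - e^{-x}$ for $x\ge 0$ (this example distribution is an assumption, not from the text):

```python
import math
import random

def F(x):
    """Cdf of the standard exponential distribution."""
    return 1.0 - math.exp(-x) if x >= 0 else 0.0

def F_inv(u):
    """Inverse cdf: F^{-1}(u) = -log(1 - u) for 0 <= u < 1."""
    return -math.log1p(-u)

# F(F^{-1}(u)) = u, so P(F^{-1}(U) <= x) = P(U <= F(x)) = F(x):
# F^{-1}(U) has the same law as the exponential random variable.
for u in (0.1, 0.5, 0.9):
    assert abs(F(F_inv(u)) - u) < 1e-12

random.seed(0)
sample = F_inv(random.random())  # one exponentially distributed draw from U
```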

Given a cdf $F$ we can define a random variable having that distribution using the identity function $X\colon\mathbf{R}\to\mathbf{R}$, where $X(x) = x$. Let $P$ be the probability measure on $\mathbf{R}$ defined by $P(A) = \int_A dF(x)$.

The mathematical definition is more flexible than defining a random variable by its cumulative distribution function.

### Continuous Random Variable

If $F(x) = \int_{-\infty}^x F'(u)\,du$ we say the random variable is continuously distributed. The density function is $f = F'$. Any function satisfying $f\ge 0$ and $\int_{\mathbf{R}} f(x)\,dx = 1$ is a density function for a random variable.

### Discrete Random Variable

If $dF = \sum_{\omega\in\Omega} p_\omega \delta_\omega$ where $\Omega\subseteq\mathbf{R}$ is countable we say the random variable is discretely distributed. Here $\delta_\omega$ is the delta function with unit mass at $\omega$, defined by $\int_{\mathbf{R}} f(x)\,\delta_\omega(x)\,dx = f(\omega)$ when $f$ is continuous at $\omega$.

Exercise. Show if $H_\omega(x) = 0$ for $x < \omega$ and $H_\omega(x) = 1$ for $x\ge\omega$ then $f(\omega) = \int_{\mathbf{R}} f(x)\,dH_\omega(x)$ when $f$ is continuous.

Using this more precise notation, $F = \sum_{\omega\in\Omega} p_\omega H_\omega$.
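For a fair die this sum of step functions can be evaluated directly (a sketch, not from the text):

```python
# Cdf of a fair die as F = sum over omega of p_omega * H_omega,
# where H_omega(x) = 1 for x >= omega and 0 otherwise.
p = {w: 1 / 6 for w in range(1, 7)}  # point masses p_omega

def H(omega, x):
    """Unit step at omega, right continuous."""
    return 1.0 if x >= omega else 0.0

def F(x):
    return sum(p_w * H(w, x) for w, p_w in p.items())

assert abs(F(3.5) - 0.5) < 1e-12     # P(roll <= 3) = 3/6
assert F(0.5) == 0.0                 # no mass below 1
assert abs(F(6.0) - 1.0) < 1e-12     # all mass at or below 6
```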

## Footnotes

  1. The *Cartesian product* of sets $A$ and $B$ is the set of pairs $A\times B = \{(a,b):a\in A, b\in B\}$. The number of elements in $A\times B$ is the number of elements in $A$ times the number of elements in $B$.