Agenda:
Suppose you could flip a coin to make a decision in an algorithm, rather than relying only on the input. For example:
from random import random  # random() returns a float in [0, 1)

if random() > 0.5:
    print('heads')
else:
    print('tails')
Why would we ever do this?
What does "hope" mean, computationally?
A proof that our algorithm is, on average, correct and efficient.
Randomized algorithms can often be simpler and/or more efficient than their deterministic counterparts.
A probability space consists of a sample space $\Omega$ representing the set of possible outcomes, and a probability measure $\mathbf{P}: \Omega \rightarrow \mathbb{R}$. The probability measure $\mathbf{P}$ must satisfy three conditions: every outcome has nonnegative probability, the probabilities of all outcomes sum to 1, and the probability of an event (a subset $A \subseteq \Omega$) is the sum of the probabilities of the outcomes it contains.
Let $\Omega$ be the set of all outcomes of a pair of 6-sided dice.
$\Omega = \{(1,1), (1,2),...,(2,1),...,(6,6)\}$
How many outcomes are there if the dice sum to 3? To 4?
The dice sum to 3 if we roll any of $\{(1, 2), (2, 1)\}$.
They sum to 4 if we roll any of $\{(1, 3), (2, 2), (3, 1)\}$.
What is the probability of rolling a 3? ... a 4?
The outcomes are mutually disjoint, and each has probability 1/36. So the probability of rolling a 3 is 2/36 = 1/18 and the probability of rolling a 4 is 3/36 = 1/12.
If we want to consider multiple, possibly overlapping events, the union bound is helpful:
$$\mathbf{P}\left[{\bigcup_{0 \leq i < n} A_i}\right] \leq \sum_{i=0}^{n-1} \mathbf{P}\left[{A_i}\right]$$
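As a quick sanity check, we can enumerate the pair-of-dice sample space from above and confirm the union bound for two overlapping events (the particular events here are our own illustrative choice):

```python
from itertools import product

# Sample space for a pair of 6-sided dice; each outcome has probability 1/36.
omega = list(product(range(1, 7), repeat=2))
p = lambda event: len(event) / len(omega)

# Two overlapping example events: "the dice sum to 4" and "the first die is 1".
A = {o for o in omega if o[0] + o[1] == 4}
B = {o for o in omega if o[0] == 1}

# Union bound: P[A ∪ B] <= P[A] + P[B].  Here 8/36 <= 9/36, since A and B
# share the outcome (1,3).
print(p(A | B), p(A) + p(B))
```

The gap between the two sides is exactly the probability of the overlap, which is why the bound is tight only for disjoint events.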
We may want to know the probability of an event $A$ given only "partial knowledge" of the outcome. It is very common to want the probability of an event $A$ given that another event $B$ occurs. The conditional probability of $A$ given that $B$ occurs (where $\mathbf{P}\left[{B}\right] > 0$) is:
$$\mathbf{P}\left[{A \mid B}\right] = \frac{\mathbf{P}\left[{A \cap B}\right]}{\mathbf{P}\left[{B}\right]}$$

What is the probability that the first die comes up 1 given that the sum of a pair of dice is 4?
$A = \{(1,1),(1,2),(1,3),(1,4),(1,5),(1,6)\}$
$B = \{(1,3),(2,2),(3,1)\}$
${A \cap B} = \{(1,3)\}$, so $\mathbf{P}\left[{A \mid B}\right] = \frac{1/36}{3/36} = \frac{1}{3}$.
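We can check this answer by enumerating the sample space directly (a small sketch; the helper `p` is our own):

```python
from itertools import product

omega = list(product(range(1, 7), repeat=2))
p = lambda event: len(event) / len(omega)

A = {o for o in omega if o[0] == 1}          # first die comes up 1
B = {o for o in omega if o[0] + o[1] == 4}   # the dice sum to 4

# Conditional probability: P[A | B] = P[A ∩ B] / P[B].
p_cond = p(A & B) / p(B)
print(p_cond)  # ≈ 1/3
```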
A useful fact that relates conditional probabilities to non-conditional probabilities is the Law of Total Probability:
Let $\Omega$ be a sample space and let $A_0, \ldots, A_{n-1}$ be a set of events that partition $\Omega$ such that $\mathbf{P}\left[{A_i}\right] > 0$ for all $0 \le i < n.$ Then, for any event $B$ we have that:
$\begin{array}{lll} \mathbf{P}\left[{B}\right] & = & \displaystyle\sum_{i=0}^{n-1} \mathbf{P}\left[{B \cap A_i}\right] \\ & = & \displaystyle\sum_{i=0}^{n-1} \mathbf{P}\left[{A_i}\right]\mathbf{P}\left[{B \mid A_i}\right] \end{array}$
Intuition: the probability of an event $B$ is just the sum of its probability over all potential disjoint situations.
Suppose (this example is fictional) there are two variations of the flu vaccine: $A_1$ and $A_2$.
The probability that a person will receive $A_1$ is $0.4$ and the probability that a person will receive $A_2$ is $0.6$.
A person who receives vaccine $A_1$ has a $12\%$ chance of catching the flu this season.
A person who receives vaccine $A_2$ has a $5\%$ chance of catching the flu.
What is the probability $\mathbf{P}\left[B\right]$ that a vaccinated person will catch the flu?
By the law of total probability, $\mathbf{P}\left[B\right] = 0.4 \cdot 0.12 + 0.6 \cdot 0.05 = 0.078$.
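The law of total probability gives the answer in one line, and a simple simulation of this (fictional) vaccine scenario agrees; the trial count and seed below are arbitrary choices:

```python
import random

# Law of total probability: P[B] = P[A1]*P[B|A1] + P[A2]*P[B|A2].
p_exact = 0.4 * 0.12 + 0.6 * 0.05
print(p_exact)  # ≈ 0.078

# Cross-check with a Monte Carlo simulation of the scenario.
random.seed(0)
trials = 100_000
flu = 0
for _ in range(trials):
    # Which vaccine this person received determines their chance of catching the flu.
    p_flu = 0.12 if random.random() < 0.4 else 0.05
    if random.random() < p_flu:
        flu += 1
p_sim = flu / trials
print(p_sim)  # close to 0.078
```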
For a second example of this, see Vol 1, Ch 18 of the book.
Another useful concept is independence. We say that two events $A$ and $B$ are independent when the occurrence of one does not change the probability of the other, i.e., $\mathbf{P}\left[{A \mid B}\right] = \mathbf{P}\left[{A}\right]$. (Note that independence is not the same as disjointness: disjoint events with positive probability are never independent.)
If $A$ and $B$ are independent, $\mathbf{P}\left[{A \cap B}\right] = \mathbf{P}\left[{A}\right] \cdot \mathbf{P}\left[{B}\right].$
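For a concrete (illustrative) example with the dice sample space: "the first die is even" and "the second die is even" are independent but overlapping events, and the product rule holds exactly:

```python
from itertools import product

omega = list(product(range(1, 7), repeat=2))
p = lambda event: len(event) / len(omega)

A = {o for o in omega if o[0] % 2 == 0}  # first die is even
B = {o for o in omega if o[1] % 2 == 0}  # second die is even

# A and B are independent (though clearly not disjoint): P[A ∩ B] = P[A]·P[B].
print(p(A & B), p(A) * p(B))  # both 0.25
```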
A random variable $X$ is a real-valued function on the outcomes of an experiment, i.e. $X : \Omega\to \mathbb{R}$. We will consider only discrete random variables.
Even more simply, we will most often use indicator random variables, which map outcomes to 0 or 1.
For example, the indicator that a pair of dice comes up doubles is:

$$X(d_1,d_2) = \left\{ \begin{array}{ll} 1 & \text{if } d_1 = d_2 \\ 0 & \text{if } d_1 \neq d_2. \end{array} \right.$$

Note: The term "random variable" is somewhat odd. Really it's just a function, but we use this terminology since its output depends on a random process, and it is useful to look at functions of outcomes (which have probabilities under a given probability measure).
The expectation of a random variable $X$ defined for a probability space $(\Omega, \mathbf{P})$ is denoted:
$$\mathbf{E}_{\Omega,\mathbf{P}}[X] = \sum_{y \in \Omega} X(y) \cdot \mathbf{P}\left[\{y\}\right].$$

We usually write this more conveniently as:
$$ \mathbf{E}\left[{X}\right] = \sum_{x} x \cdot \mathbf{P}\left[X = x\right]. $$

Let $X$ be the sum of a pair of dice. $X$ can take on values between 2 and 12, each with a probability determined by the number of ways the dice can sum to that value. What is the expected sum of a pair of unbiased dice? It is:
$$ 2\cdot\frac{1}{36} + 3\cdot\frac{2}{36} + 4\cdot\frac{3}{36} + \cdots + 11\cdot\frac{2}{36} + 12\cdot\frac{1}{36} = 7.$$

Intuitively, the expectation is just a weighted average of the outcomes of the random variable.
expectation = 0
for d1 in range(1, 7):
    for d2 in range(1, 7):
        expectation += (d1 + d2) / 36  # each outcome has probability 1/36
print(expectation)  # prints 7.0
Let's look at coin flips where we have equal probability of heads/tails. Let $X$ be the number of flips until we get heads. What is $E[X]$?
Observe that we have two cases: with probability $1/2$ the first flip is heads and $X = 1$; with probability $1/2$ it is tails, and after that one wasted flip the process starts over. So $E[X] = \frac{1}{2}\cdot 1 + \frac{1}{2}\cdot(1 + E[X])$.
Solving for $E[X]$, what do we get?
$E[X] = 2$
More generally, the expected number of trials to get an outcome of probability $p$ is $1/p$. (We'll see this later.)
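A quick simulation supports the $1/p$ claim for the fair-coin case; the function name, seed, and trial count here are our own choices:

```python
import random

random.seed(1)

def flips_until_heads(p=0.5):
    """Number of flips until the first heads, where each flip is heads with probability p."""
    count = 1
    while random.random() >= p:  # tails: flip again
        count += 1
    return count

trials = 100_000
avg = sum(flips_until_heads() for _ in range(trials)) / trials
print(avg)  # close to 1/p = 2
```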
We will make repeated use of some basic facts about the expectations of random variables.
First, we can easily treat the expected value of functions of random variables:
$$ \mathbf{E}\left[f(X)\right] = \sum_{x} f(x) \cdot \mathbf{P}\left[X = x\right]. $$

We also have linearity of expectations for random variables:
$$ \mathbf{E}\left[X + Y\right] = \mathbf{E}\left[X\right] + \mathbf{E}\left[ Y\right]. $$

This identity will help us simplify the calculation of the expected work and span of an algorithm.
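We can verify linearity exhaustively for the two-dice example (a small sketch; the helper `E` is our own):

```python
from itertools import product

# Exhaustive check of linearity of expectation for two fair dice.
omega = list(product(range(1, 7), repeat=2))
E = lambda f: sum(f(o) for o in omega) / len(omega)

X = lambda o: o[0]  # value of the first die
Y = lambda o: o[1]  # value of the second die

print(E(lambda o: X(o) + Y(o)), E(X) + E(Y))  # both 7.0
```

Note that linearity needs no independence assumption, which is what makes it so useful for analyzing algorithms whose random choices interact.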
Ultimately, if we choose to make random choices in our algorithm, we must relax our notion of correctness and our definitions of work/span.
When using randomization we usually have one of two types of algorithms: Las Vegas algorithms, which always return a correct answer but whose work and span are random variables, and Monte Carlo algorithms, which have deterministic cost but may return an incorrect answer with some probability.
We will focus on Las Vegas algorithms.
Suppose we are given a list $L$ of length $n$. We choose a random index $i$ of the list and return $L[i:]$ as output.
What is the expected size of $L[i:]$?
Let $X$ be a random variable that is the size of the output list. By the definition of expectation we have:
$$\mathbf{E}[X] = \sum_{x=1}^{n} x \cdot \mathbf{P}[X = x]$$

What is the probability that we choose any individual index? Each of the $n$ indices is equally likely, so $\mathbf{P}[X = x] = 1/n$ for every size $x$ between 1 and $n$, giving $\mathbf{E}[X] = \frac{1}{n}\sum_{x=1}^{n} x = \frac{n+1}{2}$.
This might follow your intuition that, since all indices are equally likely, the chosen index averages out to roughly half of the list size.
A related question is, what is the probability of getting a list with more than half of the elements?
That is, what is $\mathbf{P}\left[X \geq n/2\right]$?
Intuitively, we can say it is the probability of choosing any index that gives us a size $\geq n/2$. About half of the indices do, so this probability is roughly $1/2$.
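A quick simulation agrees with both answers; the list length, seed, and trial count are arbitrary choices of ours:

```python
import random

random.seed(2)
n = 1_000        # hypothetical list length
trials = 100_000

sizes = []
for _ in range(trials):
    i = random.randrange(n)   # uniform index in {0, ..., n-1}
    sizes.append(n - i)       # size of the returned suffix L[i:]

avg = sum(sizes) / trials
frac_large = sum(s >= n / 2 for s in sizes) / trials
print(avg)         # close to (n + 1) / 2 = 500.5
print(frac_large)  # close to 1/2
```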
Suppose we are given a list $L$ of length $n$.
For each element we flip an unbiased coin $x_i$. Return a list $R$ containing each $L[i]$ for which $x_i$ came up heads.
What is the expected size of $R$?
Let $X_i$ be an indicator random variable that is 1 if element $i$ is chosen, and 0 otherwise, and let $X$ be the size of $R$.
We can see that $X = \sum_{i=0}^{n-1} X_i$ by definition.
By linearity of expectation we can compute:
$$\mathbf{E}[X] = \sum_{i=0}^{n-1} \mathbf{E}[X_i].$$

By the definition of expectation,
$\begin{align} \mathbf{E}[X_i] &= 0\cdot \mathbf{P}[X_i = 0] + 1\cdot\mathbf{P}[X_i = 1] \\ &= \mathbf{P}[X_i = 1]\\ &= 1/2 \end{align}$
Thus, $\mathbf{E}[X_i] = 1/2$ for all $i$.
So we have that $\begin{align} \mathbf{E}[X] &= \sum_{i=0}^{n-1} \mathbf{E}[X_i] \\ &= \sum_{i=0}^{n-1} \frac{1}{2} = \frac{n}{2} \end{align}$
So the expected size of $R$ is $n/2$.
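A simulation of the coin-flip filter matches this indicator-variable analysis; the function name, list length, seed, and trial count below are our own choices:

```python
import random

random.seed(3)

def random_subset(L):
    """Keep each element independently with probability 1/2 (one coin flip each)."""
    return [x for x in L if random.random() < 0.5]

n = 100
trials = 20_000
avg = sum(len(random_subset(range(n))) for _ in range(trials)) / trials
print(avg)  # close to n/2 = 50
```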
These simplistic algorithms illustrate the kind of analysis we will be doing to analyze the expected work and span of algorithms for manipulating lists.
We will look at algorithms for selection and sorting. For both problems we will analyze recursive algorithms that use randomization to select the input for the recursive calls.
This creates a complicated situation in which each successive call depends on the previous one, and thereby requires some clever accounting to derive the expected work and span.
We will start by examining the problem of selection.