We've been discussing Divide and Conquer algorithms.
A divide-and-conquer algorithm, at each step, divides the problem into subproblems, solves each, then combines the results to arrive at the final solution.
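Schematically, as a generic Python sketch (here `is_small`, `solve_directly`, `divide`, and `combine` are placeholders for the problem-specific pieces, not real functions):
def divide_and_conquer(problem):
    # base case: solve small instances directly
    if is_small(problem):
        return solve_directly(problem)
    # divide the problem into independent subproblems
    subproblems = divide(problem)
    # conquer: solve each subproblem recursively
    solutions = [divide_and_conquer(p) for p in subproblems]
    # combine the subresults into the final solution
    return combine(solutions)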
Recurrences can easily be written and interpreted from the perspective of divide-and-conquer algorithms.
We can solve recurrences by drawing and analyzing the Recursion Trees that result from them.
The cost of the entire tree is then the sum of the costs of all of the levels of the tree.
In the tree method, we generalize the cost of every level as a function of that level, $C(i)$. E.g.,
$$C(i) = \frac{n^2}{2^i},\quad 3^i n,\quad n,\quad \lg{\frac{3n}{2^i}},\quad 1,\quad \ldots$$And sum up over all levels:
$$\sum_{i=0}^{\log n} C(i)$$We solve the summation through simplification and by using properties of series to arrive at a final runtime.
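For example, with $C(i) = \frac{n^2}{2^i}$, the sum is a geometric series:
$$\sum_{i=0}^{\lg n} \frac{n^2}{2^i} = n^2 \sum_{i=0}^{\lg n} 2^{-i} \le 2n^2 \in O(n^2)$$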
The tree method is tedious, but it works. Often, we can exploit patterns to perform our analysis more quickly.
One such shortcut is the brick method, which we define and apply below.
With our powerful ideas of divide & conquer, recurrences, and our techniques for solving recurrences, let's see how we can put them to use to make algorithmic headway on a problem where it seems there is no headway to be made!
Multiplication!
Here is an elementary school algorithm for integer multiplication:
Input: $n$ bit integers $x= \langle x_{n-1}, \ldots, x_0\rangle$ and $y = \langle y_{n-1}, \ldots, y_0\rangle$
Output: $x \cdot y$
Example: '1001'$\times$'1101'
def int2binary(n):
    # convert an integer to a list of its binary digit characters
    return list('{0:b}'.format(n))
int2binary(13)
['1', '1', '0', '1']
nine = int2binary(9)
print(nine)
thirteen = int2binary(13)
print(thirteen)
9*13
['1', '0', '0', '1']
['1', '1', '0', '1']
117
      1001
    x 1101
    ======
      1001
     0000
    1001
+  1001
 =========
   1110101
def binary2int(n):
    # parse a string of binary digits back into an integer
    return int(n, 2)
binary2int('1110101')
117
What is the running time of the "elementary school" algorithm?
For two $n$-digit inputs: $O(n^2)$, since there are $n$ partial products of up to $n$ bits each, and summing them takes $O(n^2)$ bit operations. The total number of bits in the solution is at most $2n$.
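As a sketch, here is the shift-and-add version of the elementary school algorithm on Python integers (the function name is ours; it does one shifted addition per bit of $y$):
def elementary_multiply(x, y):
    # grade-school binary multiplication: for each 1-bit of y,
    # add a copy of x shifted into that bit's position
    total = 0
    shift = 0
    while y > 0:
        if y & 1:
            total += x << shift   # add shifted partial product
        y >>= 1
        shift += 1
    return total
elementary_multiply(9, 13)
117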
Can we improve on this using Divide and Conquer?
How can we break our problem into smaller parts?
Instead of the elementary school algorithm, consider splitting each $n$-digit input in half. Can we multiply recursively?
Let $x = \langle x_L, x_R\rangle$, $y = \langle y_L, y_R\rangle$. Then,
$\begin{align} x &=& 2^{n/2} x_L + x_R \:\:\:\:\:\: \hbox{e.g.} \: 1001: 2^2 (10) + (01)\\ y &=& 2^{n/2} y_L + y_R \:\:\:\:\:\: \hbox{e.g.} \: 1101: 2^2 (11) + (01)\\ \end{align} $
What does multiplying by $2^{n/2}$ do, and is it efficient?
$2^2 [11] \rightarrow [1100] \:\:$ (shift two places to the left).
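A quick check in Python, reusing `int2binary` from above:
print(int2binary(3))       # ['1', '1']
print(int2binary(3 << 2))  # ['1', '1', '0', '0'] -- shift by 2 = multiply by 2^2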
Multiplication the long way
$\begin{align} x &=& 2^{n/2} x_L + x_R \:\:\:\:\:\: \hbox{e.g.} \: 1001: 2^2 (10) + (01)\\ y &=& 2^{n/2} y_L + y_R \:\:\:\:\:\: \hbox{e.g.} \: 1101: 2^2 (11) + (01)\\ \end{align} $
So then,
$\begin{align} x\cdot y &=& (2^{n/2} x_L + x_R)(2^{n/2} y_L + y_R) \\ &=& 2^n (x_L \cdot y_L) + 2^{n/2} (x_L \cdot y_R + x_R \cdot y_L) + (x_R \cdot y_R) \\ \end{align} $
We've converted one multiplication of size $(n,n)$ into four multiplications of size $(\frac{n}{2}, \frac{n}{2})$, plus a constant number of shifts and additions, each costing $O(n)$.
What recursive algorithm and recurrence are suggested by this observation?
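One candidate, sketched on Python integers (`dc_multiply` is our name, and we assume $n$ is a power of two for simplicity):
def dc_multiply(x, y, n):
    # multiply n-bit nonnegative integers with four recursive calls
    if n == 1:
        return x * y                               # single-bit base case
    half = n // 2
    x_L, x_R = x >> half, x & ((1 << half) - 1)    # split into high/low halves
    y_L, y_R = y >> half, y & ((1 << half) - 1)
    LL = dc_multiply(x_L, y_L, half)
    LR = dc_multiply(x_L, y_R, half)
    RL = dc_multiply(x_R, y_L, half)
    RR = dc_multiply(x_R, y_R, half)
    # combine with shifts (multiplication by powers of two) and additions
    return (LL << n) + ((LR + RL) << half) + RR
dc_multiply(9, 13, 4)
117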
$W(n) = 4W(n/2) + n$
What is the solution to this recurrence using the brick method? Is it root-dominated, or leaf-dominated?

$C(\hbox{root}) = n$
$C(\hbox{level}\:1) = 4(\frac{n}{2}) = 2 \cdot n$
In general, $C(\hbox{level}\:i) = 4^i \cdot \frac{n}{2^i} = 2^i \cdot n$: the level costs are geometrically increasing as we descend the recursion tree.
A recurrence is leaf-dominated if there is an $\alpha > 1$ such that, for every node $v$ with children $D(v)$:
$$C(v) \leq \frac{1}{\alpha} \sum_{u \in D(v)} C(u)$$
Here, let $\alpha \leftarrow 2$:
$n \le \frac{1}{2}\cdot 2 \cdot n$
$W(n) = 4W(n/2) + n$
The cost of a leaf-dominated recurrence is $O(L)$, where $L$ is the number of leaves (assuming each leaf does constant work).
nodes per level: $1, 4, 16, \ldots, 4^i, \ldots, 4^{\lg n} = n^{\lg 4} = n^2$
This is a leaf-dominated recurrence that is $O(n^2)$ -- the same as the elementary school algorithm!
Now, what is the span of this algorithm if implemented in parallel?
Assuming the four recursive multiplications can be done in parallel, and that addition has span $O(n)$, we get that
$$S(n) = S(n/2) + n$$which yields a span of ...
brick method
$S(\hbox{root})=n$
$S(\hbox{level 1}) = \frac{n}{2}$
$\rightarrow$ root dominated
$S(n) \in O(n)$. This is much better than the span of the grade school algorithm, which is $O(n^2)$!
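Unrolling the recurrence confirms this:
$$S(n) = \sum_{i=0}^{\lg n} \frac{n}{2^i} \le 2n \in O(n)$$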
What parallelism is achieved?
Parallelism $= \frac{W}{S} = \frac{n^2}{n} = n$
Can we do any better?
Recall that
$\begin{align} x\cdot y &=& (2^{n/2} x_L + x_R)(2^{n/2} y_L + y_R) \\ &=& 2^n (x_L \cdot y_L) + 2^{n/2} (x_L \cdot y_R + x_R \cdot y_L) + (x_R \cdot y_R) \\ \end{align} $
Observation:
$\begin{align} (x_L + x_R)\cdot (y_L + y_R) &=& (x_L\cdot y_L) + \pmb{(x_L\cdot y_R) + (x_R\cdot y_L)} + (x_R\cdot y_R)\\ \end{align} $
$\begin{align} \pmb{(x_L\cdot y_R) + (x_R\cdot y_L)} = (x_L + x_R)\cdot (y_L + y_R) - (x_L\cdot y_L) - (x_R\cdot y_R)\\ \end{align}$
How does our observation help us?
If we calculate $(x_L\cdot y_L)$, $(x_R\cdot y_R)$, and $(x_L + x_R)\cdot (y_L + y_R)$, that is three recursive multiplications.
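For example, with $x = 1001$ ($9$) and $y = 1101$ ($13$):
$\begin{align} x_L \cdot y_L &=& 2 \cdot 3 = 6\\ x_R \cdot y_R &=& 1 \cdot 1 = 1\\ (x_L + x_R) \cdot (y_L + y_R) &=& 3 \cdot 4 = 12\\ \end{align}$
so the middle term is $12 - 6 - 1 = 5$, and $x \cdot y = 2^4 \cdot 6 + 2^2 \cdot 5 + 1 = 96 + 20 + 1 = 117 = 9 \cdot 13$.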
So with 3 recursive multiplications and a constant number of extra additions and subtractions (each $O(n)$), we then get that $W(n) = 3W(n/2) + n$. What is the running time?
brick method
$C(\hbox{root}) = n$
$C(\hbox{level 1}) = \frac{3n}{2}$
$\frac{3}{2} > 1 \Rightarrow$ leaf dominated.
But, there are fewer leaves this time. Why?
nodes per level: $1, 3, 9, \ldots 3^i \ldots 3^{\lg n}=n^{\lg 3} \:\:\:\: (\hbox{by}\: a^{\log_b c} = c^{\log_b a})$
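To see why $3^{\lg n} = n^{\lg 3}$:
$$3^{\lg n} = \left(2^{\lg 3}\right)^{\lg n} = 2^{(\lg 3)(\lg n)} = \left(2^{\lg n}\right)^{\lg 3} = n^{\lg 3}$$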
Using the brick method, this is still a leaf-dominated recurrence, and thus the running time is $O(n^{\log_2 3})$ (approximately $O(n^{1.58})$, versus $O(n^2)$ earlier).
This is known as the Karatsuba-Ofman algorithm (1962), and is the earliest known divide-and-conquer algorithm for integer multiplication. It is actually implemented in Intel hardware!
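As a sketch on Python integers (the function name and the small-value cutoff below are our choices; a production version would tune the cutoff empirically):
def karatsuba(x, y):
    # Karatsuba: three recursive multiplications instead of four
    if x < 16 or y < 16:
        return x * y                               # small-value cutoff
    n = max(x.bit_length(), y.bit_length())
    half = n // 2
    x_L, x_R = x >> half, x & ((1 << half) - 1)    # split into high/low halves
    y_L, y_R = y >> half, y & ((1 << half) - 1)
    LL = karatsuba(x_L, y_L)                       # x_L * y_L
    RR = karatsuba(x_R, y_R)                       # x_R * y_R
    # one product recovers both cross terms: x_L*y_R + x_R*y_L
    mid = karatsuba(x_L + x_R, y_L + y_R) - LL - RR
    return (LL << (2 * half)) + (mid << half) + RR
karatsuba(1234, 5678)
7006652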
So, we've decreased work from $O(n^2)$ to $O(n^{\lg 3})$.
Our span stays the same: $O(n)\:\:$ Why?
Parallelism $= \frac{W}{S} = \frac{n^{\lg 3}}{n} \approx n^{.58} < n$
So, our parallelism went down. Is that good or bad?
Schönhage and Strassen (1971) gave an algorithm that runs in $O(n\log n \log\log n)$ time.
In 2007, Fürer gave an algorithm that runs in $n \log n \cdot 2^{O(\log^* n)}$ time.
What is the fastest possible sequential algorithm for integer multiplication? In parallel?