Today's agenda:
# Review of primitive functions.
from collections import defaultdict
def tabulate(f, n):
return [f(i) for i in range(n)]
def my_map(f, a):
return [f(x) for x in a]
def my_filter(f, a):
return [x for x in a if f(x)]
def iterate(f, x, a):
if len(a) == 0:
return x
else:
return iterate(f, f(x, a[0]), a[1:])
def flatten(sequences):
return iterate(plus, [], sequences)
def collect(pairs):
result = defaultdict(list)
for pair in sorted(pairs):
result[pair[0]].append(pair[1])
return list(result.items())
def plus(x, y):
return x + y
# cont...
def reduce(f, id_, a):
if len(a) == 0:
return id_
elif len(a) == 1:
return a[0]
else:
return f(reduce(f, id_, a[:len(a)//2]),
reduce(f, id_, a[len(a)//2:]))
def scan(f, id_, a):
if len(a) == 0:
return [], id_
elif len(a) == 1:
return [id_], a[0]
else:
subproblem = [f(a[i], a[i+1]) for i in range(len(a))[::2]]
partial_output, total = scan(f, id_, subproblem)
ret = (
[partial_output[i//2] if i%2==0 else
f(partial_output[i//2], a[i-1])
for i in range(len(a))],
total
)
return ret
The cost of these functions depends on the concrete data structure used to represent the sequence. E.g., for an array:
$ \begin{array}{lcc} \text{Operation} & \text{Work} & \text{Span} \\ \mathit{length}~a & 1 & 1 \\ \mathit{nth}~a~i & 1 & 1 \\ \mathit{singleton}~x & 1 & 1 \\ \mathit{empty} & 1 & 1 \\ \mathit{isSingleton}~x & 1 & 1 \\ \mathit{isEmpty}~x & 1 & 1 \\ \mathit{subseq}~a~(i,j) & 1 & 1 \\ \mathit{tabulate}~f~n & 1 + \displaystyle\sum_{i=0}^n W\left({f(i)}\right) & 1 + \displaystyle\max_{i=0}^n S\left({f(i)}\right) \\[2mm] \mathit{map}~f~a & 1 + \displaystyle\sum_{x \in a} W\left({f(x)}\right) & 1 + \displaystyle\max_{x \in a} S\left({f(x)}\right) \\[2mm] \mathit{filter}~f~a & 1 + \displaystyle\sum_{x \in a} W\left({f(x)}\right) & \lg \lvert a \rvert + \displaystyle\max_{x \in a} S\left({f(x)}\right) \\[2mm] \end{array} $
$ \begin{array}{lcc} \text{Operation} & \text{Work} & \text{Span} \\ \mathit{append}~a~b & 1 + \lvert a \rvert+\lvert b \rvert & 1 \\[2mm] \mathit{flatten}~a & 1 + \lvert a \rvert + \sum_{x \in a} |x| & 1 + \lg \lvert a \rvert \\[2mm] \mathit{update}~a~(i,x) & 1 + \lvert a \rvert & 1 \\[2mm] \mathit{inject}~a~b & 1 + \lvert a \rvert + \lvert b \rvert & \lg(\mathsf{degree}(b)) \\[2mm] \mathit{ninject}~a~b & 1 + \lvert a \rvert + \lvert b \rvert & 1 \\[2mm] \mathit{collect}~f~a & 1 + W\left({f}\right) \cdot \lvert a \rvert \lg \lvert a \rvert & 1 + S\left({f}\right) \cdot \lg^2 \lvert a \rvert \\[2mm] \mathit{iterate}~f~x~a & 1 + \displaystyle\sum_{f(y,z) \in \mathcal{T}(-)} W\left({f(y,z)}\right) & 1 + \displaystyle\sum_{f(y,z) \in \mathcal{T}(-)} S\left({f(y,z)}\right) \\[2mm] \mathit{reduce}~f~x~a & 1 + \displaystyle\sum_{f(y,z) \in \mathcal{T}(-)} W\left({f(y,z)}\right) & \lg \lvert a \rvert \cdot \displaystyle\max_{f(y,z) \in \mathcal{T}(-)} S\left({f(y,z)}\right) \\[2mm] \mathit{scan}~f~x~a & \lvert a \rvert & \lg \lvert a \rvert \end{array} $
Why does filter require logarithmic span?
def my_filter(f, a):
return [x for x in a if f(x)]
$S(filter \: f \: a) = \lg \lvert a \rvert + \displaystyle\max_{x \in a} S\left({f(x)}\right)$
We need to account for the work to create the return array.
We can't do this in constant span, because the location of one value depends on the location of other values.
$filter \:\: positive \:\: [-1,3,-2,4,-5,6] \rightarrow [3,4,6]$
idea: Make a first, parallel pass to map all satisfying values to singletons and everything else to empty lists:
$[[],[3],[],[4],[],[6]]$
Then flatten everything down:
$[3,4,6]$
def deflate(f, x):
return [x] if f(x) else []
def my_filter(f,a):
return flatten(my_map(lambda element: deflate(f,element), a))
def positive(x):
if x > 0:
return True
return False
deflate(positive,-2)
my_map(lambda element: deflate(positive,element), [-1,3,-2,4,-5,6])
my_filter(positive, [-1,3,-2,4,-5,6])
[3, 4, 6]
Given a sequence $a$, generate all contiguous subsequences.
E.g,
$a = [1, 3, 5, 7]$
$\texttt{all_contiguous_subsequences}(a) = $
$[1], [1,3], [1,3,5], [1,3,5,7], [3], [3,5], [3,5,7], [5], [5,7], [7]$
# sequential solution
def all_contiguous_subseq(a):
for i in range(len(a)):
for j in range(i+1, len(a)+1):
yield a[i:j]
list(all_contiguous_subseq([1,2,3,4,5]))
[[1], [1, 2], [1, 2, 3], [1, 2, 3, 4], [1, 2, 3, 4, 5], [2], [2, 3], [2, 3, 4], [2, 3, 4, 5], [3], [3, 4], [3, 4, 5], [4], [4, 5], [5]]
Can we solve this using our primitives?
$\texttt{tabulate}$ is our equivalent of a basic for loop.
tabulate(lambda i: i, 10)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
We can use $\texttt{tabulate}$ to slice into a list:
a = [1,2,3,4,5]
i=0
tabulate(lambda j: a[i:i+j+1],
len(a)-i)
[[1], [1, 2], [1, 2, 3], [1, 2, 3, 4], [1, 2, 3, 4, 5]]
We can nest our $\texttt{tabulate}$s to vary $i$ as well.
tabulate(lambda i:
tabulate(lambda j: a[i:i+j+1],
len(a)-i),
len(a))
[[[1], [1, 2], [1, 2, 3], [1, 2, 3, 4], [1, 2, 3, 4, 5]], [[2], [2, 3], [2, 3, 4], [2, 3, 4, 5]], [[3], [3, 4], [3, 4, 5]], [[4], [4, 5]], [[5]]]
Last step, flatten everything together.
# all contiguous subsequences
a = [1,2,3,4,5]
flatten(
tabulate(lambda i:
tabulate(lambda j: a[i:i+j+1],
len(a)-i),
len(a))
)
[[1], [1, 2], [1, 2, 3], [1, 2, 3, 4], [1, 2, 3, 4, 5], [2], [2, 3], [2, 3, 4], [2, 3, 4, 5], [3], [3, 4], [3, 4, 5], [4], [4, 5], [5]]
How many calls to a[i:i+j+1] (i.e., subseq)?
If $|a|=n$,
$$ \sum_{i=1}^n = \frac{n(n-1)}{2} \in O(n^2) $$Work and span of subseq is O(1) (why?)
Therefore, total work is $O(n^2)$.
flatten(
tabulate(lambda i:
tabulate(lambda j: a[i:i+j+1],
len(a)-i),
len(a))
)
Span of inner tabulate is $O(1)$, and outer tabulate is also $O(1)$.
flatten at the end requires $O(\lg n)$ span.
Therefore, total span is $O(\lg n)$