CMPS 2200¶

Introduction to Algorithms¶

Cost of sequence functions¶

Today's agenda:

  • Review of subsequence functions
  • Work/Span of functions
In [1]:
# Review of primitive functions.
from collections import defaultdict

def tabulate(f, n):
    return [f(i) for i in range(n)]

def my_map(f, a):
    return [f(x) for x in a]

def my_filter(f, a):
    return [x for x in a if f(x)]

def iterate(f, x, a):
    if len(a) == 0:
        return x
    else:
        return iterate(f, f(x, a[0]), a[1:])

def flatten(sequences):
    return iterate(plus, [], sequences)

def collect(pairs):
    result = defaultdict(list)
    for pair in sorted(pairs):
        result[pair[0]].append(pair[1])
    return list(result.items())

def plus(x, y):
    return x + y

# cont...
In [2]:
def reduce(f, id_, a):
    if len(a) == 0:
        return id_
    elif len(a) == 1:
        return a[0]
    else:
        return f(reduce(f, id_, a[:len(a)//2]),
                 reduce(f, id_, a[len(a)//2:]))

def scan(f, id_, a):
    if len(a) == 0:
        return [], id_
    elif len(a) == 1:
        return [id_], a[0]
    else:
        subproblem = [f(a[i], a[i+1]) for i in range(len(a))[::2]]

        partial_output, total = scan(f, id_, subproblem)

        ret = (
            [partial_output[i//2] if i%2==0 else
             f(partial_output[i//2], a[i-1])
             for i in range(len(a))],
            total
        )
        return ret

Costs¶

The cost of these functions depends on the concrete data structure used to represent the sequence. E.g., for an array:

$ \begin{array}{lcc} \text{Operation} & \text{Work} & \text{Span} \\ \mathit{length}~a & 1 & 1 \\ \mathit{nth}~a~i & 1 & 1 \\ \mathit{singleton}~x & 1 & 1 \\ \mathit{empty} & 1 & 1 \\ \mathit{isSingleton}~x & 1 & 1 \\ \mathit{isEmpty}~x & 1 & 1 \\ \mathit{subseq}~a~(i,j) & 1 & 1 \\ \mathit{tabulate}~f~n & 1 + \displaystyle\sum_{i=0}^n W\left({f(i)}\right) & 1 + \displaystyle\max_{i=0}^n S\left({f(i)}\right) \\[2mm] \mathit{map}~f~a & 1 + \displaystyle\sum_{x \in a} W\left({f(x)}\right) & 1 + \displaystyle\max_{x \in a} S\left({f(x)}\right) \\[2mm] \mathit{filter}~f~a & 1 + \displaystyle\sum_{x \in a} W\left({f(x)}\right) & \lg \lvert a \rvert + \displaystyle\max_{x \in a} S\left({f(x)}\right) \\[2mm] \end{array} $

$ \begin{array}{lcc} \text{Operation} & \text{Work} & \text{Span} \\ \mathit{append}~a~b & 1 + \lvert a \rvert+\lvert b \rvert & 1 \\[2mm] \mathit{flatten}~a & 1 + \lvert a \rvert + \sum_{x \in a} |x| & 1 + \lg \lvert a \rvert \\[2mm] \mathit{update}~a~(i,x) & 1 + \lvert a \rvert & 1 \\[2mm] \mathit{inject}~a~b & 1 + \lvert a \rvert + \lvert b \rvert & \lg(\mathsf{degree}(b)) \\[2mm] \mathit{ninject}~a~b & 1 + \lvert a \rvert + \lvert b \rvert & 1 \\[2mm] \mathit{collect}~f~a & 1 + W\left({f}\right) \cdot \lvert a \rvert \lg \lvert a \rvert & 1 + S\left({f}\right) \cdot \lg^2 \lvert a \rvert \\[2mm] \mathit{iterate}~f~x~a & 1 + \displaystyle\sum_{f(y,z) \in \mathcal{T}(-)} W\left({f(y,z)}\right) & 1 + \displaystyle\sum_{f(y,z) \in \mathcal{T}(-)} S\left({f(y,z)}\right) \\[2mm] \mathit{reduce}~f~x~a & 1 + \displaystyle\sum_{f(y,z) \in \mathcal{T}(-)} W\left({f(y,z)}\right) & \lg \lvert a \rvert \cdot \displaystyle\max_{f(y,z) \in \mathcal{T}(-)} S\left({f(y,z)}\right) \\[2mm] \mathit{scan}~f~x~a & \lvert a \rvert & \lg \lvert a \rvert \end{array} $

Why does filter require logarithmic span?

def my_filter(f, a):
    return [x for x in a if f(x)]


$S(filter \: f \: a) = \lg \lvert a \rvert + \displaystyle\max_{x \in a} S\left({f(x)}\right)$

We need to account for the work to create the return array.


We can't do this in constant span, because the location of one value depends on the location of other values.

$filter \:\: positive \:\: [-1,3,-2,4,-5,6] \rightarrow [3,4,6]$



idea: Make a first, parallel pass to map all satisfying values to singletons and everything else to empty lists:

$[[],[3],[],[4],[],[6]]$

Then flatten everything down:

$[3,4,6]$

In [3]:
def deflate(f, x):
    return [x] if f(x) else []

def my_filter(f,a):
    return flatten(my_map(lambda element: deflate(f,element), a))

def positive(x):
    if x > 0:
        return True
    return False

deflate(positive,-2)
my_map(lambda element: deflate(positive,element), [-1,3,-2,4,-5,6])
my_filter(positive, [-1,3,-2,4,-5,6])
Out[3]:
[3, 4, 6]

Example Problem: All Contiguous Subsequences¶

Given a sequence $a$, generate all contiguous subsequences.


E.g,

$a = [1, 3, 5, 7]$

$\texttt{all_contiguous_subsequences}(a) = $

$[1], [1,3], [1,3,5], [1,3,5,7], [3], [3,5], [3,5,7], [5], [5,7], [7]$

In [4]:
# sequential solution

def all_contiguous_subseq(a):
    for i in range(len(a)):
        for j in range(i+1, len(a)+1):
            yield a[i:j]
            
list(all_contiguous_subseq([1,2,3,4,5]))
Out[4]:
[[1],
 [1, 2],
 [1, 2, 3],
 [1, 2, 3, 4],
 [1, 2, 3, 4, 5],
 [2],
 [2, 3],
 [2, 3, 4],
 [2, 3, 4, 5],
 [3],
 [3, 4],
 [3, 4, 5],
 [4],
 [4, 5],
 [5]]

Can we solve this using our primitives?


$\texttt{tabulate}$ is our equivalent of a basic for loop.

In [9]:
tabulate(lambda i: i, 10)
Out[9]:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


We can use $\texttt{tabulate}$ to slice into a list:

In [10]:
a = [1,2,3,4,5]
i=0
tabulate(lambda j: a[i:i+j+1],
    len(a)-i)
Out[10]:
[[1], [1, 2], [1, 2, 3], [1, 2, 3, 4], [1, 2, 3, 4, 5]]

We can nest our $\texttt{tabulate}$s to vary $i$ as well.

In [11]:
tabulate(lambda i: 
            tabulate(lambda j: a[i:i+j+1],
                    len(a)-i),
        len(a))
Out[11]:
[[[1], [1, 2], [1, 2, 3], [1, 2, 3, 4], [1, 2, 3, 4, 5]],
 [[2], [2, 3], [2, 3, 4], [2, 3, 4, 5]],
 [[3], [3, 4], [3, 4, 5]],
 [[4], [4, 5]],
 [[5]]]


Last step, flatten everything together.

In [12]:
# all contiguous subsequences
a = [1,2,3,4,5]
flatten(
    tabulate(lambda i: 
             tabulate(lambda j: a[i:i+j+1],
                      len(a)-i),
         len(a))
)
Out[12]:
[[1],
 [1, 2],
 [1, 2, 3],
 [1, 2, 3, 4],
 [1, 2, 3, 4, 5],
 [2],
 [2, 3],
 [2, 3, 4],
 [2, 3, 4, 5],
 [3],
 [3, 4],
 [3, 4, 5],
 [4],
 [4, 5],
 [5]]

Analysis of All Contiguous Subsequences¶

Work¶

flatten(
    tabulate(lambda i: 
             tabulate(lambda j: a[i:i+j+1],
                      len(a)-i),
         len(a))
)

How many calls to a[i:i+j+1] (i.e., subseq)?

If $|a|=n$,

$$ \sum_{i=1}^n = \frac{n(n-1)}{2} \in O(n^2) $$

Work and span of subseq is O(1) (why?)

Therefore, total work is $O(n^2)$.

Span¶

flatten(
    tabulate(lambda i: 
             tabulate(lambda j: a[i:i+j+1],
                      len(a)-i),
         len(a))
)

Span of inner tabulate is $O(1)$, and outer tabulate is also $O(1)$.


flatten at the end requires $O(\lg n)$ span.

Therefore, total span is $O(\lg n)$