CMPS 2200¶

Introduction to Algorithms¶

Overview¶

Agenda:

  • Introductions
  • Motivation for course
  • Formalisms used throughout the course
  • Navigating the course

What is an algorithm?¶

an explicit, precise, unambiguous, mechanically-executable sequence of elementary instructions, usually intended to accomplish a specific purpose.

-- Jeff Erickson

Examples?

BOB(n):¶

  • for i n down to 1
    • Sing “i bottles of beer on the wall, $i$ bottles of beer,”
    • Sing “Take one down, pass it around, $i-1$ bottles of beer on the wall.”
  • Sing “No bottles of beer on the wall, no bottles of beer,”
  • Sing “Go to the store, buy some more, n bottles of beer on the wall.”

Lagniappe: Another old algorithmic drinking song is The Barley Mow. I'll leave it to you to come up with the program to produce it.

Examples of algorithm-like things that are not algorithms?

BeAMillionaireAndNeverPayTaxes():¶

  • Get a million dollars.
  • If the tax man comes to your door and says, “You have never paid taxes!”
    • Say “I forgot.”

What makes a good algorithm?¶

  • correct
  • user-friendly
  • many features
  • robust
  • simple
  • secure
  • low programmer cost
  • efficient
    • runs quickly
    • requires little memory

Then, why study efficiency?¶

  • separates feasible from infeasible
  • correlates with user-friendliness

What if it took Google took 2 minutes to return results?¶

Simple warmup: What does this do?

In [26]:
def my_function(a, b):
    for i,v in enumerate(a):
        if v == b:
            return i
    return -1
In [1]:
def linear_search(mylist, key):
    """
    Args:
      mylist...a list
      key......a search key
    Returns:
      index of key in mylist; -1 if not present
    """
    for i,v in enumerate(mylist):
        if v == key:
            return i
    return -1
 
linear_search([5,1,10,7,12,4,2], 12)
Out[1]:
4

What factors affect the running time of this algorithm?

  • Input size
  • Input values: is key at start or end?
  • Hardware!
    • TI-85 vs. Supercomputer

We need a way to compare the efficiency of algorithms that abstracts away details of hardware and input.¶

Analysis of Linear Search, the long way¶

  • Assign a time cost $c_i$ to each line $i$.
  • Figure out how often each line is run $n_i$
  • total cost is the cost of each line multiplied by the number of times it is run


$ \hbox{Cost(linear-search, mylist, key)} = \sum_i c_i * n_i $

In [28]:
def linear_search(mylist, key):        #   cost         number of times run
    for i,v in enumerate(mylist):      #   c1               ?
        if v == key:                   #   c2               ?
            return i                   #   c3               ?
    return -1                          #   c4               ?

Best/Average/Worst case¶

To deal with the effects of the input values on performance, we can consider three types of analysis:

  • Worst-case: maximum time for any input of size $n$
linear_search([5,1,10,7,12,4,2], 9999)
  • Best case: minimum time of any input of size $n$
linear_search([5,1,10,7,12,4,2], 5)
  • Average case: expected time over all inputs of size n
    • Need some probability distribution over inputs
for (mylist, key) in ???:
    linear_search(mylist, key)

Worst-case analysis of linear search¶

Assume $n \leftarrow$ len(mylist)

In [29]:
def linear_search(mylist, key):        #   cost         number of times run
    for i,v in enumerate(mylist):      #   c1               n
        if v == key:                   #   c2               n
            return i                   #   c3               0
    return -1                          #   c4               1

$ \hbox{Cost(linear-search, } n) = c_1n + c_2n + c_4$

Cost is now just a function of:

  • input size $n$
  • constants $c$ (depend on machine, compiler, etc)

How granular should we get?¶

Consider this slightly different implementation:

In [30]:
def new_linear_search(mylist, key):    #   cost         number of times run
    for i in range(len(mylist)):       #   c5               n
        if mylist[i] == key:           #   c6               n
            return i                   #   c3               0
    return -1                          #   c4               1

$ \begin{align} \hbox{Cost(new-linear-search, } n) = c_5n + c_6n + c_4\\ \hbox{Cost(linear-search, } n) = c_1n + c_2n + c_4 \end{align} $

There is a line between analyzing algorithms and optimizing a specific implementation (e.g., through profiling):

Big Idea¶

  • Ignore machine-dependent constants
  • Focus on growth of running time
    • What happens in the limit as $n \rightarrow \infty$

$ c_1n + c_2n + c_4 \approx c_5n + c_6n + c_4 $

Course Overview¶

  • Analyzing algorithms: methods to compute tight bounds on running time
  • Designing algorithms: various approaches to designing efficient algorithms
    • lists, sequences, trees, graphs,...
  • Distinct from typical courses like this, we will emphasize parallel algorithms from the start

Navigating the course¶

  • Canvas: syllabus, dates, grades
  • Diderot: online textbook
  • Github: assignments, slides

Next time¶

Mathematically define our Asymptotic Notations and introduce tools for performing asymptotic proofs.