Algorithmic Efficiency
In calculus, we learned about the hierarchy of which types of functions grow more quickly than others:
… ln(n) ≺ n ≺ n ln(n) ≺ n² ≺ n³ ≺ … ≺ eⁿ ≺ n! ≺ nⁿ …
In computer science, when we analyze a function’s efficiency, we count the resources it uses (for example, the number of times a loop iterates) when it is run on a problem of size n, in the worst-case scenario. We find a function that approximates how the amount of resources used, and thus the run time, grows with n, and we describe that growth, in general terms, using “Big O” notation. Because Big O describes behaviour as n becomes very large, these bounds may not hold for smaller values of n.
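For instance (a sketch of our own, not from the original notes): linear search must, in the worst case, compare the target against every one of the n items, so its worst-case run time grows linearly with n and we write O(n).

def linear_search(L, target):
    """Return the index of target in L, or -1 if it is absent."""
    # worst case (target absent, or in the last position): len(L) comparisons, i.e. O(n)
    for i in range(len(L)):
        if L[i] == target:
            return i
    return -1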
Name                                     | Notation   | Example
constant                                 | O(1)       | def f(x): return 'taadaa!'
logarithmic                              | O(log n)   | binary search on a sorted list (sketched below)
linear                                   | O(n)       | def f(L):
                                         |            |     for item in L:
                                         |            |         print('hello!')
“linearithmic”/“loglinear”/“quasilinear” | O(n log n) | merge sort, quick sort (on average), and timsort
quadratic                                | O(n²)      | bubble sort, selection sort, insertion sort
Similarly, these kinds of functions have a hierarchy, from constant, to quadratic, and so on.[1] The earlier a function’s growth rate appears in this hierarchy, the more slowly its resource use grows as n grows, and the more efficient it is.
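As an illustration of the logarithmic row in the table (a sketch of our own): binary search halves the remaining search range with every comparison, so a sorted list of n items requires at most about log₂(n) comparisons.

def binary_search(L, target):
    """Return the index of target in sorted list L, or -1 if it is absent."""
    low, high = 0, len(L) - 1
    while low <= high:
        mid = (low + high) // 2
        if L[mid] == target:
            return mid
        elif L[mid] < target:
            low = mid + 1    # discard the lower half
        else:
            high = mid - 1   # discard the upper half
    return -1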
Sorting Algorithms
Sorting algorithms, as we know them, sort Python lists in either ascending or descending order (usually ascending). Back in CSC 108 we learned, and in CSC 148 we have revisited, the following three sorting algorithms: bubble sort, selection sort, and insertion sort.
Bubble sort makes multiple passes through a list, comparing adjacent items and exchanging those pairs that are out of order relative to each other. At the end of each pass, the largest remaining item has been “bubbled” up to its final location.
def bubble_sort(L):
    for passnum in range(len(L) - 1, 0, -1):
        for i in range(passnum):
            if L[i] > L[i+1]:
                L[i], L[i+1] = L[i+1], L[i]
In a list of length n, the first pass makes (n−1) comparisons, the second (n−2), the third (n−3), and so on, until the last (the (n−1)th) pass, where only 1 comparison is made. Thus, a total of n(n−1)/2 = (1/2)n² − (1/2)n comparisons are made. This is a quadratic function, so we say that bubble sort’s “problem complexity” is quadratic, or O(n²).
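We can check this count empirically (a sketch of our own: the bubble sort above with a comparison counter added):

import random

def bubble_sort_counting(L):
    """Bubble sort that also returns the number of comparisons made."""
    comparisons = 0
    for passnum in range(len(L) - 1, 0, -1):
        for i in range(passnum):
            comparisons += 1
            if L[i] > L[i+1]:
                L[i], L[i+1] = L[i+1], L[i]
    return comparisons

n = 100
L = random.sample(range(n), n)
print(bubble_sort_counting(L))   # 4950
print(n * (n - 1) // 2)          # also 4950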
Selection sort sorts lists by finding the largest (or smallest, depending on whether the list is to be in ascending or descending order) item in the as-yet-unsorted section of the list, and swapping it with the item currently occupying its final location. Notice that it makes only one exchange per pass.
def selection_sort(L):
    for fillslot in range(len(L) - 1, 0, -1):
        # find index of largest unsorted item…
        ind_of_max = 0
        for i in range(1, fillslot + 1):
            if L[i] > L[ind_of_max]:
                ind_of_max = i
        # swap it into its final location…
        L[ind_of_max], L[fillslot] = L[fillslot], L[ind_of_max]
Selection sort also has a problem complexity of O(n²), but tends to execute faster than bubble sort because it does not swap pairs of values as often.
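To see the difference concretely, here is a sketch of our own that counts swaps in each algorithm (the two functions mirror the versions above, with counters added):

import random

def bubble_swap_count(L):
    swaps = 0
    for passnum in range(len(L) - 1, 0, -1):
        for i in range(passnum):
            if L[i] > L[i+1]:
                L[i], L[i+1] = L[i+1], L[i]
                swaps += 1
    return swaps

def selection_swap_count(L):
    swaps = 0
    for fillslot in range(len(L) - 1, 0, -1):
        ind_of_max = 0
        for i in range(1, fillslot + 1):
            if L[i] > L[ind_of_max]:
                ind_of_max = i
        L[ind_of_max], L[fillslot] = L[fillslot], L[ind_of_max]
        swaps += 1   # exactly one swap per pass
    return swaps

data = random.sample(range(1000), 1000)
print(bubble_swap_count(data[:]))     # typically around n(n-1)/4, i.e. ~250000
print(selection_swap_count(data[:]))  # exactly n - 1 = 999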
Insertion sort sorts lists by keeping a sorted sub-list at one end and, with each pass, inserting one more item into this section in such a way that the section remains sorted.
def insertion_sort(L):
    for i in range(1, len(L)):
        # save value to be inserted…
        temp = L[i]
        # make space in sorted section by shifting…
        j = i
        while j != 0 and L[j-1] > temp:
            L[j] = L[j-1]
            j -= 1
        # insert value…
        L[j] = temp
New to CSC 148, we also learned about shell sort, merge sort, quick sort, and timsort, all of which are generally more efficient than the previous three.
Shell sort sorts lists by creating sub-lists of items that are i items apart (this spacing is sometimes called the “gap”) and sorting these sub-lists. It repeats this process until i has been decreased to 1; the final pass, with a gap of 1, is then an ordinary insertion sort. It requires a helper function, called gap insertion sort, which creates and sorts the sub-lists for each decreasing value of i.
def shell_sort(L):
    sublist_count = len(L) // 2
    while sublist_count > 0:
        for start in range(sublist_count):
            gap_insertion_sort(L, start, sublist_count)
        sublist_count = sublist_count // 2

def gap_insertion_sort(L, start, gap):
    for i in range(start + gap, len(L), gap):
        current_val = L[i]
        pos = i
        while pos >= gap and L[pos-gap] > current_val:
            L[pos] = L[pos-gap]
            pos -= gap
        L[pos] = current_val
Shell sort is more efficient than insertion sort alone because, by the time all the gapped sub-lists have been sorted, the final insertion sort has very few comparisons left to make. It usually has a problem complexity between O(n) and O(n²), for example O(n^(3/2)), depending on the gap sequence and the list.
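A quick usage check (our own example, assuming the shell_sort and gap_insertion_sort defined above):

L = [54, 26, 93, 17, 77, 31, 44, 55, 20]
shell_sort(L)   # gaps used for this 9-item list: 4, 2, 1
print(L)        # [17, 20, 26, 31, 44, 54, 55, 77, 93]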
Merge sort is the first of our sorting algorithms to use recursion. It repeatedly splits a list in half, invoking merge sort on both halves, until all sub-lists are of length 1 or 0 and thus sorted by definition. It then merges the sub-lists back into a single, sorted list, using the helper function merge.
def merge_sort(L):
    if len(L) < 2:
        return L
    else:
        left_sub = L[:len(L) // 2]
        right_sub = L[len(L) // 2:]
        return merge(merge_sort(left_sub), merge_sort(right_sub))
def merge(L1, L2):
    L = []
    # reverse both lists so the smallest items sit at the ends,
    # where pop() removes them in constant time
    L1.reverse()
    L2.reverse()
    while L1 and L2:
        if L1[-1] < L2[-1]:
            L.append(L1.pop())
        else:
            L.append(L2.pop())
    # restore the order of any leftover items before appending them
    L1.reverse()
    L2.reverse()
    return L + L1 + L2
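A quick check of both functions (our own example): merge interleaves two already-sorted lists, and merge_sort produces a fully sorted copy.

print(merge([1, 3, 5], [2, 4, 6]))             # [1, 2, 3, 4, 5, 6]
print(merge_sort([38, 27, 43, 3, 9, 82, 10]))  # [3, 9, 10, 27, 38, 43, 82]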
Merge sort divides the list in half log(n) times, and each of those log(n) levels requires about n steps of merging. It, therefore, has a problem complexity of O(n log n).
Quick sort takes some arbitrary pivot, for example, the
first item in the list, and moves all the items less than it to the left; all
the items greater, to the right. Note that there are ways in which pivots can
be strategically chosen to maximize efficiency. Quick sort is then invoked
recursively on the two sub-lists.
def quick(L):
    if len(L) < 2:
        return L[:]
    else:
        # i = randint(0, len(L) - 1)  # a random pivot avoids the sorted-list worst case
        i = 0
        return (quick([x for x in L if x < L[i]])
                + [L[i]]
                + quick([x for x in (L[0:i] + L[i+1:]) if x >= L[i]]))
Quick sort, like merge sort, has efficiency O(n log n) on average; with an unlucky pivot (for example, the first item of an already-sorted list) it degrades to O(n²). Unlike merge sort, though, quick sort can be implemented in place, without the additional memory that merging requires, and can therefore run faster.
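As the commented-out line in the code hints, choosing the pivot at random sidesteps the sorted-input worst case. A variant of our own (quick_random is our name, not from the notes):

from random import randint

def quick_random(L):
    if len(L) < 2:
        return L[:]
    else:
        i = randint(0, len(L) - 1)   # random pivot
        pivot = L[i]
        rest = L[:i] + L[i+1:]
        return (quick_random([x for x in rest if x < pivot])
                + [pivot]
                + quick_random([x for x in rest if x >= pivot]))

print(quick_random([3, 6, 1, 8, 2, 9, 4]))   # [1, 2, 3, 4, 6, 8, 9]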
Our final, most complicated, and most efficient sorting algorithm is timsort, so called because Tim Peters invented it in 2002. This is the algorithm Python uses when we call the built-in function sorted on a list (and when we call the list method sort). Timsort is a “hybrid” sorting algorithm that uses both merge sort and insertion sort, designed to do whatever is most efficient for the specific list it is given. It takes advantage of subsets of the given list that are already ordered, called “runs,” and uses these to sort more efficiently. Both on average and in the worst case, it performs with O(n log n) efficiency.
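Since timsort is what Python itself runs, using it is just a matter of calling the built-ins; sorted returns a new sorted list, while the list method sort sorts in place:

grades = [74, 91, 62, 88, 79]
print(sorted(grades))                 # [62, 74, 79, 88, 91] (grades unchanged)
print(sorted(grades, reverse=True))   # [91, 88, 79, 74, 62]
grades.sort()                         # sorts the original list in place
print(grades)                         # [62, 74, 79, 88, 91]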
[1] There are far more levels of efficiency than these; the ones above are a few of the most common, or those we have dealt with most often in this course.