The code examples are referred to in the slides. Here are sample
executions. All examples are derived by running Python 3.6 on Linux.
Start in the current directory and change directory (cd) into the different
subdirectories to execute the code as you see below.
Note that the run-time and performance numbers in this README file will not
exactly match the empirical numbers in the slides because, as you should
know, your mileage may vary when running this code on different operating
systems and machines.
Algorithm Formalities
The first section covers at a high level the mathematical formalisms used
to measure the time complexity and space complexity of various
algorithms. This is a rather informal presentation of fundamental concepts
that have been extensively studied for decades. For the most part, the Big
O notation I present characterizes the performance or space expectations in
the worst case.
This table empirically evaluates code to "Check if List Contains a Target
Integer."
Contains use the python in operator
SortContains uses a linear sort over a sorted list
BinaryArraySearch over a sorted list
Performance Comparison
$ cd "2. Basic Data Structures"
$ python3 queueWordLadder.py
['COLD', 'CORD', 'CARD', 'WARD', 'WARM']
List BASearch Dictionary
412.751261 14.179721 2.790587
This code uses a stand-along queue to compute a Word Ladder between
'COLD' and 'WARM'. When determining whether a four-letter word is a valid
English word, there are three possibile approaches:
Use a Python list to store all words and check using in
Use a Sorted List and use BinaryArraySearch
Use a Python dictionary
Word Ladder Summary
$ cd "2. Basic Data Structures"
$ python3 queueTimingThreeWordLadder.py
Queue List BASearch Dictionary
DQ 405.722967 14.274099 2.756425
List 407.715855 14.827824 3.329750
Q 409.225666 17.220676 5.669427
This code choose three different data structures to store the queue. The
impact is marginal. When considering the fastest option for checking
whether a four-letter word is a valid English word (Dictionary) the three
different implementations show how the deque offers the best
implementation. The list implementation is next, and the
queue implementation a distant third, because of the overhead from
supporting thread-safe operations.
The first two lines of output briefly confirm InsertionSort and MergeSort
are working. The table compares the performance of the three sorting
algorithms when sorting a list formed by concatenating (1) N sorted integers
from 0 to N-1; with (2) N/4 sorted integers randomly samples from the range
(0, N/4).
Note that InsertionSort is not run for lists larger than 2048 elements
because of the quadratic nature of its time complexity. Observe how its
performance quadruples when the problem size doubles.
First it solves Task #2, finding 60 words that belong to no word ladder.
I'm glad to see that it is impossible to form a word ladder from 'GOOD' to
'EVIL'.
Second, it solves Task #4, finding disjoint subsets of non-connectable
words. Of the 5,875 four-letter words, it is possible to form a word ladder
between any two of 5,807 words. There are three smaller "islands" that are
formed by smaller subsets, as you can see.
It then solves the 'COLD' -> 'WARM' Word Ladder using a BreadthFirstSearch,
which results in a Word Ladder of 5 words. Using DepthFirstSearch, the
resulting word ladder has 392 words.
Using the NetworkX built-in support for Dijkstra's All Pairs Shortest Path
algorithm, we can discover the longest Word Ladder in this dictionary
contains 17 words.
Using my native Python implementation of Floyd-Warshall, a different
algorithm to solve the same problem, the same result was computed in just
about eight hours. One reason for the timing difference is that Dijkstra's
All Pairs Shortest Path has time complexity of O(E log V) while
Floyd-Warshall has time complexity of O(V^3). The behavior difference
results from the total number of edges. In the worst case, there is an edge
between every pair of nodes, for a total of n*(n-1)/2 edges which is on the
order of n^2. In sparse graphs, the number of edges is much lower, on the
order of n. Dijkstra's All Pairs Shortest Path will outperform
Floyd-Warshall on sparse graphs.
This particular graph has 5,875 nodes and 37,362 edges. There could be a
total of 17,254,875 edges, but only 0.22% of these actually exist, thus this
is a fine example of a sparse graph.
The first table below reports the average search time for an element in a
Skip List or AVL tree containing N random integers (sometimes the element
is in the structure, sometimes it is not).
请发表评论