Everything you ever wanted to know about algorithms
Everything you ever wanted to know about algorithms
Computer scientists look for simplicity, speed and reliability. Sometimes they find elegance.
The word algorithm was derived from the name Al-Khwarizmi, a 9th-century
Persian mathematician and author of The Compendious Book on Calculation by
Completion and Balancing. But nowadays the word most often applies to a
step-by-step procedure for solving a problem with a computer.
An algorithm is like a recipe, with a discrete beginning and end and a
prescribed sequence of steps leading unambiguously to some desired result.
But coming up with the right answer at the end of a program is only the
minimum requirement. The best algorithms also run fast, are sparing in their use
of memory and other computer resources, and are easy to understand and modify.
The very best ones are invariably called "elegant," although Al-Khwarizmi may
not have used that term for his formulas for solving quadratic equations.
As the mind learns to understand more complicated combinations of ideas, simpler
formulae soon reduce their complexity. - Antoine-Nicholas de Condorcet, 1794
An algorithm can be thought of as the link between the programming language
and the application. It's the way we tell a Cobol compiler how to generate a
payroll system, for example.
Although algorithms can end up as thousands of lines of computer code, they
often start as very high-level abstractions, the kind an analyst might hand to a
programmer.
For example, a lengthy routine in that payroll system might have started out
with this algorithmic specification: "Look up the employee's name in the
Employee Table. If it is not there, print the message, 'Invalid employee.' If
all other data on the input record is valid, go to the routine that computes net
pay from gross pay. Repeat these steps for each employee. Then go to the routine
that prints checks." The gross-to-net and check-writing routines would have
their own algorithms.
The word algorithm is named after mathematician Al-Khwarizmi.
Reality Intrudes
Of course, it isn't quite that simple. If it were, the study of algorithms
would not have become a major branch of computer science and the subject of
countless books and doctoral theses.
But it's not hard to imagine computer engineers in the 1950s thinking they
had pretty much finished the job. They had invented stored-program electronic
computers, and languages like Fortran and Cobol to run on them, and they had
largely banished the agony of assembly language programming. In fact, software
pioneers such as Grace Hopper saw compilers, and the algorithms that instructed them, as such an
advancement -- they could "understand" English -- that they named the first
computer to use one the Universal Automatic Computer, or Univac. With adjectives
like "universal" and "automatic" in its name, the computer could almost be
expected to program itself.
But in the 1960s, computers moved into the business world in a big way, and
soon two ugly realities intruded. The first was the matter of "bugs" -- a term
coined by Hopper. Computers made lots of mistakes because programmers made lots
of mistakes. The second was sorting, a machine-intensive job that came to
dominate, and sometimes overwhelm, computing.
Virtually every major application required sorting. For example, if you
wanted to eliminate duplicate mailings from your customer master file, which was
sorted by customer number, you might have had to re-sort it by last name within
ZIP code. Sorting and merging big files often went on repeatedly throughout the
day. Even worse, very few of the records being sorted would fit into those tiny
memories, and often they were not even on disk; they were on slow, cumbersome
magnetic tapes. When the CEO called the data processing shop and asked, "When
can I get that special report?" the DP guy might have said it would take 24
hours because of all the sorting that was needed.
So IT people learned that algorithms mattered. The choice of algorithm could
have a huge effect on both programmability and proc??essing efficiency.
If algorithms were simple, they could be easily coded, debugged and later
modified. Simple ones were less likely to have bugs in the first place, and if
you used an existing algorithm rather than inventing your own, some of the
debugging had already been done. But simple ones were often not the most
efficient. They were not the ones that would speed up sorting enough to give the
CEO's request a same-day turnaround.
The Search for Elegance
Computer science took on these challenges and came up with families of
algorithms with fancy names like "Induction," "Recursion" and "Divide and
Conquer." And programmers developed methods (themselves algorithms) for
assessing the efficiency and general goodness of algorithms.
A simple sort algorithm called Bubblesort was developed early on. It involved
reading through the file to be sorted, looking successively at pairs of adjacent
records. If they were out of order, the two records in the pair were simply
swapped. By passing through the file repeatedly and overlapping the pairs,
in-sequence records would "bubble up" to the top until eventually the entire
file was in sequence. It was easy to understand, program and debug, but it
wasn't very efficient because it required many passes through the file.
Descriptions of all the clever algorithms that improved on Bubblesort would
fill a book, but many students of algorithms would give the grand-slam award for
elegance to Quicksort, which was invented in the early 1960s by Charles Antony
Richard Hoare, a British computer scientist. (See box below.)
The Quicksort algorithm
In the early 1960s, British computer scientist Charles Antony Richard
Hoare developed the Quicksort algorithm for sorting a list into a sequence. This
one is ascending:
- Pick any element, called a "pivot," from the list.
- Pick in turn every other element, comparing each to the pivot. If the
element is less than the pivot, put it above the pivot. If it is greater, put it
below. At the end of that pass through the list, the pivot is now in its final
correct place.
- In the group of elements above the pivot, pick another pivot and repeat Step
2. In the group below the original pivot, pick a third pivot and repeat Step 2.
Now three elements have been sorted into their correct positions.
- Repeat the entire process, on successively smaller groups, until the whole
list is in sequence.
Whereas Bubblesort involves moving records very small distances toward their
final resting places, and moving them repeatedly, Quicksort never has to
relocate an element once it is in the right place.
Depending on file size and other factors, it can take Quicksort just seconds
to sort a file that would take other routines minutes or hours to process.
Hoare, who also pioneered methods for proving the correctness of programs, was
knighted in 2000 for his achievements.
Another exercise in algorithms is the famous "traveling salesman problem"
(TSP), in which a salesman leaves from home to visit a number of cities and
wants to minimize his distance traveled. If there are five cities to visit,
there are 60 possible routes to choose from, and the obvious algorithm is to
compute the distances for all of them and pick the best one. But the TSP suffers
from an unfortunate phenomenon known as "combinatorial explosion."
If there are 10 cities to visit, there are 1.8 million paths, and at 20
cities, the poor salesman (or his computer) has 121 quadrillion routes to
consider. Clearly, at some point, the try-them-all algorithm becomes
impractical.
So mathematicians and computer scientists have come up with all kinds of
ingenious ways to "solve" the TSP, which is but one example in a broad class of
important problems. Some algorithms give exact solutions for, say, a few hundred
cities or less. But with bigger problems, we usually turn to the so-called
heuristic algorithms that produce "pretty good" but not optimal solutions. There
is a dizzying assortment of such things, including dynamic programming, genetic
algorithms and Markov chains.
But if you have 1,000 cities to visit and you don't have a Ph.D. in
mathematics, you might try the "nearest neighbor" algorithm, which has proved to
be remarkably good in many cases. With this algorithm, at each city, you simply
next travel to the nearest unvisited city, and you keep doing that until you
have gone to every one.