All Terrain Thinking

A Compendium of things I think are Important

Earth 5150
"If you teach a man to think he is thinking, he will love you. If you teach a man to think, he will hate you. - Ed McArthur"
 
 

Generally Speaking, Think on this...

 

Everything you ever wanted to know about algorithms

Everything you ever wanted to know about algorithms

Computer scientists look for simplicity, speed and reliability. Sometimes they find elegance.

The word algorithm was derived from the name Al-Khwarizmi, a 9th-century Persian mathematician and author of The Compendious Book on Calculation by Completion and Balancing. But nowadays the word most often applies to a step-by-step procedure for solving a problem with a computer.

An algorithm is like a recipe, with a discrete beginning and end and a prescribed sequence of steps leading unambiguously to some desired result.

But coming up with the right answer at the end of a program is only the minimum requirement. The best algorithms also run fast, are sparing in their use of memory and other computer resources, and are easy to understand and modify. The very best ones are invariably called "elegant," although Al-Khwarizmi may not have used that term for his formulas for solving quadratic equations.

As the mind learns to understand more complicated combinations of ideas, simpler formulae soon reduce their complexity. - Antoine-Nicholas de Condorcet, 1794

An algorithm can be thought of as the link between the programming language and the application. It's the way we tell a Cobol compiler how to generate a payroll system, for example.

Although algorithms can end up as thousands of lines of computer code, they often start as very high-level abstractions, the kind an analyst might hand to a programmer.

For example, a lengthy routine in that payroll system might have started out with this algorithmic specification: "Look up the employee's name in the Employee Table. If it is not there, print the message, 'Invalid employee.' If all other data on the input record is valid, go to the routine that computes net pay from gross pay. Repeat these steps for each employee. Then go to the routine that prints checks." The gross-to-net and check-writing routines would have their own algorithms.

The word algorithm is named after mathematician Al-Khwarizmi.

Reality Intrudes

Of course, it isn't quite that simple. If it were, the study of algorithms would not have become a major branch of computer science and the subject of countless books and doctoral theses.

But it's not hard to imagine computer engineers in the 1950s thinking they had pretty much finished the job. They had invented stored-program electronic computers, and languages like Fortran and Cobol to run on them, and they had largely banished the agony of assembly language programming. In fact, software pioneers such as Grace Hopper saw compilers, and the algorithms that instructed them, as such an advancement -- they could "understand" English -- that they named the first computer to use one the Universal Automatic Computer, or Univac. With adjectives like "universal" and "automatic" in its name, the computer could almost be expected to program itself.

But in the 1960s, computers moved into the business world in a big way, and soon two ugly realities intruded. The first was the matter of "bugs" -- a term coined by Hopper. Computers made lots of mistakes because programmers made lots of mistakes. The second was sorting, a machine-intensive job that came to dominate, and sometimes overwhelm, computing.

Virtually every major application required sorting. For example, if you wanted to eliminate duplicate mailings from your customer master file, which was sorted by customer number, you might have had to re-sort it by last name within ZIP code. Sorting and merging big files often went on repeatedly throughout the day. Even worse, very few of the records being sorted would fit into those tiny memories, and often they were not even on disk; they were on slow, cumbersome magnetic tapes. When the CEO called the data processing shop and asked, "When can I get that special report?" the DP guy might have said it would take 24 hours because of all the sorting that was needed.

So IT people learned that algorithms mattered. The choice of algorithm could have a huge effect on both programmability and proc??essing efficiency.

If algorithms were simple, they could be easily coded, debugged and later modified. Simple ones were less likely to have bugs in the first place, and if you used an existing algorithm rather than inventing your own, some of the debugging had already been done. But simple ones were often not the most efficient. They were not the ones that would speed up sorting enough to give the CEO's request a same-day turnaround.

The Search for Elegance

Computer science took on these challenges and came up with families of algorithms with fancy names like "Induction," "Recursion" and "Divide and Conquer." And programmers developed methods (themselves algorithms) for assessing the efficiency and general goodness of algorithms.

A simple sort algorithm called Bubblesort was developed early on. It involved reading through the file to be sorted, looking successively at pairs of adjacent records. If they were out of order, the two records in the pair were simply swapped. By passing through the file repeatedly and overlapping the pairs, in-sequence records would "bubble up" to the top until eventually the entire file was in sequence. It was easy to understand, program and debug, but it wasn't very efficient because it required many passes through the file.

Descriptions of all the clever algorithms that improved on Bubblesort would fill a book, but many students of algorithms would give the grand-slam award for elegance to Quicksort, which was invented in the early 1960s by Charles Antony Richard Hoare, a British computer scientist. (See box below.)

The Quicksort algorithm

In the early 1960s, British computer scientist Charles Antony Richard Hoare developed the Quicksort algorithm for sorting a list into a sequence. This one is ascending:

  1. Pick any element, called a "pivot," from the list.
  2. Pick in turn every other element, comparing each to the pivot. If the element is less than the pivot, put it above the pivot. If it is greater, put it below. At the end of that pass through the list, the pivot is now in its final correct place.
  3. In the group of elements above the pivot, pick another pivot and repeat Step 2. In the group below the original pivot, pick a third pivot and repeat Step 2. Now three elements have been sorted into their correct positions.
  4. Repeat the entire process, on successively smaller groups, until the whole list is in sequence.

Whereas Bubblesort involves moving records very small distances toward their final resting places, and moving them repeatedly, Quicksort never has to relocate an element once it is in the right place.

Depending on file size and other factors, it can take Quicksort just seconds to sort a file that would take other routines minutes or hours to process. Hoare, who also pioneered methods for proving the correctness of programs, was knighted in 2000 for his achievements.

Another exercise in algorithms is the famous "traveling salesman problem" (TSP), in which a salesman leaves from home to visit a number of cities and wants to minimize his distance traveled. If there are five cities to visit, there are 60 possible routes to choose from, and the obvious algorithm is to compute the distances for all of them and pick the best one. But the TSP suffers from an unfortunate phenomenon known as "combinatorial explosion."

If there are 10 cities to visit, there are 1.8 million paths, and at 20 cities, the poor salesman (or his computer) has 121 quadrillion routes to consider. Clearly, at some point, the try-them-all algorithm becomes impractical.

So mathematicians and computer scientists have come up with all kinds of ingenious ways to "solve" the TSP, which is but one example in a broad class of important problems. Some algorithms give exact solutions for, say, a few hundred cities or less. But with bigger problems, we usually turn to the so-called heuristic algorithms that produce "pretty good" but not optimal solutions. There is a dizzying assortment of such things, including dynamic programming, genetic algorithms and Markov chains.

But if you have 1,000 cities to visit and you don't have a Ph.D. in mathematics, you might try the "nearest neighbor" algorithm, which has proved to be remarkably good in many cases. With this algorithm, at each city, you simply next travel to the nearest unvisited city, and you keep doing that until you have gone to every one.

 

Prior | Tell us what you think | Next

 

Valid HTML 4.01 Transitional
 

Add to Your Social Bookmarks: -

Visitors Map
several several several Site Map - Press Room - Privacy Policy - Disclaimer
Copyright © 1998-2012 eMcArthur unless otherwise indicated
Unauthorized duplication or publication of any materials from this Site is expressly prohibited.
    Hosting by IPower!