New technology cuts through Internet jungle


      Every so often, a technology comes along that subtly changes our daily lives. It doesn't have to be a big before-and-after invention like the light bulb or the telephone. It can be just a little blessed thing like the rewind feature on a VCR (and its refinement on digital video recorders: a buffer of just-watched live TV). This sort of thing insinuates itself quietly into our media-consuming lives because it's a feature so useful it should have been there all along.

      It's the same with the Internet and search engines. The Internet is the big society-changing thing, while searching has gone from a small convenience to an absolute essential for locating and managing what you want from all the information out there. Without search capability, the Internet you could experience would be strictly limited to the sites you already knew of, plus any links they might contain.

      Yet searchability is only about a decade old, half the age of the Internet as we know it. Before that, finding a specific chunk of information was an inefficient task that had to be conducted like real research, not a quick query and some clicking: one reference in an on-line document led to another, and a lucky break in the third (or 30th) document might get you to what you wanted, assuming what you hoped to find even existed, since the number of people contributing content to the Internet was comparatively small. And in the days before the Web browser (1993 and earlier), you needed a handful of different pieces of software installed just to find, transfer, or decompress each file, plus an understanding of how to operate other software that lived on the machine holding the documents you wanted. (Anyone remember Archie, Veronica, and Jughead, the minuscule big three of pre-Web on-line search software?)

      One of the most appreciated features of early browsing software was that it handled a number of the awkward tasks involved in information retrieval. The browser also meant people started publishing their research and entertainment links as clickable on-line lists and then publicizing them. All you had to do was hang on to the text file announcing a collection's launch (well, every such announcement, since you never knew which sites you'd need someday), or regularly scour the list of new sites, and you had fast access to a growing collection of documented human knowledge.

      That's how Yahoo! started in 1994: as a selective collection of annotated links grouped by subject, as in a library. You could search the index as well, but it almost seemed put there as an afterthought. Of course, all that thoughtful categorizing and site-quality evaluation inevitably gave way to the raw power of search engines as the Internet became popular in the mid 1990s. Plus, I never met an old-school computer geek who didn't love a good database, which is why literally hundreds of search engines were built in a very short amount of time, each with only a portion of the Internet logged within it. In many cases that was intentional, because most were devoted to some narrow segment of specialized academic knowledge.

      A few dozen sites attempted to be more comprehensive and all-encompassing, but until Google gained dominance following its 1998 debut, the best ones you could find were sites such as Dogpile and MetaCrawler, which searched a few of the biggest search engines at once and presented the composite results to you. Those worked pretty well, though you wondered what glorious gems of information were out there unindexed and inaccessible.
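
      For the technically curious, the metasearch trick is simple enough to sketch in a few lines of Python: fan one query out to several engines, then merge everything into a single composite ranking. The engine hookups and the scoring rule below are illustrative assumptions on my part, not Dogpile's or MetaCrawler's actual machinery.

      # Metasearch sketch: send one query to several engines and merge
      # the answers into one deduplicated, composite ranking.
      from collections import defaultdict

      def metasearch(query, engines):
          """engines maps an engine name to a search function that
          returns an ordered list of result URLs."""
          scores = defaultdict(float)
          for name, search in engines.items():
              for rank, url in enumerate(search(query)):
                  # Reward results that rank high on multiple engines.
                  scores[url] += 1.0 / (rank + 1)
          return sorted(scores, key=scores.get, reverse=True)

      Results that several engines agree on float to the top, which is roughly why those composite lists worked as well as they did.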

      Not finding certain information is a lot less of a worry in these modern times. Pieces of software called spiders and bots travel the Internet, indexing pages and following links and sending the data back to feed the ever-growing databases they slavishly serve. Besides, there are so many pages out there that you don't need to search them all to get what you need. These days, the phrase "comprehensive research project" means you went eight or 10 pages deep in the Google results, just to make sure you had everything covered.
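
      The core loop of a spider is surprisingly small, and here is a toy version in Python, built only from the standard library, to show the shape of it. Real crawlers add politeness rules (robots.txt, rate limits), parallelism, and far sturdier parsing, so treat this as a sketch of the idea rather than working industrial software.

      # Toy spider: fetch a page, store its text in the index, queue
      # every link found, and repeat until the page budget runs out.
      from collections import deque
      from html.parser import HTMLParser
      from urllib.parse import urljoin
      from urllib.request import urlopen

      class LinkParser(HTMLParser):
          """Collects the href of every <a> tag on a page."""
          def __init__(self):
              super().__init__()
              self.links = []

          def handle_starttag(self, tag, attrs):
              if tag == "a":
                  for name, value in attrs:
                      if name == "href" and value:
                          self.links.append(value)

      def crawl(seed, max_pages=10):
          index = {}                        # url -> raw page text
          queue, seen = deque([seed]), {seed}
          while queue and len(index) < max_pages:
              url = queue.popleft()
              try:
                  page = urlopen(url, timeout=5).read().decode("utf-8", "replace")
              except (OSError, ValueError):
                  continue                  # skip unreachable or odd URLs
              index[url] = page             # feed the ever-growing database
              parser = LinkParser()
              parser.feed(page)
              for link in parser.links:
                  absolute = urljoin(url, link)
                  if absolute not in seen:  # follow each link only once
                      seen.add(absolute)
                      queue.append(absolute)
          return index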

      But the Internet might be outgrowing search engines once again. Part of the reason (which I'll get into next week) is that computers process concepts in a linear and literal way, while humans are more intuitive and able to act on partial information (which isn't always a good thing). Hell, it's practically revolutionary that you can get usable results from Google or the Internet Movie Database even when you misspell a word in the search, much less have the search engines suss out the nuances of the retrieved content.
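
      That misspelling tolerance can be approximated with a textbook technique called edit distance: count how many single-letter insertions, deletions, and substitutions separate the query from each word the engine knows, then suggest the nearest one. The Python below is the classic classroom version, an assumption about the general approach rather than a description of what Google or the IMDb actually runs.

      # "Did you mean?" sketch using Levenshtein edit distance.
      def edit_distance(a, b):
          """Minimum number of single-character edits turning a into b."""
          prev = list(range(len(b) + 1))
          for i, ca in enumerate(a, 1):
              curr = [i]
              for j, cb in enumerate(b, 1):
                  curr.append(min(prev[j] + 1,                 # delete
                                  curr[j - 1] + 1,             # insert
                                  prev[j - 1] + (ca != cb)))   # substitute
              prev = curr
          return prev[-1]

      def did_you_mean(query, vocabulary):
          """Suggest the known word closest to the (possibly botched) query."""
          return min(vocabulary, key=lambda word: edit_distance(query, word))

      print(did_you_mean("casablanka", ["casablanca", "castaway", "chinatown"]))
      # prints: casablanca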

      Still, a more advanced approach is being worked on. Some have already called it Web 3.0, which is extremely annoying to those of us who aren't even sure how much we like Web 2.0 yet, with its Facebook, MySpace, and Second Life approaches to community. I just hope Web 3.0 doesn't mean we have to install a bunch of software in order to fly a cartoon representation of ourselves through a virtual card catalogue. That could get tiresome.
