Friday, August 17, 2007

Has Google buried the web?

Search engines are a natural consequence of the web. A large number of pages need to be catalogued and indexed, and no one does this better than Google. The web is so large that it would take years for any team of humans, however large, to go through and catalogue every single web page. Many directories exist, but their categorization is relatively broad, considering how specialized some pages are. Look at the items under the "misc" category on any multimedia site and you will get a sense of the problem.

The focus shifted from browsing to searching somewhere in the mid-nineties. Before that, the URL of a web page was all-important. Google had a head start there, thanks to its famous crawling algorithm. It browsed through web pages, following links, picking out "keywords" and determining "pagerank", both of which are processes little understood by most people. This is not because the programming is difficult (all search engines use something similar), but because control over which pages show up after a search has shifted from humans to computation. Any computation can only be as good as the initial inputs from humans, and this is where Google went wrong.
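For readers curious what "determining pagerank" by following links roughly involves, here is a minimal sketch of the textbook power-iteration form of PageRank. It is not Google's actual code. The function name `pagerank`, the four-page `toy_web` graph, the damping factor of 0.85 and the iteration count are all assumptions made up for this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    All names and parameter values here are illustrative assumptions.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with rank spread evenly across all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share regardless of its links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly among its outlinks.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its rank over every page.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: nobody links to "orphan".
toy_web = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
    "orphan": ["home"],
}

ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this toy graph the page nobody links to ends up at the bottom, while the page everyone links to ends up on top. The score depends only on who links to whom, not on what the page actually says.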

Only recently have there been efforts to understand the meaning behind the words typed into a search box. Most of the time, people have little idea what to search for, and have to go back to the search field three or four times before they get a bunch of pages full of something close to what they are looking for. Finding something the first time you search for it is very rare. The search relevance is low, not because of flaws in the input, but because of flaws in the search engine.

And it is irreparable, as the internet has already been buried by Google.

1. The rate at which Google can crawl the net is far behind the rate at which new content gets created.

2. A large portion of the new content generated is just a bunch of crosslinks, references and directories. Google is not the only middleman trying to make some easy money; it supports a lot of other middlemen, mainly those who use AdSense.

3. The first few sites that the user visits are only a fraction of the relevant content out there on the web. (To be fair, this is slightly compensated for by the user who tries a succession of similar search terms to find generally the same kind of content.)

4. Google has yet to crawl large portions of the internet, so there is a lot of content it has not indexed.

A lot of web content is filtered out because of new web pages spawned to meet the demands of the Google algorithm. This is the problem: the Google crawler is itself impartial, so to speak, but web site designers looking for a quick buck have no problem confusing Google to spike their page rank, push their pages into the first few search results, and generally bury the information on the web that the user is really looking for. Google really does care a hell of a lot about this, but its concerns are centered on making the algorithm foolproof, not on which particular websites actually show up in the search results. The internet is slowly but surely bowing to the demands of a search algorithm that is not under the direct control of any human. Google has changed the texture of the web in a very severe way: it is customary for AdSense to show up in a sidebar somewhere on the page, and if it is not AdSense, then it is some competitor to AdSense.

The standards for how a web page is presented, with its allocation of space between advertisements and information, have evolved because of the way Google integrates "relevant" business opportunities alongside web content.

Where is this headed?
A definite step back. Somewhere along the line, people are going to stop in their tracks and say "hey! this is not what we wanted." Open content, when it arrives, will arrive in force. Surprising as it may seem, the real search capabilities of Google are hardly ever used. It is common to head to Google and fill in the search box already knowing what results you are going to get. Instead of exchanging the URL of a web site, a common way to give someone a web address is "go to the third result after searching for so-and-so". Most people who use Google do so only when they already know what they are going to get. Recreational searching is another major pastime: the beginnings of wandering about the web aimlessly, clicking whatever interests one most at that particular moment. For that, all the bullshit needs to be filtered out. The future is in the wiki.
