Google says that its first index, back in 1998, contained about 26 million pages. Two years later that number hit the 1 billion mark, and now, in 2008, Google’s systems have counted 1 trillion unique URLs on the web. Now that’s big. Oops, I mean huge. :-) Though Google admits it doesn’t know exactly how many pages are out there in cyberspace, it can still estimate the number.
So what does this mean? Simple. The web is still taking off, and there seems to be no end in sight. On the Google blog, they state:
How do we find all those pages? We start at a set of well-connected initial pages and follow each of their links to new pages. Then we follow the links on those new pages to even more pages and so on, until we have a huge list of links. In fact, we found even more than 1 trillion individual links, but not all of them lead to unique web pages. Many pages have multiple URLs with exactly the same content or URLs that are auto-generated copies of each other. Even after removing those exact duplicates, we saw a trillion unique URLs, and the number of individual web pages out there is growing by several billion pages per day.
So how many unique pages does the web really contain? We don’t know; we don’t have time to look at them all! :-) Strictly speaking, the number of pages out there is infinite — for example, web calendars may have a “next day” link, and we could follow that link forever, each time finding a “new” page. We’re not doing that, obviously, since there would be little benefit to you. But this example shows that the size of the web really depends on your definition of what’s a useful page, and there is no exact answer.
We don’t index every one of those trillion pages — many of them are similar to each other, or represent auto-generated content similar to the calendar example that isn’t very useful to searchers. But we’re proud to have the most comprehensive index of any search engine, and our goal always has been to index all the world’s data.
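The crawling process in the quote can be sketched as a breadth-first search over links. It starts from a set of seed pages, follows their links, and skips URLs that are just different spellings of pages it has already seen. Below is a toy sketch only, not Google’s crawler. The in-memory `TOY_WEB` link graph, the `example.com` URLs, the `normalize`, `crawl` and `get_links` helpers, and the `max_pages` cap are all hypothetical. The cap stands in for the calendar problem: the made-up `/cal/N` pages each link to the “next day” forever, so without a limit the loop would never end.

```python
from collections import deque
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Collapse trivially different spellings of one URL into a single key.

    Hypothetical, minimal rules: lowercase scheme and host, treat an empty
    path as "/", and drop the #fragment. Real crawlers use many more rules.
    """
    parts = urlsplit(url)
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path or "/", parts.query, ""))


def crawl(seeds, get_links, max_pages=1000):
    """Breadth-first crawl: visit the seeds, then their links, then theirs...

    Stops after max_pages unique URLs so an endless chain of auto-generated
    pages (like a calendar's "next day" link) cannot run forever.
    """
    seen = set()
    order = []
    queue = deque(seeds)
    while queue and len(order) < max_pages:
        key = normalize(queue.popleft())
        if key in seen:
            continue  # duplicate URL: this page was already visited
        seen.add(key)
        order.append(key)
        queue.extend(get_links(key))
    return order


# Hypothetical toy web: a home page, an "about" page, and a link into a calendar.
TOY_WEB = {
    "http://example.com/": [
        "http://example.com/about",
        "HTTP://EXAMPLE.COM/#top",  # same page as the home page, spelled differently
        "http://example.com/cal/1",
    ],
    "http://example.com/about": ["http://example.com/"],
}


def get_links(url):
    # Hypothetical calendar: every /cal/N page links to /cal/N+1, forever.
    path = urlsplit(url).path
    if path.startswith("/cal/"):
        day = int(path.rsplit("/", 1)[1])
        return [f"http://example.com/cal/{day + 1}"]
    return TOY_WEB.get(url, [])


pages = crawl(["http://example.com/"], get_links, max_pages=5)
print(pages)
```

With `max_pages=5`, the crawl visits the home page and the “about” page and skips the `#top` duplicate. It then spends its remaining budget walking the calendar. That is a small-scale version of why the size of the web “depends on your definition of what’s a useful page.”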
One wonders what the next technological breakthrough after the Internet will be. Any guesses?