
Peter Norvig deems PageRank "over-hyped"

Google's Director of Research Peter Norvig says that the search engine's hallowed PageRank algorithm, the link-analysis technique that assigns a numerical weight to each page in a hyperlinked set of documents such as the World Wide Web in order to "measure" its relative importance within the set, is overrated. And apparently, it always has been.
"One thing that I think is still over-hyped is PageRank,"
Norvig said last week during a question and answer keynote at the search-obsessed SMX West conference in Santa Clara, California.
"People think we just do this computation on the web graph and order all the pages and that's it. That computation is important, but it's just one thing that we do.

People [webmasters and SEOs] always said, 'We're stuck if we don't have [a high PageRank].' But we never felt that way. We never felt that it was such a big factor."

Google describes PageRank as follows:
“PageRank relies on the uniquely democratic nature of the web by using its vast link structure as an indicator of an individual page's value. In essence, Google interprets a link from page A to page B as a vote, by page A, for page B. But, Google looks at more than the sheer volume of votes, or links a page receives; it also analyzes the page that casts the vote. Votes cast by pages that are themselves "important" weigh more heavily and help to make other pages "important".”
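To make that "voting" idea concrete, here is a minimal sketch of the classic PageRank power iteration in Python. The toy graph, the 0.85 damping factor, and the convergence threshold are illustrative assumptions, not Google's production settings.

def pagerank(links, damping=0.85, tol=1e-6, max_iter=100):
    # links: dict mapping each page to the list of pages it links to.
    # Returns a dict of page -> rank; the ranks sum to 1.
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over every page.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        if sum(abs(new_rank[p] - rank[p]) for p in pages) < tol:
            return new_rank
        rank = new_rank
    return rank

# A link from page A to page B counts as a "vote" by A for B; votes cast by
# highly ranked pages carry more weight in the next iteration.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
print(pagerank(graph))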


Late last year, Google added so-called "real-time search" to its engine - serving up links to fresh Tweets, news, and other recent web posts. When asked whether this was a far more difficult undertaking, considering that PageRank doesn't work well with Web2.0rhea, Norvig was quick to say "no," explaining that even with core search, PageRank isn't as important as people think it is.
"[PageRank] has a catchy name and the name recognition. But we've always looked at all the things that are available [when ranking search results]. We look at where do things come from, what are the words used, how do they interact with each other, how do people interact with them,"
"[Real-time search] is more similar [to core search] than dissimilar, in that you're grabbing every available signal and trying to figure out the best way to combine them. The fact that there aren't legacy links from a long time ago - we don't think of that as that much different."

The key to real-time search, Norvig said, is Google's famously distributed back-end infrastructure, which is able to rebuild its web index with relatively little delay. When Norvig first joined the company, the Google web index was built once a month. Then the company moved to once a day, and then to once an hour. Now, its distributed infrastructure - using proprietary technologies like the Google File System and MapReduce - can update its index in "10 seconds," according to Norvig.
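The Google File System and MapReduce are proprietary, but the map/reduce pattern behind that kind of index rebuild is easy to sketch. Below is an illustrative, single-machine imitation of a map/reduce inverted-index build in Python; the document IDs and text are made up, and no real distributed machinery is involved.

from collections import defaultdict

# Hypothetical crawled pages: doc id -> text.
docs = {
    "doc1": "pagerank measures link importance",
    "doc2": "real time search indexes fresh links",
}

# Map phase: emit a (term, doc_id) pair for every word on every page.
def map_phase(docs):
    for doc_id, text in docs.items():
        for term in text.split():
            yield term, doc_id

# Shuffle and reduce: group the pairs by term into an inverted index,
# mapping each term to the sorted list of documents that contain it.
def reduce_phase(pairs):
    index = defaultdict(set)
    for term, doc_id in pairs:
        index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

index = reduce_phase(map_phase(docs))
print(index["links"])  # ['doc2']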

When the hourly index was rolled out, Norvig remembered, Larry Page insisted on calling it the "3600 second" index.
"If it was hourly, it was just going to stay like that," Norvig said. "But if you talk about it in seconds, people are going to push it down to 1000 seconds and eventually you get it down to 10. And that's where we are now. His vision has come true."