Google’s Secret Ranking Algorithm Exposed

Rankings Explained. Any Questions?

Last year, some people from the academic community who hadn’t yet been snatched up by Google or Bing did a really interesting study. Rather than simply researching how individual factors correlate with rankings, as SEOMoz does a great job of doing every so often, they used machine learning techniques to build their own search engine and trained it to reproduce results similar to Google’s. After the training process, they extracted the ranking factors from their trained engine, published them, and presented them at an industry conference. For the queries they trained on, they were able to correctly predict 8 of the top 10 Google results roughly 80% of the time. Not bad, considering Google’s algorithms use “over 200 variables” and the study examined only 17 of them; obviously they chose wisely. I’ve mentioned this in a previous posting, but I think a really thorough run-through of the study would be informative and interesting.

How the Study was Done

In their paper “How to Improve Your Google Ranking: Myths and Reality”, they detail what the actual weightings were for the various ranking factors. You could essentially take the values in Figure 4B on page 7 (look at the graph on the upper right, find the line marked with x’s, which is the third iteration they converged on, and read the values off the left axis) and construct a regression equation with the weights, i.e.

Rank score = .95 x PageRank + .80 x (# of keyword occurrences in hostname) + .58 x (# of keyword occurrences in meta-description tag) + ……..

If you were to pull all of the factors from the top 200 SERP results for a particular keyword, apply them to this equation to come up with a score for each result, and then sort the results by this score, you’d have a shot at reproducing the correct order for the top 10 SERPs. Doing so would of course be a significant effort, and I am unaware of anyone publishing anything duplicating their results.
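A rough sketch of that score-and-sort procedure in Python might look like the following. Only the three weights quoted in the equation above are used; the feature names, the `rank_score` and `predicted_top` helpers, and the candidate pages with their feature values are all made up for illustration and do not come from the paper.

```python
# The three weights quoted in the example equation above. The remaining
# factors from the paper are omitted because their values are not given here.
WEIGHTS = {
    "pagerank": 0.95,
    "kw_in_hostname": 0.80,
    "kw_in_meta_description": 0.58,
}


def rank_score(features):
    """Weighted sum of the known factors; a missing factor counts as 0."""
    return sum(weight * features.get(name, 0) for name, weight in WEIGHTS.items())


def predicted_top(results, n=10):
    """Sort candidate results by descending score and return the first n URLs."""
    ranked = sorted(results, key=lambda r: rank_score(r["features"]), reverse=True)
    return [r["url"] for r in ranked[:n]]


# Hypothetical candidates with invented feature values.
candidates = [
    {"url": "a.example", "features": {"pagerank": 5, "kw_in_hostname": 0, "kw_in_meta_description": 1}},
    {"url": "b.example", "features": {"pagerank": 3, "kw_in_hostname": 1, "kw_in_meta_description": 2}},
    {"url": "c.example", "features": {"pagerank": 2, "kw_in_hostname": 0, "kw_in_meta_description": 0}},
]

print(predicted_top(candidates))
```

In practice the hard part is not this arithmetic but collecting all of the factor values for a couple hundred results per keyword.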

This was a revolutionary study, because you can look at the SEOMoz calculated correlation variables all you want, but you can’t really construct a valid regression equation from them, as correlations don’t exactly add (there’s cross-correlation between them, and probably a myriad of other statistical issues with doing so).
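To see why correlations can’t simply be added up as weights, consider a deliberately extreme toy case: two factors that are exact copies of each other, with a score that depends on only one of them. The data and the `pearson` helper below are illustrative assumptions, not anything taken from the SEOMoz data or the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Toy data: factor_b duplicates factor_a, and the score equals factor_a.
factor_a = [1.0, 2.0, 3.0, 4.0, 5.0]
factor_b = list(factor_a)
score = list(factor_a)

r_a = pearson(factor_a, score)
r_b = pearson(factor_b, score)

# Using each correlation as a weight counts the same signal twice, so the
# "predicted" score comes out at about double the real one.
naive_prediction = [r_a * a + r_b * b for a, b in zip(factor_a, factor_b)]

print(r_a, r_b)
print(naive_prediction)
```

Both factors correlate almost perfectly with the score, yet a correct regression would need their weights to sum to about 1, not 2. Real ranking factors overlap far less neatly, but the same double counting applies to a lesser degree.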

Was or is the Study Valid?

There are arguments that this study is inconclusive or only partially useful, since a particular set of keywords was studied, the study was done pre-Caffeine, and the only off-page factor studied was PageRank. Yes, SEOMoz recently found that the factor most highly correlated with ranking was Facebook likes, and certainly things have changed since Caffeine, and so on. However, think about all of this from Google’s perspective. How much can they really upset the entire apple cart by changing everything? The web has changed in the last couple of years, but I would argue not a lot. Even if there have been major changes to Google’s algorithms, and certainly there are many unaccounted-for variables, I am of the opinion that things cannot have changed that much.

Either way, examining the results of this study is very instructive and an interesting thought exercise for understanding how and why SEO works the way it does.

The Ranking Factors the Study Confirmed

Below I’ve reproduced each ranking factor listed in the paper, and have eyeballed the values off of the graph for the weightings.  What’s interesting is not the exact values, but the ordering and also the very nature of the factors they analyzed:

Google Ranking Factors

Bounded vs. Unbounded, Linear vs. Logarithmic

Almost all of these variables have bounds to them. For instance, you can only put the keyword in a title so many times before you “trip a search spam filter”. The age of a domain is ultimately bounded by when the domain name system started, and so on. There is one variable that is not bounded: PageRank. It is interesting to note, however, that this one is logarithmic: each level requires, on average, 5 times as many links to reach (for more on this, see a previous article I wrote for SearchEngineLand on that topic).

So, you can get all the PageRank you want, but it’s going to get harder and harder the more you do it, relative to the other variables.  This explains why some of the cheapest things you can do (i.e. highest ROI) are to fix your title, meta-description, H1, and so on, and then get a few links to get the page’s PageRank up to a PR2 or PR3 level.
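The rising cost can be put in numbers with a quick sketch. It assumes the “5 times as many links per level” rule of thumb holds exactly at every level, which is a simplification; the `link_multiple` helper is hypothetical.

```python
LINKS_FACTOR = 5  # rule of thumb from the text: each level needs ~5x the links


def link_multiple(from_level, to_level, factor=LINKS_FACTOR):
    """How many times more links to_level needs compared with from_level."""
    return factor ** (to_level - from_level)


# Relative link requirement for each PageRank level, taking PR1 as the baseline.
for level in range(1, 8):
    print(f"PR{level}: {link_multiple(1, level)}x the links needed for PR1")
```

Under that assumption, going from PR1 to PR3 takes 25 times the links and PR5 takes 625 times, while each level adds the same amount to the score. That is why the first few levels are the cheap ones.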

Surprisingly, this study found value in having outbound links on the page with anchor text that includes the keyword.   I’ve marked this as “bounded” because again, if you have too many outbound links with targeted anchor text, you’re likely to be identified as search spam.

Incoming Anchor Text

The biggest missed opportunity in this study was not looking at keywords in inbound anchor text. I am postulating in the table that this is unbounded and linear. I think many of us have seen situations in the SERPs where a PageRank 5 page is being outranked by a PageRank 2 page, the difference being something like 800 incoming links with targeted anchor text. My belief is that the weighting of this variable is very low (on the order of .05 to .1), but linear. This would explain why anchor text is the be-all and end-all of SEO: it may have a low weighting, but more just plain helps, and your ability to get it is virtually unlimited. You also get a sort of double value, in that the anchor text is probably one factor in ranking, and the link itself slightly increases your PageRank factor.

It’s important to note that SEOMoz’s correlation research shows a fairly low correlation of ranking to incoming links with exact anchor text.   But if this corresponds to a weighting that is linear and unbounded, then even a weak correlation, when multiplied by a large enough number of links, could make a huge difference in ranking.  I am of the opinion that a lot more research into this is warranted.
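As a back-of-the-envelope illustration of that argument, the sketch below plugs the PageRank 5 versus PageRank 2 anecdote into a two-term score. Everything here is an assumption made for the thought experiment: the anchor-text weight of .05 is my guess from above and was not measured by the study, the `toy_score` helper is hypothetical, and every other factor is treated as equal between the two pages.

```python
PAGERANK_WEIGHT = 0.95      # from the example equation earlier in the article
ANCHOR_TEXT_WEIGHT = 0.05   # postulated low, linear weight; not from the study


def toy_score(pagerank, anchor_text_links, anchor_weight=ANCHOR_TEXT_WEIGHT):
    """Two-term score: PageRank plus a linear, unbounded anchor-text term."""
    return PAGERANK_WEIGHT * pagerank + anchor_weight * anchor_text_links


# Hypothetical pages matching the anecdote above.
strong_pagerank_page = toy_score(pagerank=5, anchor_text_links=0)
many_anchor_links_page = toy_score(pagerank=2, anchor_text_links=800)

print(strong_pagerank_page, many_anchor_links_page)
print(many_anchor_links_page > strong_pagerank_page)
```

With these assumed numbers, the anchor-text term of the second page dwarfs the three-level PageRank gap, which is the sense in which a small linear weight can still dominate when nothing caps the count.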

Other unbounded variables that may be useful to Google for ranking purposes include, of course, Tweets, Facebook Likes, and (when enough data accumulates, but probably not yet) Google +1’s.

Other Takeaways

Other interesting takeaways from the study – there is such a thing as over-optimization (i.e. keywords in H4 and H5 tags can actually hurt you slightly), keyword density matters (so get on the cluetrain, anti-keyword-density people!), and keyword-rich domain names are extremely important.

Conclusion

The study of course isn’t valid for specialized portions of Google’s search algorithms such as what order YouTube videos sort, or the Local Search component of Universal Search – many of these use other (of the 200+) factors.

However, the study illustrates a few things about SEO overall. It makes the most sense, from the perspective of the weightings available for each factor, to take care of your easy on-page issues first, then work on building up links (actually, it makes the most sense of all to buy an exact-match domain name first!). This explains why most people in this field typically take care of issues in that order, and it explains the natural logical flow of SEO efforts: starting with getting architecture right, then optimizing your content, and finally focusing on linking. Essentially SEO, like so many other fields, is all about identifying the work with the highest ROI-to-effort ratio and focusing on that first.

42 Comments

  1. Lumin says:

    This is an amazing write-up. I haven’t checked out the academic paper, but I look forward to it.

    I’m particularly surprised about what the study says about both domain and page age.

    I would have thought domain age was a bigger factor…and for page age to be a negative factor is surprising as well.

  2. Jim Rudnick says:

    @Ted…while I’m still working my way thru the paper, this blog piece is very well done!

    As an SEO practitioner I was also more than surprised, that this blog piece plus the paper itself has (as yet anyways) NOT been widely promoted by others…seems like coconutheadphones is a spot that not many come to…cept those of us who really “hunt” for great SEO blogs.

    Hat’s off, Ted….great catch here!

    :-)

    Jim

  3. Great article!

    Any chance we’ll get to read your posts on google+ soon?

  4. precom BLOG says:

    Looks really interesting. The weights of some ranking factors seem a bit strange, but I will check them against my own pages.

  5. SEO Analyst says:

    Thanks for sharing this.
    It’s really nice information for SEO peoples.

  6. Erik says:

    @Lumin – Page age as a negative factor can be balanced out by external link freshness.

  7. Hi and thanks for this really amazing post and paper.
    I do appreciate the time and effort you have made with the experiment but have to point out a MAJOR flaw in your experiment.

    The four keyword categories you have chosen (Linux commands, chemical elements, and music and astronomy terms) are all what we as SEOs call “non-commercial” or low-competition words. In other words, people don’t do SEO for them because they usually don’t have the potential to make big money, as financial, health, or insurance keywords do, for example.

    That being said, in this case leaving out the whole “link graph”, including anchor text distribution, probably didn’t affect the results as much as it would have in other keyword categories.

    The other wrong assumption is that Google uses “one” weighting for variables; this is the major flaw in SEOmoz’s “ranking factors”, as it is in this paper. The truth is that ranking factor weighting works differently depending on the theme: heavily linked/SEOed industries work differently than low-competition areas, where on-page factors as listed might make the major difference.

    I do however believe this is awesome material to start from and I would love to see this repeated for keywords like “credit cards”, “home insurance” or “digital camera” to name a few high potential keywords.

    Best regards
    Christoph C. Cemper
    CEMPER.COM

  8. Jason says:

    Great article Ted and I appreciate the additional insights provided by Christoph.

  9. Christoph says:

    Really good article. That the anchor text is not so important is new to me and not what I would expect. In my experience, keyword-rich domains are extremely useful, especially if they have no dashes in them.

  10. Ted Ives says:

    Correction – the “ANCH” factor (anchor text of links pointing externally) as originally published here had a typo weight of “.95”; it should read “.05”. I’ve corrected it.

  11. Frank says:

    Thanks for the great review of the paper. These factors don’t really surprise me, as all this stuff is already mentioned by Google themselves in their SEO-Guidelines (except from facebook likes and twitter) but I am happy to see this confirmed by a third party and that i have done quite a good work with my websites so far.

  12. Michael says:

    Thanks for this article.

    Greetings

  13. Very interesting article. You know, even if the findings aren’t 100 percent accurate, at least it gives you a fairly good idea of what to concentrate on in SEO.

  14. Karen Madson says:

    The timing of this article is amazing, I own a small bookkeeping firm Able Bookkeeping and have been trying for some time to optimize my website http://www.ablebk.com for 1st or 2nd position on Google for the keyword phrase bookkeeping Rogers. Through all of my reading I had come to the conclusion that I would be better off starting over with a new url http://www.bookkeepingrogers.com and putting all of my efforts into that url. I published this new url this morning and hope to have a better ROI like you described in your article. I wish I had read this article 6 months ago. (Great Article)

  15. Great Piece Ted. Working through the white paper now. Fantastic blog all around. Thank you !

  16. Thank you so much for this easy to understand, SEO best practice piece and the link to the paper of which you have adapted. Great blog

  17. Hape Etzold says:

    Thanks a lot. Just was looking onto the backlinks of a client with Google Ranking <30. Although the client has more and better backlinks he is outranked by a competitor with poor backlinking (top 10), but main keyword combination in the domain name.

  18. Always nice to read us blog posts on that kind of topic, it is a pretty reliable indicator of what will happen in germany, about 1 year after the respective article was published ;-)

  19. Ted, thank you for sharing this great information with us. I think as you have mentioned, Search engines change their algos all the time to confuse se optimizatiors, but if you follow the simple well established, white hat seo methods, you will be fine. Simple optimization, content, links!

  20. Masood says:

    Hi, Ted
    Thanks for sharing this interesting and thorough study. keep writing good thoughts.

  21. Barry Adams says:

    It appears a good study on the surface of it, but the results do raise a few eyebrows. Aside from the keyword density factor (I’m sceptical as I’m a kw-density-naysayer) it lists meta description as a rather significant ranking factor. This is, of course, entirely false. Also the great emphasis placed on PageRank seems to collide with what we know about how (TB)PageRank actually works in Google’s rankings…

  22. Ted Ives says:

    Barry, I think you’re right that meta-description is not a direct ranking factor, but I believe that it’s correlated to your CTR relative to your competition, in that a good meta-description increases your CTR. Some (myself included) believe Google is using relative CTR as a ranking factor:

    http://www.coconutheadphones.com/does-google-use-click-through-rate-as-an-organic-ranking-factor-answer-maybe/

  23. Barry Adams says:

    Ted, hmmm, yeah, you could be right on that. Meta description – when Google bothers to use the one you’ve supplied instead of something else it thinks is more relevant – is definitely a CTR improvement opportunity, and if indeed the big G uses CTR as a factor (which it does in personalised search – I’m not so sure in unpersonalised results [but then, we see less & less of those]) then yes I suppose you could see meta description as an indirect ranking factor. Still, .50 seems a heavy value to me…. but then that might be consistent with the pervasive personalisation of SERPs.

  24. Ted Ives says:

    Henry, I keep seeing comments like yours here, and I’m having a hard time understanding them – how can people state that PageRank, or keyword density (a previous commenter), or (pick your favorite factor) are not ranking factors as such?

    No one can, unless they’re either from Google and wrote the algorithm and know, or unless they run a massive correlation study and come up with some kind of evidence one way or the other. The study referenced above is very strong evidence.

    True, correlation does not imply causality so we should be wary, but what better evidence can we have than someone reverse engineering SERPs, training a machine learning program using variables pulled from the pages, checking it against test data, then pulling the variable weights out of their machine learning system? The evidence is in the numbers – the study is really strong evidence that PR matters, as are SEOMoz’s yearly correlation studies.

    Not trying to slam you in particular Henry, I just keep seeing these types of comments and I don’t understand why people have such a hard time believing numbers out of a pretty scientific study. Is it because of what they’re hearing around the industry? Caveat Emptor in that case. Google even says they still use PageRank, right on their website under “Technology Overview”.

    Reminds me of this story!
    http://www.biblegateway.com/passage/?search=Luke+16%3A19-31&version=NKJV

  25. BRAVI says:

    Very interesting perspective – particularly on the usage of anchor text. Obviously low hanging fruit of on site optimization needs to be handled first. Good stuff!

  26. Karen Madson says:

    This is a follow up to a post I made in Oct, 2011. The new site I published with the keyword in the url has in fact done very well with only 2 backlinks. I just published another site http://www.eureka–springs.com to not only promote but also check the validity of keyword in url, title, description, and h1. I’ll check back in a couple of months and let you know. Thanks again for this site it is a great resource.

  27. Nick says:

    Great post, nice little experiment.

    But Christoph is right, the keyword are non-commercial and low competitive. I think you should repeat this experiment for keywords like “buy viagra” or “payday loans”. You’ll get very different results.

  28. The title for this post completely sucked me in when I came across this!!! I thought I’d found something new…..several years late to the party.

    These ranking methods are very outdated now.

  29. Rakesh Desai says:

    Very interesting and I completely roger with your PageRank stuff. Thanks for sharing, it make me clear about my thoughts for whole ranking strategy.

  30. Darren Walsh says:

    Really great article on how Google looks at your site. There are a couple of things of interest that I will have to take note!

    Many Thanks

  31. When someone writes a piece of writing, he/she keeps the image of a user in mind and thinks about how that user can understand it.

    So that’s why this article is outstanding.

    Thanks!

  32. Roxanne says:

    It’s nearly impossible to find educated people in this
    particular topic, but you sound like you know what you’re talking about!
    Thanks


Pingbacks & Trackbacks

  1. Google’s Ranking Algorithm Exposed (Sort-of) | Joe Evans' Internet Blog in St. George, Utah - Pingback on 2011/07/20
  2. Studenten bauen Google-Algo nach - Pingback on 2011/07/24
  3. Suchalgorithmus: Studenten sind Googles Geheimnissen auf der Spur » t3n News - Pingback on 2011/07/25
  4. Google Algorithmus von Studenten sehr einfach nachgebaut » Selbständig und Geld verdienen - Pingback on 2011/07/25
  5. The Search vol 7: Top Online Marketing Articles for the Week | Three Deep Marketing - St. Paul, MN - Pingback on 2011/07/26
  6. ความลับการจัดอันดับ Google ถูกเปิดเผย!!! | Thai SEO News - Pingback on 2011/07/30
  7. Are "Meta Descriptions" still relevant? - Pingback on 2011/08/09
  8. New_Game_Releases - Google Plus Secrets Exposed in Detail How to profit from it - new_video_games - Pingback on 2011/10/28
  9. Evercontact's blog | How to rank higher on the Google Apps Marketplace - Pingback on 2014/06/25