Estimating Organic Search Opportunity: Part 1 of 2
As an SEO practicioner, you will often find yourself in the position of having to estimate traffic potential. If you’re like me, you’ve probably used the AdWords Keyword Research Tool to do this. Most people do this with “Exact Match” turned on and use the “Local Searches” number, then make assumptions about what position they might be able to obtain on average, then they assume an average click-through rate based on various studies that have been done on click-through rate versus position. For individual keywords, the click-through rate in reality will vary widely from the average, but if you’re doing estimates for hundreds or thousands of keywords this should “all come out in the wash”.
I’ve done this exercise numerous times but have always had the sense that I’m probably overestimating, because the traffic amounts predicted didn’t materialize, even when positions were achieved.
These two postings will identify the reasons for this, and we’ll construct tables that can be used to easily estimate organic opportunity based on position.
The Three Problems
1.) Search position studies vary widely, and are somewhat confusing.
2.) Some impressions will go to paid search and aren’t really available.
3.) Some impressions result in ZERO clicks (!) and aren’t really available either.
1.) Search position studies vary widely, and are somewhat confusing.
There have been a number of studies on average Click-Through-Rates in search engine results pages (SERPs). The most famous is probably the informal studies that individuals made of AOL’s search engine data that was released into the wild in 2006. AOL released anonymized (so they thought) data, as a courtesy to the academic world in order to foster more research in search. Once the data was out, some folks quickly realized that some users had searched on personal information such as their own social security number or address, and then had performed various embarrasing searches. I believe this was before AOL began contracting with Google to provide their search results, so it isn’t necessarily perfectly applicable to Google search results, but the data, for a long time, was the most detailed available.
There have been a myriad of other studies on CTR at various positions, and a cursory look shows a wide variety of results – for position 1, for instance, ranging from 42% of clicks to 16.9% of clicks. Here are the various studies followed by a graph of their results:
Neil Walker: http://www.seomad.com/SEOBlog/google-organic-click-through-rate-ctr.html
There are a number of reasons why these studies vary so widely, including exact match vs. broad match, results including universal (blended) search results versus non-blended, and click-weighted average data versus median data. I took a look at the various studies and tried to determine which ones, and even which subsets of data, are comparable, and selected four datasets to use:
1. Slingshot’s “non-blended” data (since most of the other studies don’t really address Universal Search). There is an anomaly of a small bump in position 6 which makes me suspect the dataset size isn’t large enough, but the curve overall looks correct.
2. The Enquiro data
3. Optify’s data – but instead of the widely publicized *very high* click-weighted average numbers , we’ll use the “median” data.
4. Neil Walker’s data – but since he notes the percentages add to more than 100%, I have scaled all numbers proportionately down. As a scaling factor, I selected one which makes all the non-page-1 SERP traffic add up to the average that the other three curves had, which is about 38%.
Below you can see that the datasets selected are much more tightly grouped and in agreement, and we’ve calculated the average click-through-rate by position:
Now, don’t get excited and all hung up on these average values; we are going to correct them slightly later. But they give us a great basis to start fitting a curve to the data.
Fitting a Curve
Search results, like many things, follow a power law distribution. There has been a lot of research on how and why people click on various positions; the two major models the academic community has come up with are the “positional” and the “cascade” models. The “positional” model assumes that people are looking at various positions and homing directly in on what they want to click on; the “cascade” model assumes people are scanning through each entry, with a set probability of moving on to each next entry. Both models come up with very similar results and can be approximated by a simple power law curve. Doing a power curve fit to our data will allow us to extrapolate it beyond the first page of the SERPs, and we can then use that curve later in making organic estimates for a variety of positions greater than 10.
By adding a trendline into Excel (if you’re curious, it was Y=20*X^(-0.195)), we get a trendline with a *great* fit, that has a greater than 99% R-squared value. In fact, I like this curve better than the average we established because it gives a better value for position 5 which was probably skewed by that one suspicious Slingshot data point, so we will actually use the values generated by this curve itself rather than the average values going forward.
I then made a few corrections to the curve so that it would satisfy two constraints that aren’t conveyed in the original data:
1.) It’s been well established that moving from position 11 to 10, 21 to 20, and so on (in other words, bumping from the top of one page to the bottom of the next) results in roughly twice the bump that a one-position increase at that point in the curve would imply. The AOL data showed this, as well as the more recent Brandsoftech and Chitika studies.
2.) Totally an arbitrary constraint; I wanted the traffic beyond position 100 to be no more than 5% of the available clicks. This is because in the real world, people using position data to make calculations subscribe to various services that check SERPs for them, and at most collect only data up to position 200. Let’s face it, if you are in position 101 and you’re getting some data for a particular keyword, that data is worthless – it’s too small to figure out what traffic you’d be getting in position 15 if you moved up, and you’re fooling yourself if you think you can use it for that. 100 positions is probably pushing it actually, but since works out to 95% of traffic, that seems like a reasonable place to end our table (since it’s a power law fit, it could essentially go on forever, although with ever-tinier numbers). I may be sacrificing accuracy and correctness for usefulness here; if others have any ideas on this front I welcome them.
The final curve, with corrections, is presented here in figure 3:
In part II, we will cover the other two problems in generating organic traffic estimates; impressions that aren’t available because they go to paid search, and impressions that aren’t available because they have zero clicks (these are called “abandoned” searches, and are very notable in that they are little-talked-about, particularly by Google). Then we’ll construct tables for positions 1-100 that can be used to estimate traffic potential levels, both for new keywords, and for existing keywords that you have a track record of ranking for already.