By Branko Rihtman, Senior SEO Analyst
With the death of Yahoo Site Explorer, online marketers around the world have started looking for replacements for this crucial service. There are several link data providers in the market and the number seems to be climbing constantly. Yet, there has been little or no independent research to assess the quality and the comprehensiveness of the data that these vendors offer.
When looking at backlink data, different parameters are important for different purposes. Some people do not want to look at all the data because of the sheer size of the reports that some vendors offer. It is important to remember, however, that anyone who decides to look at partial data is automatically subscribing to the decision-making process that chooses which links to show and which to leave out of the report. This process has to be based on a set of rules deciding which links are important and which are not. These rules are usually arbitrary and rest on a fair number of assumptions about how search engines decide which links should get more weight and which should be discounted. While these assumptions may help you wade through large amounts of data, they may have absolutely nothing to do with how search engines look at link data, nor do they necessarily help you get what you need from the backlink reports. Therefore, here at RankAbove, we always prefer looking at as much data as possible and performing our own analysis on the most comprehensive source of data.
In order to establish which of the current large backlink data providers allows us to analyse the largest sample of backlink data, we chose three different URLs from sites we work on and analysed them through the most popular backlink data providers on the market today: OSE (by SEOmoz), MajesticSEO (both Fresh and Historic indices), Ahrefs and Link Research Tools. First, here is a little bit of background on each of the vendors we analysed:
OSE (SEOmoz): there is no one in the world of SEO who is not familiar with both the SEOmoz blog and their set of tools. Their backlink data analyser, Open Site Explorer, is powered by their Linkscape web index. The index is built by crawling the web, starting from a seed list of trusted sites and branching out from there. In addition to the backlink index, it offers some proprietary analysis of its own, assigning quality, trust and authority scores to links. According to SEOmoz, its index includes 9.2 trillion links and 0.36 trillion URLs. With the advanced reporting feature, you can download up to 100,000 links per report.
MajesticSEO: Majestic was the first link data vendor to come to the SEO market. The need for a larger dataset existed even when YSE was still around: Yahoo allowed only 1,000 links per query, and even though one could increase this number through advanced queries, there was still a large gap in the information available to webmasters. This gap was filled by MajesticSEO, which provided access to all the data in its backlink index. As a matter of fact, Majestic has two indices: the Historic Index, which contains all of their data from June 2006 till November 2011 and gets updated every 30 days, and the Fresh Index, which includes all the links found in the last 30 days and gets committed to the Historic Index every 30 days. According to the MajesticSEO site, the Historic Index includes 357.6 trillion pages crawled and 3.6 quadrillion links, while the Fresh Index includes 20.2 trillion pages and 108.3 trillion links. Their reports allow you to download all the links, although the number of links and reports you are able to see depends on the package you purchase.
Ahrefs: this is a relatively new member of the link data crowd. They claim to have their own bots that crawl the web and gather backlink data. According to their site, their index has 5.9 billion unique URLs and 13 billion unique backlinks. However, there is a limit to the number of links you can export, and it depends on your package: the basic package allows 2,500 backlinks per query, while the most expensive package allows 20,000 backlinks per query.
(Update: According to the Ahrefs CEO, commenting in the SEO Book Forum (subscription only), they do have the capability to provide access to more raw data, with a free account being able to download up to 10K links and the most expensive account (Enterprise – $499 per month) up to 200M links. Unfortunately, access to this data was not apparent at the time we did the research, so the Ahrefs data is based on the originally stated sample sizes.)
Link Research Tools: this service does not crawl the web itself; it is more of a “meta link provider”. It compiles data from several other available sources (among them SEOmoz, MajesticSEO, SEMRush, etc.), de-duplicates it and checks whether the links are live. As we will see later in this post, link decay is a big problem that takes a lot of resources to deal with, so having an external service do this for you can be a big plus. However, they allow only up to 2,000 links per report in their Quality Backlink Report, with the possibility of using something called Link Boost within a different tool called Backlink Profiler to get up to 12,000.
With all the central players presented, we turned to the issues we felt were important for assessing which of the providers has the best data.
We checked several things:
1. The number of provided links for a requested URL
2. The percentage of decayed links – how many links were dead at the time of the report creation
3. The percentage of unique links – how many of the links in a specific provider’s report do not appear in the report of each of the other providers.
These three parameters were the most crucial for us when choosing an external backlink provider. The first two represent the intersection of the need to look at the most comprehensive index with the wish not to waste resources on analysing links that are no longer live. Additionally, we did not want to judge each provider based solely on these two needs, so we checked for the possibility that a certain provider, while providing a smaller sample of links, does a good job of showing a unique sample of links that is not available through other providers.
Number of links
As mentioned earlier, we looked at three different URLs from sites in three completely different industries. One of the URLs is a homepage and the other two are internal pages.
Table A reports the numbers of links for each of the URLs:
As can be seen, there are huge differences in the number of links reported for each URL. However, this data is useless without knowing how many of the reported links are actually live, as opposed to links that were removed or linking sites that no longer exist. For that purpose, we developed specialized tools that check the server response of the linking page, whether the link from the provided list exists on that page and what its anchor text is. This way we were able to establish the link decay for each of the URLs in every provider's report.
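To give an idea of what such a check involves, here is a minimal Python sketch. It is not the tool we used; the names `AnchorCollector` and `check_link` are hypothetical, and it assumes the linking page has already been fetched, so the HTTP status code and the HTML are passed in as arguments. It also assumes that a link counts as live only when the page returns 200 and contains an `<a>` element pointing at the target URL (ignoring a trailing slash).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorCollector(HTMLParser):
    """Collect (absolute href, anchor text) pairs from an HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []     # finished (href, anchor text) pairs
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the linking page's URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def check_link(status_code, html, page_url, target_url):
    """Classify one reported backlink as live or decayed (hypothetical helper)."""
    if status_code != 200:
        return {"live": False, "reason": f"HTTP {status_code}", "anchor": None}
    parser = AnchorCollector(page_url)
    parser.feed(html)
    for href, anchor in parser.links:
        # Assumption: URLs differing only by a trailing slash are the same link.
        if href.rstrip("/") == target_url.rstrip("/"):
            return {"live": True, "reason": "link found", "anchor": anchor}
    return {"live": False, "reason": "link missing", "anchor": None}
```

Running `check_link` over every row of a provider's report and counting the rows where `live` is false gives the decay percentage for that report. A production version would also need to handle redirects, timeouts and links injected by JavaScript, which this sketch ignores.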
Listed in Table B are the percentages of decayed links in each of the reports we created:
As can be seen, the Majestic Historic index has enormous percentages of link decay. This is rather strange, as one would expect that, with the ability to remove deleted links from the report, fewer missing links would be reported. Another thing that jumps out is that the percentage of link decay can vary quite a bit from site to site (and, by extension, from industry to industry). Notice the difference between the decay percentage of URL3 and URL1/2 in OSE. A further thing worth mentioning is that Link Research Tools reports the missing links as ‘missing’, unlike other tools that report them as ‘live’.
Table C shows what the backlink report looks like after accounting for the link decay.
Yellow fields mark figures that were calculated through sampling. Even with link decay factored in, the Majestic Historic index outperforms all the other indices by far. Ahrefs comes close (though not in all cases; see URL1, for example) and it would be interesting to check how much those two indices overlap.
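For readers who want to reproduce this kind of adjustment, the arithmetic can be sketched in a few lines of Python. This assumes a simple proportional extrapolation (the decay rate measured on a random sample is applied to the whole report); the function name `estimate_live_links` and the numbers in the usage example are made up for illustration and are not taken from our tables.

```python
def estimate_live_links(reported_total, sample_checked, sample_dead):
    """Extrapolate the number of live links in a report from a checked sample.

    Assumption: the sample is random, so its decay rate is representative
    of the full report.
    """
    if sample_checked <= 0:
        raise ValueError("sample_checked must be positive")
    if not 0 <= sample_dead <= sample_checked:
        raise ValueError("sample_dead must be between 0 and sample_checked")
    decay_rate = sample_dead / sample_checked
    live_estimate = round(reported_total * (1 - decay_rate))
    return decay_rate, live_estimate


# Hypothetical figures: 100,000 reported links, 2,000 checked, 500 found dead.
decay, live = estimate_live_links(100_000, 2_000, 500)
print(f"decay: {decay:.1%}, estimated live links: {live}")
```

When the whole report is checked rather than a sample, `sample_checked` equals `reported_total` and the estimate becomes an exact count.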
The next thing we wanted to analyse is what percentage of unique links each report includes. We checked only URL1 and URL3, since URL2 had a huge number of links and many of the reports represented only a sample of them.
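A pairwise uniqueness comparison of this kind can be sketched with Python sets. This is an illustration under stated assumptions rather than our actual tooling: `normalize` and `unique_percentages` are hypothetical names, each report is assumed to be a plain list of linking-page URLs, and two URLs are treated as the same link if they match after trimming whitespace and a trailing slash.

```python
def normalize(url):
    """Crude normalization (assumption): trim whitespace and a trailing slash."""
    return url.strip().rstrip("/")


def unique_percentages(reports):
    """For each provider, the % of its links that are missing from each other provider.

    `reports` maps a provider name to an iterable of linking-page URLs.
    Returns {provider: {other_provider: percentage}}.
    """
    normalized = {name: {normalize(u) for u in urls} for name, urls in reports.items()}
    result = {}
    for name, links in normalized.items():
        result[name] = {}
        for other, other_links in normalized.items():
            if other == name:
                continue
            if not links:
                pct = 0.0  # an empty report has no unique links
            else:
                pct = 100.0 * len(links - other_links) / len(links)
            result[name][other] = pct
    return result


# Hypothetical reports for two providers.
reports = {
    "ProviderA": ["http://a.example/1", "http://a.example/2",
                  "http://a.example/3", "http://a.example/4"],
    "ProviderB": ["http://a.example/1/", "http://b.example/9"],
}
print(unique_percentages(reports))
```

Averaging each provider's row gives a single uniqueness figure per provider, at the cost of hiding cases where two indices from the same vendor overlap heavily.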
Table D shows the figures on the number of unique links each backlink report included:
And here is the graphical representation of the above data:
Table E shows the same data for URL3:
And here is the graphical representation of the above data:
Notice that in both cases, the average figures for the Majestic Fresh Index are skewed towards the lower end due to its high overlap with the Majestic Historic Index (only and 6.29% unique links for URL1 and URL3 respectively). If we discount the overlap with the Historic Index, the Majestic Fresh Index has a pretty high percentage of unique links.
So the question we pose is how do unique percentages translate into numbers? In Table E below are the numbers of unique, live links each backlink data provider gives for the two URLs we tested:
Here at RankAbove we were looking for a backlink data supplier that would provide us with the most comprehensive backlink information we could find. Our proprietary Drive algorithm makes a lot of decisions and SEO suggestions based on backlink data, and we wanted to base those decisions on the most complete and all-encompassing backlink data available in the market. For that purpose, it is clear that the MajesticSEO Historic index is without a doubt the best solution for our needs.
Ahrefs was a worthy contender, but at the moment their pricing structure and the amount of raw links they allow users to download make them an unfeasible solution for our needs. However, different users may require different things from their backlink data. The number of API calls or the amount of data processing (if you are automating your backlink analysis) may be an important factor, in which case the size of the index could count against a provider in your decision. It is important to remember that the extremely high percentages of link decay in the Majestic Historic Index make it necessary to perform independent crawling to establish which of the links in the index are live. This process can be very resource demanding, and this should be an important factor in choosing a backlink data provider. There are other parameters that we didn’t check, such as the comprehensiveness of the linking domain and/or anchor profile reports, the freshness of the different indices or even the number of API functions available for slicing and dicing the data on the provider side.
Furthermore, with the disappearance of Yahoo Site Explorer, there are constantly new backlink data providers appearing in the market. We hope that both the results and the methodology presented in this post will help to accurately assess new providers that may arise in the future.
About the Author:
Branko Rihtman started out in the world of SEO in 2001. Since then he has helped numerous companies increase revenue in some of the most competitive online niches. Over that time, he realized that the SEO competitive advantage is to be found in proper testing and analysis and has started applying his scientific training to plan and execute extensive SEO experiments.
Branko completed his M.Sc. in Microbial Ecology at the Hebrew University, Jerusalem. He was a featured speaker at a number of leading SEO conferences in the US and in Europe, such as SMX Advanced, Sphinncon, MIT Forum, Affilicon, etc. Branko joined RankAbove in October 2011.