The State of Indexing: DACH eCommerce Markets


quick summary

  • We analyzed over 2.5 million URLs from 144 top eCommerce domains in Germany, Austria, and Switzerland,
  • 22% of all URLs we tested weren’t indexed by Google,
  • Looking at the content quality and technical quality metrics, we were able to find significant correlations between various quality metrics and index coverage for each domain we analyzed.

The DACH region consists of Germany, Austria, and Switzerland, and has a combined population of 100 million.

Search markets in these countries are considered very competitive, both because of their highly developed economies, and thanks to the exceptional level of Technical SEO that Germans, Austrians, and the Swiss are famous for.

We selected 144 top eCommerce domains from Germany, Austria, and Switzerland, and decided to check for ourselves how optimized these websites really are, looking at a wide range of key Technical SEO parameters. To pick a sample of URLs to test on each website, we used their sitemaps.

To our knowledge, this is the first piece of research in the history of SEO that analyzes Index Coverage on such a large sample of domains.


chapter 1

Introduction and methodology


The SEO industry has always had a problem with indexing analyses. The problem is simple to name but difficult to address: we never had the data to properly benchmark and compare indexing statistics between websites.

Google does provide plenty of useful indexing data in Google Search Console, specifically in the Index Coverage report. However, the main problem is that you can only access the Index Coverage report for the domains you have verified in your GSC account. This makes it very difficult to think and talk about indexing for the whole web and not just a small number of websites.

We set out to solve this problem several years ago and built ZipTie — an indexing intelligence platform that can inspect any web page and see if it’s indexed on Google. Comparing the data we’re getting from our tool with the data we have for hundreds of websites via Google Search Console, we estimate ZipTie’s current accuracy at 99%. It’s not perfect, but it’s pretty close.

ZipTie helped us overcome the lack of indexing data that would allow for a large-scale Google indexing analysis. It can check any number of URLs from any domain in the world.

This piece of research is the first large indexing analysis we conducted using ZipTie. We’ve tested 2.5 million URLs from 144 DACH eCommerce domains to check their indexing statistics and see if any quality metrics correlate with them.

Fully crawling so many domains came with significant technical challenges. Many of them blocked external crawling completely. For others, it was very difficult to find all URLs without rendering JavaScript, which is extremely resource-consuming.

For these reasons, we decided to use sitemaps instead. We analyzed thousands of URLs from each domain’s sitemap using ZipTie.dev, which let us check their indexability, indexing status, and various additional metrics like server response time or the number of images.
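As a rough illustration of the sitemap-based sampling described above, here is a minimal Python sketch. It is not ZipTie's implementation: the function names, the seeded random sampling, and the `shop.example` URLs are all assumptions made for the example. It only handles a plain `<urlset>` sitemap, not sitemap index files.

```python
import random
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_sitemap_urls(sitemap_xml: str) -> list[str]:
    """Return every <loc> value listed in a <urlset> sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def sample_urls(urls: list[str], max_urls: int, seed: int = 0) -> list[str]:
    """Pick up to max_urls URLs at random (seeded, so the sample is repeatable)."""
    if len(urls) <= max_urls:
        return list(urls)
    return random.Random(seed).sample(urls, max_urls)


# Hypothetical sitemap used only for illustration.
EXAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://shop.example/category/bikes</loc></url>
  <url><loc>https://shop.example/product/123</loc></url>
  <url><loc>https://shop.example/product/456</loc></url>
</urlset>"""

urls = extract_sitemap_urls(EXAMPLE_SITEMAP)
sample = sample_urls(urls, max_urls=2)
print(len(urls), "URLs in sitemap,", len(sample), "selected for testing")
```

Seeding the sampler is a design choice for repeatability: rerunning the analysis later picks the same URLs, so changes in indexing status are not confused with changes in the sample.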

chapter 2

Indexability


Every page needs to meet certain technical conditions before Google can show it in search results. That’s what indexability is all about.

For the purposes of our research, we define indexable pages as:

  • Canonical,
  • Not blocked by the noindex robots meta tag,
  • Not blocked by the disallow directive in robots.txt, and
  • Responding with 200 status code.

Keep in mind that Google may occasionally index pages that are blocked from crawling (by using the information found on other pages linking to them) or pages that are temporarily redirected.
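The four conditions above can be sketched as a single check. This is an illustrative Python sketch, not the checker used in the research: the `PageSnapshot` record and `is_indexable` function are hypothetical, and treating a page without a canonical tag as self-canonical is an assumption. The robots.txt rules are evaluated with the standard library's `urllib.robotparser`, which may not match Google's own parser in every edge case.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.robotparser import RobotFileParser


@dataclass
class PageSnapshot:
    """Hypothetical record of what a crawler saw for one URL."""
    url: str
    status_code: int
    canonical_url: Optional[str]  # value of rel="canonical", None if absent
    meta_robots: str              # content of the robots meta tag, "" if absent


def is_indexable(page: PageSnapshot, robots_txt: str,
                 user_agent: str = "Googlebot") -> bool:
    """Apply the four indexability conditions to a single page."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    # Assumption: a missing canonical tag counts as self-canonical.
    is_canonical = page.canonical_url in (None, page.url)
    has_noindex = "noindex" in page.meta_robots.lower()
    crawl_allowed = parser.can_fetch(user_agent, page.url)
    return (is_canonical and not has_noindex
            and crawl_allowed and page.status_code == 200)


# Hypothetical robots.txt and pages used only for illustration.
ROBOTS_TXT = "User-agent: *\nDisallow: /cart/\n"

product = PageSnapshot("https://shop.example/product/123", 200,
                       "https://shop.example/product/123", "")
cart = PageSnapshot("https://shop.example/cart/checkout", 200, None, "")
hidden = PageSnapshot("https://shop.example/product/456", 200,
                      None, "noindex, follow")

for page in (product, cart, hidden):
    print(page.url, is_indexable(page, ROBOTS_TXT))
```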

It’s completely normal for eCommerce websites to contain lots of non-indexable pages, such as unavailable products or faceted category pages. These pages aren’t useful to users, and users shouldn’t be able to find them on Google.

However, such pages shouldn’t be placed in sitemaps. A sitemap is a valuable tool website owners can use to help Google quickly find indexable content on their websites. By keeping non-indexable pages in your sitemap, you’re sending mixed signals to Google.

For that reason, every website should aim for a sitemap in which 100% of the URLs are indexable. Our proposed metric for the health of a sitemap is the Indexability Ratio: the ratio of indexable URLs to all URLs in a sitemap. The closer your Indexability Ratio is to 1, the clearer the signal your sitemap sends to Google.
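As a minimal sketch of that definition, the ratio can be computed from one indexable/non-indexable flag per sitemap URL. The function name and the example flags below are hypothetical.

```python
def indexability_ratio(indexable_flags: list[bool]) -> float:
    """Share of sitemap URLs that are indexable (1.0 means a clean sitemap)."""
    if not indexable_flags:
        raise ValueError("the sitemap sample is empty")
    return sum(indexable_flags) / len(indexable_flags)


# Hypothetical sitemap sample: 8 indexable URLs and 2 non-indexable ones.
flags = [True] * 8 + [False] * 2
print(indexability_ratio(flags))
```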

Here’s how top DACH eCommerce domains fare in this regard:

Indexability Ratio of top DACH eCommerce domains presented on a line chart

As you can see, the stereotype holds true so far:

  • 79 out of 144 analyzed DACH eCommerce domains have perfect sitemaps, and
  • 80% of all domains have an Indexability Ratio of 0.9 or more.

At the same time, 14 of the domains we tested had an Indexability Ratio of 0.5 or less. This means their sitemaps were effectively useless as a source of information about valuable URLs for Google: you might as well flip a coin to guess whether a given page in the sitemap should be indexed or not.

But does that influence indexing?

chapter 3

Index Coverage


Having access to an unprecedented amount of indexing data, we had to come up with methods to talk about it.

We quickly realized that one key thing we’re missing is an indexing metric. It’s no wonder it doesn’t really exist in the SEO industry — there’s simply too little data to properly measure indexing.

So, we’d like to propose a candidate: Index Coverage.

Index Coverage is calculated by dividing the number of your indexed pages by the total number of indexable pages on your website.

It’s a useful general health metric for any website. That’s because if Google can technically index pages on your site but doesn’t, there are three possible explanations:

  1. Your site is technically unoptimized, preventing Google from successfully crawling and rendering all your pages,
  2. Your content doesn’t pass Google’s quality standards,
  3. You failed to make sure that all pages that shouldn’t be indexed are non-indexable.

In any of these scenarios, regularly measuring your Index Coverage can give you an understanding of how Google perceives your website’s place in the search index.
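The Index Coverage definition above comes down to a single division. Here is a small sketch, with a hypothetical function name, using the decathlon.at counts reported later in this article (7631 indexed out of 10033 indexable URLs):

```python
def index_coverage(indexed_count: int, indexable_count: int) -> float:
    """Indexed pages divided by all indexable pages."""
    if indexable_count <= 0:
        raise ValueError("there must be at least one indexable page")
    if not 0 <= indexed_count <= indexable_count:
        raise ValueError("indexed_count must be between 0 and indexable_count")
    return indexed_count / indexable_count


coverage = index_coverage(indexed_count=7631, indexable_count=10033)
print(f"{coverage:.0%}")  # prints "76%"
```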

So, how does this metric look for some of the most popular DACH eCommerce domains?

Chart: Index Coverage of top DACH eCommerce domains

The mean Index Coverage score for all domains is 78%. 

This means that roughly 1 in every 5 URLs that Google could crawl and index wasn’t found in the search results.

Pay close attention to how spread out the results are. 8% of all domains had an Index Coverage score of 50% or less. As a reminder, for a domain with an Index Coverage score of 50%, 1 in every 2 indexable URLs wasn’t indexed by Google.

At the same time, 23% of all domains had an Index Coverage score of 90% or above.

It’s a clear sign that it’s absolutely possible for a large eCommerce domain to have near-perfect Index Coverage — to have most of its indexable URLs picked up by Google. Of course, this is a perfect scenario that you should strive for. Every unindexed URL you have on your domain translates to wasted potential organic traffic.
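Summary figures like the ones above (a mean score plus the share of domains below or above a threshold) can be derived from per-domain scores as in the sketch below. The domain names and scores are made up for illustration and are not taken from the dataset.

```python
from statistics import mean


def summarize_coverage(scores: dict[str, float]) -> dict[str, float]:
    """Mean Index Coverage plus the share of domains in the two tails."""
    values = list(scores.values())
    total = len(values)
    return {
        "mean": mean(values),
        "share_at_or_below_50": sum(v <= 0.5 for v in values) / total,
        "share_at_or_above_90": sum(v >= 0.9 for v in values) / total,
    }


# Hypothetical per-domain Index Coverage scores.
example_scores = {
    "a.example": 0.95,
    "b.example": 0.80,
    "c.example": 0.45,
    "d.example": 0.60,
}
summary = summarize_coverage(example_scores)
print(summary)
```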

Index Coverage vs Indexability Ratio

Chart: Index Coverage vs Indexability Ratio of top DACH eCommerce domains

The chart above highlights a very interesting fact: even though some sitemaps contain many non-indexable URLs, Google still indexed the URLs it was able to index at a pretty high rate.

In fact, for domains with a sitemap Indexability Ratio of 0.5 or less, the mean Index Coverage was 80%. That’s more than the average for the entire sample!

There are two hypotheses we can form based on this:

  1. Even if your sitemap isn’t optimized, Google won’t stop using it to find indexable URLs on your domain.
  2. Having a perfect sitemap doesn’t guarantee Google will index the URLs it finds there.

You might also think that this means having a sitemap doesn’t change much, but I would be very careful. Keep in mind that we don’t have the full picture of how these domains are indexed by Google, because we didn’t crawl them in their entirety (technically, it was pretty much impossible). With a thorough analysis of specific domains, it might turn out that the domains with optimized sitemaps are crawled more efficiently, or that an optimized sitemap helps Google choose canonical URLs, and so on.

On the other hand, one might conclude that since Google was still mostly able to find the indexable URLs in messy sitemaps, it makes no sense to spend time optimizing sitemaps on your domain. And again, I would be skeptical here. It’s impossible to say exactly how a sitemap full of non-indexable URLs influences your website’s overall SEO.

chapter 4

Indexing versus Quality


It can be extremely useful to know the Index Coverage of competing domains so you can benchmark your performance against competitors.

But, as stated before, Index Coverage is a general website health metric. It doesn’t tell you anything about why some indexable pages on a website are not indexed.

Having this enormous dataset of over 2.5 million URLs and knowing their indexing status, we dug deeper to find out what prevents these URLs from getting indexed.

A qualitative analysis of millions of URLs isn’t possible, so instead, we looked for correlations between the indexing status of tested URLs (whether a given URL is indexed or not) and hundreds of other metrics. We used multiple metrics ranging from word count to H2 count to Largest Contentful Paint, trying to understand where Google’s quality thresholds lie for each domain.
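To make the approach concrete, here is a dependency-free sketch of a Spearman rank correlation between a binary indexing status and one page metric. It illustrates the general technique only and is not the pipeline used for this research. The sample values are invented. With this many tied values (the indexing status is always 0 or 1), ties are handled by assigning average ranks.

```python
from statistics import mean


def average_ranks(values: list[float]) -> list[float]:
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of 1-based positions
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = mean(xs), mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return covariance / (var_x * var_y) ** 0.5


def spearman(xs: list[float], ys: list[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(xs) != len(ys):
        raise ValueError("both sequences must have the same length")
    return pearson(average_ranks(xs), average_ranks(ys))


# Invented sample: 1 = indexed, 0 = not indexed, paired with each page's H2 count.
indexed = [0, 0, 0, 1, 1, 1]
h2_count = [0, 0, 1, 3, 5, 4]
print(round(spearman(indexed, h2_count), 2))
```

A rank-based coefficient is a reasonable fit here because metrics like word count or image count are rarely normally distributed, and only the ordering of pages matters.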

Here’s what we found:

  1. There isn’t a single metric that strongly correlates with indexing for all domains.
  2. Looked at separately, most domains show significant correlations between indexing status and various other metrics.

Let’s look at some examples!

Decathlon.at

Decathlon is a leading sporting goods retailer with a wide variety of product categories. Based on our research, decathlon.at has an Index Coverage score of 76%: out of 10,033 indexable URLs we found in their sitemaps, 7,631 were indexed.

One of the challenges for eCommerce sites is providing both Google and regular customers with meaningful content on category pages. They primarily serve a navigational function — they are essentially collections of links. For Google, this is often not enough to get you indexed.

Here’s an example of a category page on decathlon.at that was unindexed as of May 19th, 2022:

https://www.decathlon.at/6121-setzkescher

As you can see, it’s a regular category page. But Google didn’t think it was good enough.

Now, here’s an example category page from the same domain, but this one is indexed:

https://www.decathlon.at/6852-mountainbikes

An example indexed category page on decathlon.at

Can you spot the difference?

The latter page (a category page listing mountain bikes) has plenty of additional content below product listings. For Google, this is valuable additional context that algorithms can use to understand the quality and meaning of this page.

To put it in numbers, the indexed category page has 5 h2 headers, while the unindexed one has 0. This is one quality aspect that ZipTie analyzes together with checking the indexing status of tested pages.

And as you can see by browsing our full dataset, the Spearman correlation between indexing and h2 count for decathlon.at is 0.62, which is pretty significant! It’s very likely that adding more content to unindexed category pages on this domain would promptly get them indexed by Google.

Philips.at

In the case of philips.at, we’ll be looking at product pages and not category pages.

With an Index Coverage score of 86%, it’s one of the technically better domains within our sample, but there’s still room for improvement.

Take a look at these two product pages:

https://www.philips.at/c-p/EP5335_10/series-5000-kaffeevollautomat-mit-lattego-milchsystem

https://www.philips.at/c-p/191E2SB_10/lcd-monitor-mit-touch-technologie

An example indexed page on philips.at

An example unindexed page on philips.at

Can you guess which one is indexed by now?

ZipTie.dev tells us that it found as many as 69 images on the coffee machine product page. Meanwhile, the LCD monitor product page only has 25.

Looking at a larger sample of philips.at pages, we found that the correlation between indexing and image count is 0.69. For this particular domain, the more images, the better!

Disclaimer

Of course, we don’t have direct access to Google’s algorithms. We cannot say for sure that fixing these content discrepancies would solve the indexing issues of either philips.at or decathlon.at. But Google never tells us exactly why a certain page isn’t indexed even though we think it should be. We think this type of analysis is the best way forward.

Want to know more?

If you want to see the exact indexing statistics for all domains we tested, we’ve got you covered.

Furthermore, for every domain in our sample, we calculated correlations between the indexing status of URLs on that domain and various performance and content quality metrics. 

Although every domain is different and we didn’t find a silver bullet metric that you could improve to automatically grow your Index Coverage, we found interesting correlations for most tested domains.

Get the full dataset!
