According to Tomek: Partial Indexing

16 Dec 2020

It seems to me that most people assume the following: if a given web page can be found on Google (using a site: command or otherwise), it means that all of its content is indexed by Google. 

This is not the case. We’re the first to have the data that demonstrates it.

Our custom-built software allows us to check on a massive scale whether particular web pages and parts of their content show up in the search results. And it turns out that Google commonly skips indexing parts of your content, even if it’s pure HTML. 

It’s common for the main content of a page to be indexed while less important parts, like related items or shipping info, are not.

But in this post, I want to show you a website that has thousands of pages with unindexed MAIN CONTENT.

Target.com 

23% of indexed pages from Target.com don’t have their main product description indexed. So Google is ranking nearly one in four Target.com product pages based on features other than the main content.

Note that this figure is separate from the 9.28% of pages in Target’s sitemaps that are not indexed at all, not even partially.
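To see how the two figures combine, here is a quick back-of-the-envelope calculation. It is only a sketch: it assumes that both percentages describe the same set of sitemap URLs, which the data above doesn't confirm, and the variable names are mine.

```python
# Shares reported in this article.
NOT_INDEXED_AT_ALL = 0.0928    # sitemap pages missing from the index entirely
MAIN_CONTENT_MISSING = 0.23    # indexed pages whose main description is not indexed

# Assumption: both shares were measured on the same set of sitemap URLs.
indexed = 1 - NOT_INDEXED_AT_ALL
partially_indexed = indexed * MAIN_CONTENT_MISSING
description_not_indexed = NOT_INDEXED_AT_ALL + partially_indexed

print(f"Indexed at all:                      {indexed:.2%}")
print(f"Indexed, but without main content:   {partially_indexed:.2%}")
print(f"Main description not indexed, total: {description_not_indexed:.2%}")
```

Under that assumption, roughly 21% of sitemap pages are indexed without their main description, and for roughly 30% of sitemap pages the description isn't in the index at all.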

Inspecting an example

Here’s a product page that can currently be found on Google using the site: command, but can’t be found by looking up the product description.

https://www.target.com/p/the-true-face-of-sir-isaac-brock-by-guy-st-denis-paperback/-/A-81014122

[Screenshot: a product description that doesn’t show up on Google]

And here’s the workflow we use to check if Google indexed the main product description. 

  1. First, we make sure the URL is indexed.
    [Screenshot: the URL is indexed by Google and shows up in the search results]
  2. Then we check whether the content fragment is indexed.
    [Screenshot: the specified fragment of the product description doesn’t show up on Google]
  3. Sometimes the site: command doesn’t work as expected, so we also look up a fragment of the product description in a regular Google search.

Target.com doesn’t rank for this query in the top 100 results.

[Screenshot: Target.com doesn’t show up in the first 100 results when the query is the product description taken from its product page]
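If you want to repeat these three checks by hand, the sketch below builds the search URLs for you. It is not our internal software. The exact query forms (site: followed by the full URL, and the fragment in quotes) are my assumptions about a reasonable way to phrase each check, and the function names are made up for this example. It only prints links to open in a browser and doesn't query Google itself.

```python
from urllib.parse import quote_plus


def build_checks(page_url: str, fragment: str) -> dict[str, str]:
    """Return one Google query per step of the manual workflow."""
    quoted = f'"{fragment}"'  # exact-match search for the fragment
    return {
        "url_indexed": f"site:{page_url}",                # step 1
        "fragment_on_page": f"site:{page_url} {quoted}",  # step 2
        "fragment_anywhere": quoted,                      # step 3
    }


def search_url(query: str) -> str:
    """Turn a query string into a Google search URL."""
    return "https://www.google.com/search?q=" + quote_plus(query)


# URL and description fragment taken from the example in this article.
page = (
    "https://www.target.com/p/"
    "the-true-face-of-sir-isaac-brock-by-guy-st-denis-paperback/-/A-81014122"
)
fragment = "Major General Sir Isaac Brock is remembered as the Hero"

for name, query in build_checks(page, fragment).items():
    print(f"{name}: {search_url(query)}")
```

A page that passes step 1 but returns nothing for steps 2 and 3 is a candidate for partial indexing.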

Conclusion

Both this query and the site: command suggest that Google didn’t index the main content of this Target.com page.

The same is true of 23% of all product pages on Target.com. 

Below are a few examples:

| Page | Fragment of main content | Is the page indexed? | Is the main content indexed? |
| --- | --- | --- | --- |
| https://www.target.com/p/the-true-face-of-sir-isaac-brock-by-guy-st-denis-paperback/-/A-81014122 | Major General Sir Isaac Brock is remembered as the Hero | Yes | No |
| https://www.target.com/p/sunex-240d-1-piece-1-2-in-drive-x-1-1-4-in-deep-impact-socket/-/A-80837218 | With over 40 years in the business, we pride ourselves on | Yes | No |
| https://www.target.com/p/the-moderate-soprano-faber-drama-by-david-hare-paperback/-/A-78505815 | David Hare’s first full-length play was produced in 1970. | Yes | No |

Target is not an exception

We checked random samples of pages from hundreds of other popular websites. As you can see below, Google doesn’t index the main content on many other websites.

| Website | % of indexed pages with main content not indexed | Additional notes |
| --- | --- | --- |
| aboutyou.de | 37% | On mobile, product details are hidden under tabs. |
| sportsdirect.com | 8% | |
| charlotterusse.com | 8% | |
| zappos.com | 16% | |
| boohoo.com | 14% | |
| zulily.com | 70% | |
| lidl.de | 3% | |
| walmart.com | 45% | On mobile, product details are hidden under tabs. |
| hm.com | 6% | |
| samsclub.com | 39% | |
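For clarity, here is how a percentage like the ones above can be derived from per-page check results. This is a minimal sketch, not our production code: the `PageCheck` record and the function name are hypothetical, and the sample data is invented purely to show the arithmetic. Pages that aren't indexed at all are excluded from the denominator, because the metric only covers indexed pages.

```python
from dataclasses import dataclass


@dataclass
class PageCheck:
    url: str
    page_indexed: bool
    main_content_indexed: bool


def partial_indexing_rate(checks: list[PageCheck]) -> float:
    """Share of indexed pages whose main content is not indexed."""
    indexed = [c for c in checks if c.page_indexed]
    if not indexed:
        raise ValueError("no indexed pages in the sample")
    missing = sum(1 for c in indexed if not c.main_content_indexed)
    return missing / len(indexed)


# Invented toy sample, not real measurements.
sample = [
    PageCheck("https://example.com/p/1", True, True),
    PageCheck("https://example.com/p/2", True, False),
    PageCheck("https://example.com/p/3", True, True),
    PageCheck("https://example.com/p/4", True, True),
    PageCheck("https://example.com/p/5", False, False),  # not indexed: excluded
]

rate = partial_indexing_rate(sample)
print(f"{rate:.0%} of indexed pages have no main content indexed")
```

In this toy sample, one of the four indexed pages is missing its main content, so the rate is 25%.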

Why this happens

In the case of Target.com, the main content is placed in the initial HTML, so we can forget about blaming JavaScript.

In my opinion, there are three possible reasons:

  1. Google’s algorithms didn’t recognize the main content.
  2. Google decided it’s duplicate content. 
  3. There are unspecified bugs on Google’s end. 

If the first hypothesis is true, Google has a serious problem. The main content of Target.com is not hidden on mobile so Google shouldn’t have any issues with detecting it.

It’s true that Target.com doesn’t use semantic HTML to indicate the main content. But according to Google, semantic HTML is helpful rather than required, so this shouldn’t be a problem.

In the case of Target.com, I suspect the second hypothesis is true. E-commerce stores commonly use the product description provided by the manufacturer. Google might have encountered the same product description before and, because of that, decided to skip indexing it this time.
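If you want a rough sense of how much of your own product description is reused from the manufacturer, you can compare the two texts yourself. The sketch below uses word shingles and Jaccard similarity, a common textbook technique for near-duplicate detection. To be clear, this is my choice for illustration and not a description of how Google detects duplicates. The function names and both sample texts are made up.

```python
import re


def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping word n-grams in the text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: str, b: str, size: int = 3) -> float:
    """Jaccard similarity of two texts' shingle sets (0.0 to 1.0)."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 0.0  # both texts too short to compare
    return len(sa & sb) / len(sa | sb)


# Hypothetical texts, invented for this example.
manufacturer_text = "A sturdy deep impact socket built for heavy duty use"
store_text = "A sturdy deep impact socket built for daily workshop use"

print(f"Similarity: {jaccard(manufacturer_text, store_text):.2f}")
```

A score close to 1.0 means the store description is nearly identical to the manufacturer's text, which would be consistent with the duplicate content hypothesis but doesn't prove it.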

Takeaway

Google uses thousands of signals to rank pages displayed in the search results. But the most basic ranking factor is relevance.

To rank high on Google, you need to convince the search engine that your page is relevant to the user’s query. And if Google doesn’t index the main part of your content, your chances of proving that your page is relevant are minimal.

The good news is that we are here to help you diagnose and solve any potential indexing problems your website might have.

Disclaimer

It’s important to note that after this is published, Google may fully index the content of the Target.com page that I used as an example. If you notice that happening, let me know! I have tens of thousands of other examples in the database, so we’ll update the article accordingly.

After the article was published, some SEOs stated that Google might have indexed the content, but doesn’t rank it for particular queries in the search results.

We don’t have access to Google’s internal systems, so in my opinion, it’s accurate to say that Google didn’t index the content. However, I should make clear that it could be a ranking issue as well. Ranking is closely tied to indexing.

At the end of the day, the page doesn’t rank high, regardless of the cause of this issue. 
