It seems to me that most people assume the following: if a given web page can be found on Google (using a site: command or otherwise), it means that all of its content is indexed by Google.
This is not the case. We’re the first to have the data that demonstrates it.
Our custom-built software allows us to check on a massive scale whether particular web pages and parts of their content show up in the search results. And it turns out that Google commonly skips indexing parts of your content, even if it’s pure HTML.
It isn’t rare that the main content of a page is indexed, but less important parts, like related items or shipping info, are not.
But in this post, I want to show you a website that has thousands of pages with unindexed MAIN CONTENT.
Target.com
23% of indexed pages from Target.com don’t have their main product description indexed. So Google is ranking nearly one in four Target.com product pages based on features other than the main content.
Keep in mind that this comes on top of the 9.28% of pages in Target’s sitemaps that are not indexed at all, not even partially.
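To see how the two figures combine, here is a back-of-the-envelope calculation. It assumes that the 23% rate applies uniformly to every indexed page in Target’s sitemaps, which the sample alone doesn’t strictly establish. The variable names are only illustrative.

```python
# Figures quoted in this post.
not_indexed_at_all = 0.0928    # share of sitemap pages not indexed at all
main_content_missing = 0.23    # share of *indexed* pages without main content indexed

# Assumption: the 23% rate holds across all indexed sitemap pages.
indexed = 1 - not_indexed_at_all
indexed_without_main = indexed * main_content_missing
affected = not_indexed_at_all + indexed_without_main

print(f"Indexed, but main content missing: {indexed_without_main:.1%}")
print(f"Not indexed, or main content missing: {affected:.1%}")
```

Under that assumption, roughly 21% of sitemap pages would be indexed without their main content, and about 30% would be affected by one problem or the other.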
Inspecting an example
We have a product page that can currently be found on Google using the site: command, but can’t be found by looking up the product description.
And here’s the workflow we use to check if Google indexed the main product description.
- Our initial step is to ensure the URL is indexed.
- Then we check if the content fragment is indexed.
- Sometimes the “site” command doesn’t work as expected, so we also look up a fragment of the product description on Google.
There is no Target.com ranking for this term in the top 100 results.
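The three checks above can be sketched in a few lines of Python. This is not our production software, just a minimal illustration. The exact query formats, the function names, and the rule for combining the checks are assumptions, and the sketch deliberately leaves out how the searches are run: you paste in the result URLs you collected yourself.

```python
from urllib.parse import urlparse


def build_queries(url: str, fragment: str) -> dict:
    """Build one search query per workflow step (query formats are assumptions)."""
    parts = urlparse(url)
    target = parts.netloc + parts.path  # site: is used here without the scheme
    return {
        "url_indexed": f"site:{target}",
        "fragment_on_url": f'site:{target} "{fragment}"',
        "fragment_anywhere": f'"{fragment}"',
    }


def domain_in_results(domain: str, result_urls: list) -> bool:
    """Return True if any result URL belongs to the domain or one of its subdomains."""
    for result in result_urls:
        host = urlparse(result).netloc.lower()
        if host == domain or host.endswith("." + domain):
            return True
    return False


def classify(domain: str, results: dict) -> str:
    """Combine the three checks into a verdict (the combination rule is an assumption)."""
    if not domain_in_results(domain, results["url_indexed"]):
        return "page not indexed"
    content_found = (
        domain_in_results(domain, results["fragment_on_url"])
        or domain_in_results(domain, results["fragment_anywhere"])
    )
    if content_found:
        return "main content indexed"
    return "page indexed, main content not indexed"


url = ("https://www.target.com/p/the-true-face-of-sir-isaac-brock-"
       "by-guy-st-denis-paperback/-/A-81014122")
fragment = "Major General Sir Isaac Brock is remembered as the Hero"
queries = build_queries(url, fragment)

# Placeholder result lists, NOT real search data: they only mirror the outcome
# described in this post (page found via site:, fragment not found).
results = {
    "url_indexed": [url],
    "fragment_on_url": [],
    "fragment_anywhere": ["https://example.com/some-other-page"],
}
print(classify("target.com", results))
```

Keeping the classification separate from the searching means the same logic works whether the result URLs come from manual lookups or from a rank-tracking tool.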
Conclusion
Evidence from this query as well as the “site” command suggests that Google didn’t index the main content of Target.com in this case.
The same is true of 23% of all product pages on Target.com.
Below you can find a couple of examples:
Page | Fragment of main content | Is the page indexed? | Is the main content indexed? |
---|---|---|---|
https://www.target.com/p/the-true-face-of-sir-isaac-brock-by-guy-st-denis-paperback/-/A-81014122 | Major General Sir Isaac Brock is remembered as the Hero | Yes | No |
https://www.target.com/p/sunex-240d-1-piece-1-2-in-drive-x-1-1-4-in-deep-impact-socket/-/A-80837218 | With over 40 years in the business, we pride ourselves on | Yes | No |
https://www.target.com/p/the-moderate-soprano-faber-drama-by-david-hare-paperback/-/A-78505815 | David Hare’s first full-length play was produced in 1970. | Yes | No |
Target is not an exception
We checked random samples of pages from hundreds of other popular websites. As you can see below, Google doesn’t index the main content on many other websites.
Website | % of indexed pages with main content not indexed | Additional notes |
---|---|---|
aboutyou.de | 37% | On mobile, product details are hidden under tabs. |
sportsdirect.com | 8% | |
charlotterusse.com | 8% | |
zappos.com | 16% | |
boohoo.com | 14% | |
zulily.com | 70% | |
lidl.de | 3% | |
walmart.com | 45% | On mobile, product details are hidden under tabs. |
hm.com | 6% | |
samsclub.com | 39% | |
Why this happens
In the case of Target.com, the main content is placed in the initial HTML, so we can forget about blaming JavaScript.
In my opinion, there are three possible reasons:
- Google’s algorithms didn’t recognize the main content.
- Google decided it’s duplicate content.
- There are unspecified bugs on Google’s end.
If the first hypothesis is true, Google has a serious problem. The main content of Target.com is not hidden on mobile, so Google shouldn’t have any issues with detecting it.
It’s true that Target.com doesn’t use semantic HTML to indicate the main content. However, according to Google, semantic HTML is helpful but not necessary, so it shouldn’t be a problem.
In the case of Target.com, I suspect the second hypothesis is true. E-commerce stores commonly use the product description provided by the manufacturer. Google might have encountered the same product description before and because of that, decided to skip indexing it this time.
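If you want a rough idea of how much of your own product description overlaps with the manufacturer’s copy, a simple word-shingle comparison is one way to estimate it. To be clear, this is a generic similarity measure (Jaccard similarity over three-word shingles), not a description of how Google detects duplicates. The function names and the sample texts are hypothetical.

```python
import re


def shingles(text: str, size: int = 3) -> set:
    """Split text into lowercase words and return the set of overlapping word n-grams."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: str, b: str, size: int = 3) -> float:
    """Jaccard similarity of two texts' shingle sets: 0.0 = no overlap, 1.0 = identical sets."""
    set_a, set_b = shingles(a, size), shingles(b, size)
    if not set_a or not set_b:
        return 0.0  # a text shorter than `size` words has nothing to compare
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical sample texts, for illustration only.
manufacturer_copy = "A deep impact socket built for daily professional use in any workshop."
store_copy_same = "A deep impact socket built for daily professional use in any workshop."
store_copy_rewritten = "Our team tested this tool for weeks and found it reliable on stubborn bolts."

print(jaccard(manufacturer_copy, store_copy_same))
print(jaccard(manufacturer_copy, store_copy_rewritten))
```

A score close to 1 means the description is essentially the manufacturer’s text. What score, if any, matters to Google is unknown, so treat the number as a prompt for a closer look rather than a verdict.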
As usual, indexing issues are most likely caused by a mix of technical factors, which is why we always audit all technical aspects of every website as a part of our technical SEO services.
Takeaway
Google uses thousands of signals to rank pages displayed in the search results. But the most basic ranking factor is relevance.
To rank high on Google, you need to convince the search engine that your page is relevant to the user’s query. And if Google doesn’t index the main part of your content, your chances of proving that your page is relevant are minimal.
The good news is that you can contact Onely to diagnose and solve any potential indexing problems your website might have.
Disclaimer
It’s important to note that after this is published, Google may fully index the content of the Target.com page that I used as an example. If you notice that happening, let me know! I have tens of thousands of other examples in the database, so we’ll update the article accordingly.
After the article was published, some SEOs stated that Google might have indexed the content, but doesn’t rank it for particular queries in the search results.
We don’t have access to Google’s internal systems, so in my opinion, it’s accurate to say that Google didn’t index the content. However, it should be clearly stated that it can be a ranking issue as well. Ranking is closely connected to indexing.
At the end of the day, the page doesn’t rank high, regardless of the cause of this issue.