
According to Tomek: Diagnosing Indexing Issues Using GSC

13 Nov 2020

Quick summary

If some of your pages aren’t indexed, you may be losing revenue, because potential clients won’t find your products on Google.

But how do you check whether Google has successfully indexed all of your valuable pages?

One of the easiest ways is to use Google Search Console. This article will tell you how.

Making sure that Google has indexed all of your content is a complex process, particularly if you have a large website. It’s best to start with the bigger picture and work your way down from there.

Get started with Index coverage

First, get a bird’s-eye view of your indexing issues. 

Using the Google Search Console: 

  • You can see what percentage of the URLs on your website are indexed.
  • You can discover the leading cause of your indexing issues, for instance, whether they are related to low-quality or duplicate content.

After logging in to your Google Search Console, navigate to the Coverage report in the Index tab.

The Coverage report in Google Search Console gives you an overview of your website's indexing status

Here, you will see a list of URLs divided into four categories:

The four categories of URLs in the Google Search Console
  1. Error: these pages aren’t indexed due to errors such as a 5xx HTTP status code, Googlebot being blocked from crawling them by the robots.txt file, or the use of the noindex tag.
  2. Valid with warning: these pages are indexed, but there are issues you should investigate.
  3. Valid: these pages are successfully indexed. 
  4. Excluded: these pages aren’t indexed.
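To turn the four category counts into the percentage of indexed URLs, a minimal sketch in Python follows. The counts are made-up placeholders rather than real data, `indexing_rate` is a hypothetical helper name, and the sketch assumes that both "Valid" and "Valid with warning" count as indexed, as described in the list above.

```python
def indexing_rate(error: int, valid_with_warning: int, valid: int, excluded: int) -> float:
    """Return the share of known URLs that are indexed, as a percentage."""
    # Assumption: "Valid" and "Valid with warning" URLs are both indexed.
    indexed = valid + valid_with_warning
    total = error + valid_with_warning + valid + excluded
    if total == 0:
        return 0.0
    return 100.0 * indexed / total


# Hypothetical counts read off the Coverage report for one property.
counts = {"error": 120, "valid_with_warning": 30, "valid": 6350, "excluded": 3500}
print(f"{indexing_rate(**counts):.1f}% of known URLs are indexed")
# prints: 63.8% of known URLs are indexed
```

Running the same calculation on a regular basis makes it easier to spot a sudden drop in the indexing rate.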

You can easily toggle between the categories.

Keep in mind that each report is limited to 1000 URLs. Later in the article, I will show you how to work around this limit.

Narrow down the results

You can quickly narrow down the report to a particular sitemap. 

For instance, if you have an eCommerce store and a sitemap divided into products and product listings, you can check the indexing rate of products and product listings separately in Google Search Console.

Coverage reports for separate sitemaps in Google Search Console

Why some URLs are excluded from Google’s index

If some of your URLs are excluded from Google’s index, you should review the Excluded reports, which list these URLs together with the reason for the exclusion.

Let’s look at why a page may not be indexed in the first place:

    1. Google didn’t discover it (Google may still discover it in the future).
    2. The page is in the crawling queue. Crawling queues can be extremely long, and low-priority URLs can ultimately drop out of the queue.
    3. Google decided that the page is of low quality. 
    4. Google treats it as duplicate content.
    5. There are technical issues (for instance, crawl anomalies, or the page is blocked by robots.txt, or Google has an indexing bug). 
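One of the technical causes above, a page blocked by robots.txt, is easy to check offline. The sketch below uses Python's standard `urllib.robotparser`; the robots.txt content and the URLs are hypothetical, and `blocked_urls` is a helper name made up for this example. Note that this parser uses simple prefix matching, so its verdicts may differ from Googlebot's for rules that rely on wildcards.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; in practice, load your own file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/
Disallow: /search
"""


def blocked_urls(robots_txt: str, urls: list[str], user_agent: str = "Googlebot") -> list[str]:
    """Return the URLs that the given robots.txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


urls_to_check = [
    "https://example.com/shop/shoes",
    "https://example.com/cart/checkout",
    "https://example.com/search?q=shoes",
]
for url in blocked_urls(ROBOTS_TXT, urls_to_check):
    print("Blocked by robots.txt:", url)
```

If a page you want indexed shows up in this list, the robots.txt rule is the first thing to fix.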

There are over 20 reports and each one is worth checking, but I would like to show you my favorite ones. I compiled them in the table below: 

Report | Description
Server errors | If there is a substantial number of server errors (5xx), Google will slow down crawling and may index fewer URLs.
The submitted URL seems to be a soft 404 | This is a very interesting report that shows URLs that Google thinks are of low quality. Sometimes, when Google can’t render your page, it may not see all of your content. If that’s the case, Google sees an empty page and treats it as a soft 404. Review this report to check whether any high-quality pages have been wrongly classified by Google as soft 404s.
Reports related to duplicate content (Alternate page with proper canonical tag, Duplicate, Google chose different canonical than user) | The canonical tag is just a hint; Google may ignore it and choose a different page as the canonical one.
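Because the canonical tag is only a hint, it can help to compare the canonical a page declares with the one Google reports as selected. The sketch below extracts the declared canonical with Python's standard `html.parser`; the HTML snippet, the "Google-selected" value, and the helper names are all hypothetical and would come from your own page and from what Google Search Console shows for that URL.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Collect the href of the first <link rel="canonical"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def declared_canonical(html: str) -> Optional[str]:
    """Return the canonical URL declared in the HTML, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


def canonical_respected(declared: Optional[str], google_selected: str) -> bool:
    """True if Google's choice matches the declared canonical (ignoring a trailing slash)."""
    return declared is not None and declared.rstrip("/") == google_selected.rstrip("/")


# Hypothetical page source and hypothetical value copied from Google Search Console.
page_html = '<html><head><link rel="canonical" href="https://example.com/shop/shoes"></head></html>'
google_selected = "https://example.com/shop/shoes?color=red"

declared = declared_canonical(page_html)
if not canonical_respected(declared, google_selected):
    print(f"Declared {declared}, but Google selected {google_selected}")
```

A mismatch does not say why Google disagreed; it only flags the page as worth a closer look.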

Limitation of 1000 URLs

There are at least two ways to work around the limit of 1000 URLs.

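1. Create multiple properties in Google Search Console.

You can create a separate property for any section of your website. Note that a section that lives in a subdirectory (for example, example.com/shop) needs a URL-prefix property, because a Domain property always covers a whole domain or subdomain.

Separate properties created for specific site sections in Google Search Console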

2. Create multiple sitemaps.

  • Separate sitemaps for products, product categories, and blog pages. This way you can track the indexing rate of products and categories independently.
  • Separate sitemaps for each language version. This way you can see which language versions struggle the most. Commonly, issues occur when one language is used in several countries (e.g. Spanish in Mexico and in Spain).
  • Split product lists into chunks. For example, if you have 50k products, you can split them into 10 sitemaps of 5000 products instead of using one single sitemap. 

Then, using the Google Search Console filters, you can easily see the results for each sitemap. Just pick a sitemap in the filter and open the report you are interested in.
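The splitting step can be sketched in a few lines of Python. This is only an illustration: the product URLs and the file names are hypothetical, and `chunk` and `build_sitemap` are helper names invented for this example. It follows the 50k products in 10 sitemaps of 5000 scenario mentioned above.

```python
from xml.sax.saxutils import escape


def chunk(urls: list[str], size: int):
    """Yield consecutive slices of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def build_sitemap(urls: list[str]) -> str:
    """Build a minimal XML sitemap (only <loc> entries) for the given URLs."""
    entries = "\n".join(f"  <url><loc>{escape(url)}</loc></url>" for url in urls)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</urlset>\n"
    )


# Hypothetical catalog of 50,000 product URLs.
product_urls = [f"https://example.com/shop/product-{i}" for i in range(50_000)]

# One sitemap per chunk of 5000 products; file names are made up.
sitemaps = {
    f"sitemap-products-{number}.xml": build_sitemap(part)
    for number, part in enumerate(chunk(product_urls, 5000), start=1)
}
print(len(sitemaps), "sitemaps generated")
```

Each generated file can then be submitted separately, so every chunk gets its own filterable report.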

URLs split into multiple sitemaps allow you to independently check their indexing status in Google Search Console

Check a single URL 

The next step is to get familiar with the URL Inspection tool. Using it, you can check the indexing status of individual URLs with just a few clicks.

If a given page is not indexed, you will find out why.

As a bonus, the URL Inspection tool will tell you:

  • when Google visited it for the last time, 
  • where Google found the link to it.

To use the tool, click on URL Inspection in Google Search Console.

The URL Inspection Tool in Google Search Console allows you to check the indexing status of a single page

Then type in the URL you want to check. 

My recommendation is to check multiple URLs using this tool; get a proper sample to test, don’t just look at a single URL.

For instance, you may have a website with multiple sections:

  • example.com/shop
  • example.com/blog
  • example.com/photos.

Make sure you check every section, with a representative sample for each section. 
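One way to draw such a sample is to group URLs by their first path segment and pick a few at random from each group. The sketch below assumes you already have a list of your URLs (for example, from a crawl or your sitemaps); the URLs shown and the helper name `sample_by_section` are hypothetical.

```python
import random
from collections import defaultdict
from urllib.parse import urlparse


def sample_by_section(urls: list[str], per_section: int, seed: int = 0) -> dict[str, list[str]]:
    """Pick up to `per_section` random URLs from each top-level site section."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url in urls:
        segments = [part for part in urlparse(url).path.split("/") if part]
        section = segments[0] if segments else "(root)"
        groups[section].append(url)

    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return {
        section: rng.sample(members, min(per_section, len(members)))
        for section, members in sorted(groups.items())
    }


# Hypothetical URL list covering three sections.
all_urls = (
    [f"https://example.com/shop/item-{i}" for i in range(200)]
    + [f"https://example.com/blog/post-{i}" for i in range(40)]
    + [f"https://example.com/photos/album-{i}" for i in range(3)]
)
for section, picked in sample_by_section(all_urls, per_section=5).items():
    print(section, len(picked), "URLs to inspect")
```

Sampling per section, rather than from the whole list, keeps small sections from being crowded out by large ones.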

Also, I recommend preparing a list of URLs that are popular on the website but get zero traffic from Google. Chances are good that some of these pages aren’t indexed. Once you know which pages aren’t indexed, and why, you will be able to address the issue.
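Such a list can be built by comparing two exports: page views from your analytics tool and clicks from Google for the same period. The sketch below uses made-up numbers; the threshold of 100 page views and the helper name `popular_without_google_traffic` are assumptions for illustration only.

```python
def popular_without_google_traffic(
    pageviews: dict[str, int],
    google_clicks: dict[str, int],
    min_pageviews: int = 100,
) -> list[str]:
    """Return popular URLs with no clicks from Google, most viewed first."""
    candidates = [
        url
        for url, views in pageviews.items()
        if views >= min_pageviews and google_clicks.get(url, 0) == 0
    ]
    return sorted(candidates, key=lambda url: pageviews[url], reverse=True)


# Hypothetical exports for the same date range.
pageviews = {
    "https://example.com/shop/shoes": 5400,
    "https://example.com/blog/size-guide": 900,
    "https://example.com/photos/summer": 350,
    "https://example.com/blog/old-news": 12,
}
google_clicks = {
    "https://example.com/shop/shoes": 2100,
    "https://example.com/photos/summer": 0,
}

for url in popular_without_google_traffic(pageviews, google_clicks):
    print("Check in URL Inspection:", url)
```

The resulting URLs are only candidates; the URL Inspection tool tells you whether each one is actually missing from the index.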

GSC is not perfect

Like every piece of software, Google Search Console has some bugs. For instance, I have recently discovered that Google commonly marks URLs found in sitemaps as “NOT found in sitemaps.” I noticed this issue across a sample of 12 websites.

Although GSC is a great tool for diagnosing indexing issues, it has some limitations. That is why we decided to create our own tool, which has a number of advantages over GSC.

  • First of all, we can check URLs in bulk. It’s not possible in GSC. 
  • Secondly, our tool enables us to check the indexing status for our clients’ competitors, which isn’t possible with GSC.
  • Last but not least, our tool allows us to look for partial indexing issues. It’s very common that Google skips some content when indexing the whole page.

If your needs exceed what GSC has to offer, contact us! We can help you get the full picture of how your website is indexed by Google, including specific fragments of your pages that you really need indexed.

How to increase the probability a page will be indexed

By now you should know how to check which sections of your website aren’t fully indexed. Now you can start looking for solutions!

Here are some of the common steps you can take to solve that problem:

  • Add the unindexed page to a sitemap. Additionally, make sure your sitemaps don’t contain URLs that shouldn’t be indexed in the first place.
  • Ensure the page is not an orphan page. Are there quality links pointing to the page? Crawl your website to find pages which are orphaned. For instance, use Screaming Frog’s Orphaned pages report.
  • Fix your duplicate content issues.
  • Make sure there are no technical obstacles for indexing (the page may be blocked in robots.txt, or have a canonical tag pointing to a different page).
  • Optimize your crawl budget. Make sure Google is not spending too much time on the low-quality content on your website.
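The first two checks in the list above come down to comparing sets of URLs. A minimal sketch, assuming you have exported three lists yourself: the URLs in your sitemaps, the URLs a crawler reached by following internal links, and the URLs that should not be indexed (for example, noindexed pages). All URLs and the helper name `audit_sitemap` are hypothetical.

```python
def audit_sitemap(
    sitemap_urls: set[str],
    linked_urls: set[str],
    non_indexable_urls: set[str],
) -> dict[str, list[str]]:
    """Compare sitemap contents with crawl results using set operations."""
    return {
        # In the sitemap, but no internal link leads there.
        "orphan_candidates": sorted(sitemap_urls - linked_urls),
        # Reachable and indexable, but not listed in any sitemap.
        "missing_from_sitemap": sorted(linked_urls - sitemap_urls - non_indexable_urls),
        # Listed in a sitemap although they should not be indexed.
        "should_not_be_in_sitemap": sorted(sitemap_urls & non_indexable_urls),
    }


# Hypothetical exports.
sitemap_urls = {
    "https://example.com/shop/shoes",
    "https://example.com/shop/boots",
    "https://example.com/cart/",
}
linked_urls = {
    "https://example.com/shop/shoes",
    "https://example.com/blog/size-guide",
    "https://example.com/cart/",
}
non_indexable_urls = {"https://example.com/cart/"}

for issue, urls in audit_sitemap(sitemap_urls, linked_urls, non_indexable_urls).items():
    print(issue, urls)
```

Each of the three lists maps to one fix: add internal links, add the URL to a sitemap, or remove it from the sitemap.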

Wrapping up

I hope this article helped you understand how you can use the Google Search Console to see if your website is indexed. Not having all your pages indexed by Google is a very common problem, so you should regularly check if your content can be found on Google.
