If some of your pages aren’t indexed, you may be losing revenue. Your potential clients won’t find your products on Google.
But how do you check whether Google has successfully indexed all of your valuable pages?
One of the easiest ways is to use Google Search Console. This article will show you how.
Making sure that Google has indexed all of your content is a complex process, particularly if you have a large website. It’s best to start with the bigger picture and work your way down from there.
Get started with Index Coverage (Page Indexing)
First, get a bird’s-eye view of your indexing issues.
Using the Google Search Console:
- You can see what percentage of URLs on your website are indexed.
- You can discover the leading cause of your indexing issues, for instance, whether they’re related to low-quality or duplicate content.
After logging in to your Google Search Console, navigate to the Pages report in the Index tab. Previously, this report was titled “Coverage,” but on August 15, 2022, Google updated the naming of some Search Console tools.
Here, you will see a list of URLs divided into two categories:
- Not indexed: these pages aren’t indexed. The possible reasons are covered later in this article.
- Indexed: these pages are indexed. That doesn’t mean that they’re free from any problems. Some of them may still be dealing with issues you should investigate.
You can easily toggle between the categories.
Keep in mind that each report is limited to 1000 pages. Later in the article, I will show you how to work with this limit.
Narrow down the results
You can quickly narrow down the report to a particular sitemap.
For instance, if you have an eCommerce store and a sitemap divided into products and product listings, you can check the indexing rate of products and product listings separately in Google Search Console.
Why some URLs are excluded from Google’s index
If some of your URLs are excluded from the index, review the reports that list them together with the reason for each exclusion.
Let’s look at why a page may not be indexed in the first place:
- Google didn’t discover it (Google may still discover it in the future).
- The page is in the crawling queue. Crawling queues can be extremely long, and low-priority URLs can ultimately drop out of the queue.
- Google decided that the page is of low quality.
- Google treats it as duplicate content.
- There are technical issues (for instance, crawl anomalies, or the page is blocked by robots.txt, or Google has an indexing bug).
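One of the technical causes above, blocking by robots.txt, can be checked offline before opening any report. The sketch below uses Python’s standard `urllib.robotparser`; the robots.txt content and the URLs are made up for illustration. Treat the result as an approximation: this parser does not implement every rule Googlebot applies (for example, `*` and `$` wildcards).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/
Disallow: /search
"""


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that robots_txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical URLs to check.
urls = [
    "https://example.com/products/blue-shirt",
    "https://example.com/cart/checkout",
    "https://example.com/search/shirts",
]
print(blocked_urls(ROBOTS_TXT, urls))
```

Any URL this returns cannot be crawled, so it is worth confirming in Google Search Console before looking for quality or duplication problems.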
There are over 20 reports, including frequently occurring ones such as “Crawled – currently not indexed” and “Discovered – currently not indexed.”
If some of your URLs are excluded from the index, you should review the table visible below the chart displaying the number of indexed and not indexed pages.
The table lists the issues that prevented your URLs from being included in the Google index.
You can click on any of the problems listed in the table for further investigation. You will then see a chart generated for it and examples of your URLs that struggle with this issue.
There are many reports, and each one is worth checking, but I would like to show you my favorite ones. I compiled them in the table below:
|Soft 404|This is a very interesting report that shows URLs that Google thinks are of low quality. Sometimes, when Google can’t render your page, it may not see all of your content. If that’s the case, Google would see an empty page and treat it as a soft 404. Review this section to check if there are high-quality pages wrongly classified by Google as soft 404s.|
|Reports related to duplicate content|The canonical tag is just a hint, and Google may ignore it and choose different pages as canonical ones.|
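Because the canonical tag is only a hint, it can help to compare the canonical a page declares with the one Google reports as selected. The sketch below extracts the declared canonical with Python’s standard `html.parser`; the page markup and the Google-selected URL are hypothetical values you would replace with your own page source and the value shown in Google Search Console.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> in a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values and self.canonical is None:
            self.canonical = attributes.get("href")


def declared_canonical(html):
    """Return the canonical URL declared in the HTML, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


def canonical_mismatch(html, google_selected):
    """True when the declared canonical differs from Google's choice."""
    declared = declared_canonical(html)
    if declared is None:
        return False
    # Ignore a trailing slash so equivalent URLs are not flagged.
    return declared.rstrip("/") != google_selected.rstrip("/")


# Hypothetical page and Google-selected canonical, for illustration only.
PAGE = (
    '<html><head><link rel="canonical" '
    'href="https://example.com/shirts/blue"></head></html>'
)
print(declared_canonical(PAGE))
print(canonical_mismatch(PAGE, "https://example.com/shirts"))
```

A mismatch is not automatically a problem, but it tells you which pages to review in the duplicate content reports first.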
Limitation of 1000 URLs
There are at least two ways to work around the limit of 1000 URLs.
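1. Create multiple properties in Google Search Console.
You can create a separate URL-prefix property for any section (directory) of your website. A domain property always covers the whole domain, so it can’t be narrowed down to a single section.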
2. Create multiple sitemaps.
- Separate sitemaps for products, product categories, and blog pages. This way you can track the indexing rate of products and categories independently.
- Separate sitemaps for each language version. This way you can see which language versions struggle the most. Commonly, issues occur when one language is used in several countries (e.g. Spanish in Mexico and in Spain).
- Split product lists into chunks. For example, if you have 50k products, you can split them into 10 sitemaps of 5000 products instead of using one single sitemap.
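The last idea, splitting one large product list into several smaller sitemaps, can be sketched as follows. This is a minimal example under stated assumptions: the product URLs and the `sitemap-products-N.xml` file names are invented, and the function only builds the XML in memory rather than writing files.

```python
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def chunk(urls, size):
    """Yield consecutive slices of urls with at most size items each."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def build_sitemap(urls):
    """Return the XML text of a sitemap listing the given URLs."""
    entries = "\n".join(f"  <url><loc>{escape(u)}</loc></url>" for u in urls)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<urlset xmlns="{SITEMAP_NS}">\n{entries}\n</urlset>\n'
    )


def split_into_sitemaps(urls, size=5000, prefix="sitemap-products"):
    """Map hypothetical file names to sitemap XML, one per chunk."""
    return {
        f"{prefix}-{number}.xml": build_sitemap(part)
        for number, part in enumerate(chunk(urls, size), start=1)
    }


# 50k made-up product URLs, split into sitemaps of 5000 URLs each.
product_urls = [f"https://example.com/products/{n}" for n in range(50_000)]
sitemaps = split_into_sitemaps(product_urls)
print(len(sitemaps))
```

Each resulting file can then be submitted separately, so its indexing rate can be tracked on its own.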
Then, using the Google Search Console filters, you can easily see the results for each sitemap and click through to any particular report.
Are you not sure how to create a sitemap? Check out our ultimate guide to XML sitemaps.
Check a single URL
The next step is to get familiar with the URL Inspection tool. Using it, you can check the indexing status of individual URLs with just a few clicks.
If a given page is not indexed, you will get to know why.
For each page, the URL Inspection tool displays one of the three statuses:
- URL is on Google: This page is indexed and will appear in search results. There aren’t any critical issues. It may have minor problems, e.g., with structured data, but they shouldn’t affect how Google displays it.
- URL is on Google but has issues: This page is indexed and will appear in search results, but a critical error is affecting an aspect of the page on Google. For example, the structured data on this page may be ignored.
- URL is not on Google: This page is not indexed due to critical errors, such as being blocked by a noindex tag.
To use the tool, click on URL Inspection in Google Search Console.
Then type in the URL you want to check.
My recommendation is to check multiple URLs using this tool; get a proper sample to test, and don’t just look at a single URL.
For instance, if your website has multiple sections, make sure you check every one of them, using a representative sample of URLs for each section.
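One way to draw such a sample is to group your URLs by section and pick a fixed number at random from each group. The sketch below assumes that a “section” is the first path segment of the URL, which may not match how your website is organized; the URLs are invented.

```python
import random
from collections import defaultdict
from urllib.parse import urlparse


def section_of(url):
    """Treat the first path segment as the section (an assumption)."""
    path = urlparse(url).path.strip("/")
    return path.split("/")[0] if path else "(home)"


def sample_per_section(urls, per_section=20, seed=0):
    """Return up to per_section randomly chosen URLs for each section."""
    groups = defaultdict(list)
    for url in urls:
        groups[section_of(url)].append(url)
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    return {
        section: rng.sample(members, min(per_section, len(members)))
        for section, members in sorted(groups.items())
    }


# Hypothetical URLs, for illustration only.
urls = [
    "https://example.com/",
    "https://example.com/blog/a",
    "https://example.com/blog/b",
    "https://example.com/blog/c",
    "https://example.com/products/1",
    "https://example.com/products/2",
]
sample = sample_per_section(urls, per_section=2)
for section, picked in sample.items():
    print(section, len(picked))
```

The sampled URLs can then be checked one by one in the URL Inspection tool.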
Also, I recommend preparing a list of URLs that are popular on the website but get zero traffic from Google. Chances are good that some of these pages aren’t indexed. Once you know which pages aren’t indexed, and why, you will be able to address the issue.
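Building that list can be as simple as comparing two exports. The sketch below assumes you have total pageviews per URL from your analytics tool and Google clicks per URL from a search performance export; the numbers, the paths, and the threshold of 100 pageviews are made up.

```python
def inspection_candidates(pageviews, google_clicks, min_pageviews=100):
    """Return popular URLs with no Google clicks, most viewed first."""
    candidates = [
        url
        for url, views in pageviews.items()
        if views >= min_pageviews and google_clicks.get(url, 0) == 0
    ]
    return sorted(candidates, key=lambda url: pageviews[url], reverse=True)


# Hypothetical data, for illustration only.
pageviews = {
    "/pricing": 900,
    "/blog/guide": 450,
    "/about": 40,
    "/products/blue-shirt": 300,
}
google_clicks = {"/pricing": 120, "/products/blue-shirt": 0}

print(inspection_candidates(pageviews, google_clicks))
```

A URL on this list is only a candidate: zero clicks can also mean the page is indexed but ranks poorly, so confirm each one with the URL Inspection tool.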
GSC is not perfect
Like every piece of software, Google Search Console has some bugs. For instance, I recently discovered that Google commonly marks URLs found in sitemaps as “NOT found in sitemaps.” I noticed this issue across a sample of 12 websites.
Although GSC is a great tool for diagnosing indexing issues, it has some limitations. That’s why we decided to create our own tool, which has a number of advantages over GSC:
- First of all, we can check URLs in bulk. It’s not possible in GSC.
- Secondly, our tool enables us to check the indexing status of our clients’ competitors, which isn’t possible with GSC.
- Last but not least, our tool allows us to look for partial indexing issues. It’s very common that Google skips some content when indexing the whole page.
If your needs extend beyond what GSC has to offer, contact us! We can help you get the full picture of how your website is indexed by Google, including specific fragments of your pages that you really need indexed.
How to increase the probability a page will be indexed
By now you should know how to check which sections of your website aren’t fully indexed. Now you can start looking for solutions!
Here are some of the common steps you can take to solve that problem:
- Add the unindexed page to a sitemap. Additionally, check that your sitemaps don’t contain URLs that shouldn’t be indexed in the first place.
- Ensure the page is not an orphan page. Are there quality links pointing to the page? Crawl your website to find pages that are orphaned. For instance, use Screaming Frog’s Orphaned pages report.
- Fix your duplicate content issues.
- Make sure there are no technical obstacles for indexing (the page may be blocked in robots.txt, or have a canonical tag pointing to a different page).
- Optimize your crawl budget. Make sure Google is not spending too much time on the low-quality content on your website.
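The first two steps, sitemap coverage and orphan pages, both come down to comparing two sets of URLs: those listed in your sitemap and those reachable through internal links. The sketch below parses a sitemap with Python’s standard `xml.etree.ElementTree`; the sitemap content and the set of internally linked URLs (which you would normally export from a crawler) are invented for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return the set of <loc> URLs listed in a sitemap."""
    root = ET.fromstring(xml_text)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", NS)
        if loc.text
    }


def orphan_candidates(in_sitemap, internally_linked):
    """URLs in the sitemap that no crawled page links to."""
    return sorted(in_sitemap - internally_linked)


def missing_from_sitemap(in_sitemap, internally_linked):
    """Linked URLs that the sitemap does not list."""
    return sorted(internally_linked - in_sitemap)


# Hypothetical sitemap and crawl export, for illustration only.
SITEMAP_XML = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/products/blue-shirt</loc></url>
  <url><loc>https://example.com/products/old-hat</loc></url>
</urlset>"""
linked = {
    "https://example.com/",
    "https://example.com/products/blue-shirt",
    "https://example.com/blog/guide",
}

listed = sitemap_urls(SITEMAP_XML)
print(orphan_candidates(listed, linked))
print(missing_from_sitemap(listed, linked))
```

Pages in the first list need internal links; pages in the second list should be added to a sitemap if you want them indexed.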
I hope this article helped you understand how you can use the Google Search Console to see if your website is indexed. Not having all your pages indexed by Google is a very common problem, so you should regularly check if your content can be found on Google.