Google’s documentation defines the Discovered – currently not indexed status in the Index Coverage report as:
The page was found by Google, but not crawled yet. Typically, Google wanted to crawl the URL but this was expected to overload the site; therefore Google rescheduled the crawl. This is why the last crawl date is empty on the report.

Source: Google’s Index Coverage report documentation
Tomek Rudzki researched the most common indexing issues shown in Google Search Console and found that Discovered – currently not indexed is one of them, right next to:
- Duplicate content,
- Crawled – currently not indexed,
- Soft 404s, and
- Crawl issues.
Addressing the Discovered – currently not indexed issue should be a priority as it can affect many pages and indicates that some of your pages haven’t been crawled and subsequently indexed.
This issue can be caused by many factors which, if not addressed, can lead to some pages never finding their way into Google’s index. If that happens, those pages won’t bring you organic traffic or drive conversions.
This article is a deep-dive into the Discovered – currently not indexed section of the Search Console’s Index Coverage report, focusing on analyzing why your pages get there and how to fix any issues that could be causing it.
Where to find the Discovered – currently not indexed status
Discovered – currently not indexed is one of the issue types in the Index Coverage report in Google Search Console. The report shows the crawling and indexing statuses of the pages on your website.
Discovered – currently not indexed appears in the Excluded category, which includes URLs that Google hasn’t indexed but doesn’t consider to be affected by an error.
When using Google Search Console, you can click on the type of issue to see a list of affected URLs.
You may find that you intended to keep some of the reported URLs out of the index – and that’s fine. But you should monitor your valuable pages – if any of them haven’t been indexed, check what issues Google has found.
Discovery, crawling, and indexing
Before moving on to the characteristics of Discovered – currently not indexed and addressing this issue, let’s clarify what it takes for a URL to be ranked on Google:
- Google needs to find a URL before it can be crawled. URLs are most commonly discovered by following internal or external links, or XML sitemaps, which should contain all pages that should be indexed.
- By crawling pages, Google visits them and checks their content. Google does not have the resources to crawl all the pages it finds – and this fact is behind many crawling problems that sites experience.
- During indexing, Google extracts the content of pages and evaluates their quality. Getting indexed is necessary to appear in search results and get organic traffic from Google. Indexed pages are evaluated against numerous ranking factors, which determine how they rank for the search queries users type into Google.
Getting indexed by Google is challenging due to the limited capacity of its resources, the ever-growing web, and because Google expects a certain level of quality from pages that it indexes.
Many technical and content-related factors can play a role in your pages not getting crawled or indexed.
There are solutions to increase the chances of getting indexed. These include:
- Having a crawling strategy that prioritizes the crawling of valuable parts of your website,
- Implementing internal linking,
- Creating an accurate sitemap containing all URLs that should be indexable, and
- Writing high-quality, valuable content.
Be sure to go through Google’s documentation – there is a section on guidelines to follow to make it easier for Google to crawl and index your pages.
How to use the Discovered – currently not indexed report section
The Discovered – currently not indexed status is the place to go to stay up-to-date with any potential crawling issues.
After finding URLs in this section, check if they should be crawled in the first place.
If they should, try to locate a pattern in what URLs appear in the report. This will help you identify what aspects of these URLs could be causing the problem.
For example, the issue may concern URLs in a specific category of products, pages with parameters, or those with a specific structure, causing them to all be considered thin content.
When the Discovered – currently not indexed section requires action
URLs in Discovered – currently not indexed do not always require you to make changes to your website.
Namely, you don’t need to do anything if:
- The number of affected URLs is low and remains stable over time, or
- The report contains URLs that shouldn’t be crawled or indexed, e.g., those with canonical or ‘noindex’ tags, or those blocked from crawling in your robots.txt file.
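For reference, the exclusion signals mentioned above look like this in a page’s HTML head (a sketch – the URLs are hypothetical):

```html
<head>
  <!-- Keep this page out of Google's index entirely -->
  <meta name="robots" content="noindex">

  <!-- Or: point Google to the preferred (canonical) version of this page -->
  <link rel="canonical" href="https://example.com/preferred-page/">
</head>
```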
But it’s still crucial to keep this section of the report under control.
The URLs require your attention if their number keeps growing, or if they include valuable pages that you expect to rank and bring you significant organic traffic.
The impact of Discovered – currently not indexed on small vs. large websites
The impact of the Discovered – currently not indexed status may differ depending on a website’s size.
If you have a smaller website – that usually does not exceed 10k URLs – and your pages have good quality, unique content, the Discovered – currently not indexed status will often resolve itself. Google may be encountering no issue but simply hasn’t crawled the listed URLs yet.
Small sites don’t generally deal with crawl budget issues, and a surge in reported pages can emerge due to content quality issues or poor internal linking structure.
The Discovered – currently not indexed status can be particularly severe for large sites (over 10k URLs) and apply to thousands or even millions of URLs.
At Onely, we have found that websites containing more than 100k URLs typically suffer from crawling issues, frequently originating from wasted crawl budget.
These issues typically occur on eCommerce websites. They often have duplicate or thin content or contain out-of-stock or expired products. Such pages will usually lack the quality needed to be crawled, let alone make it into Google’s indexing queue.
When launching a large site
If you are just launching a large website, you can make Googlebot’s job easier from the beginning.
Don’t release the site’s entire structure at once if it contains many empty or unfinished pages that will only be filled in later. Googlebot will come across these pages and deem them low-quality, which risks a low crawl budget from the start – a situation that may take years to fix.
It’s much better to release content regularly as it becomes ready. This way, Googlebot gets a positive impression of your site’s quality right from the start.
Before you launch, you should always have an indexing and crawling strategy in place and know which pages should be visited by Google.
Causes for the Discovered – currently not indexed status and how to fix them
Typically, URLs will be classified as Discovered – currently not indexed due to content quality, internal linking, or crawl budget issues.
Let’s consider why you may be seeing your pages with this status and how to fix it.
Content quality issues
Google has quality thresholds that it wants pages to meet since it can’t crawl and index everything on the web.
Google may view some pages on your domain as not worth crawling and skip them, prioritizing other, more valuable content. As a result, these URLs can be marked as Discovered – currently not indexed.
To start addressing this issue, go through the list of affected URLs and ensure each page contains unique content. The content should satisfy the user’s search intent and solve a specific problem.
I recommend that you go through the Quality Rater Guidelines that Google follows when evaluating websites – it will help you understand what Google is looking for in content found on the web.
At the same time, don’t forget that not all of your pages should be indexed.
Some low-quality pages should not be indexable, such as:
- Outdated content (like old news articles),
- Pages generated by a search box within a website,
- Pages generated by applying filters,
- Duplicate content,
- Auto-generated content,
- User-generated content.
It’s best to block such sections from being crawled in your robots.txt file. Note that robots.txt only controls crawling – to keep a crawlable page out of the index, use a ‘noindex’ meta tag instead.
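For example, a robots.txt sketch that blocks such sections from being crawled might look like this (the /search/, filter-parameter, and /tag/ paths are hypothetical – adjust them to your own URL structure):

```
User-agent: *
# Pages generated by the on-site search box
Disallow: /search/
# Pages generated by applying filters (parameter URLs)
Disallow: /*?filter=
# Auto-generated tag archives
Disallow: /tag/
```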
During SEO Office Hours on December 31st, 2021, John Mueller discussed making changes to the quality of a website as a way of addressing Discovered – currently not indexed:
[…] Making bigger quality changes on a website takes quite a bit of time for Google systems to pick that up. […] This is something more along the lines of several months and not several days. […] Because it takes such a while to get quality changes picked up, my recommendation would be not to make small changes and wait and see if it’s good enough, but rather really make sure that, if you’re making significant quality changes, […] it’s really good quality changes […]. You don’t want to wait a few months and then decide, ‘Oh, yeah, I actually need to change some other pages, too.’

Source: John Mueller
Internal linking issues
Googlebot follows internal links on your site to discover other pages and understand the connections between them. Therefore, ensure your most important pages are frequently linked internally.
Martin Splitt talked about why incorrect linking structures could be problematic in the Rendering SEO webinar:
[…] If we have like a thousand URLs from you, that are all only in the sitemap and we haven’t seen them in any of the other pages that we crawled, we might be like, ‘We don’t know how important this really is’ […]. Instead of just having it in the sitemap, link to it from other places on your website so that when we crawl these pages, we see ‘Aha! So this page, and this page, and this page are all pointing to this product page, so maybe it is a little more important than this other product that only lives in the sitemap’ […].

Source: Martin Splitt
Proper internal linking revolves around connecting your pages to create a logical structure that helps search engines and users follow your site’s hierarchy. Internal linking is also associated with how your site architecture is laid out.
Helping search engines find and assign appropriate importance to your pages includes:
- Deciding what your cornerstone content is and making sure it’s linked to from other pages,
- Adding contextual links in your content,
- Linking pages based on their hierarchy, e.g., by linking parent pages to child pages and vice versa, or including links in the site’s navigation,
- Avoiding placing links in a spammy way and over-optimizing anchor text,
- Incorporating links to related products or posts.
You can also read this article on improving the internal link structure.
Crawl budget issues
The crawl budget is the number of pages Googlebot can and wants to crawl on a website.
A site’s crawl budget is determined by:
- Crawl rate limit – how many URLs Google can crawl, which is adjusted to your website’s capabilities,
- Crawl demand – how many URLs Google wants to crawl, based on how important it considers the URLs, by looking at their popularity and how often they are updated.
Wasting the crawl budget can lead to search engines’ inefficient crawling of your website. As a result, some fundamental parts of your website may be skipped.
Many factors can be causing crawl budget issues – they include:
- Low-quality content,
- Poor internal linking structure,
- Mistakes in implementing redirects,
- Overloaded servers,
- Heavy websites.
Before optimizing your crawl budget, you should look into exactly how Googlebot is crawling your site.
You can do that by navigating to another helpful tool in the Search Console – the Crawl stats report. Also, check your server logs for detailed information on what resources Googlebot has crawled and what it skipped.
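As a starting point for log analysis, the sketch below pulls Googlebot requests out of access-log lines in the common Apache “combined” format and tallies their response codes. The log lines, paths, and IPs here are hypothetical; note that a real audit should also verify the requests via reverse DNS, since any client can claim to be Googlebot in its user-agent string.

```python
import re
from collections import Counter

# Hypothetical access-log lines in the Apache "combined" format.
LOG_LINES = [
    '66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] "GET /products/shoe-42 HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [10/Oct/2023:13:56:01 +0000] "GET /category/sale HTTP/1.1" 500 0 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [10/Oct/2023:13:56:30 +0000] "GET /about HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
]

# Captures the request path and the HTTP status code from each line.
LOG_PATTERN = re.compile(r'"(?P<method>\w+) (?P<path>\S+) [^"]*" (?P<status>\d{3})')

def googlebot_hits(lines):
    """Return (path, status) pairs for requests whose user agent claims to be Googlebot."""
    hits = []
    for line in lines:
        if "Googlebot" not in line:
            continue
        match = LOG_PATTERN.search(line)
        if match:
            hits.append((match.group("path"), int(match.group("status"))))
    return hits

hits = googlebot_hits(LOG_LINES)
status_counts = Counter(status for _, status in hits)
print(hits)
print(status_counts)
```

A spike of 5xx responses in `status_counts` would suggest the server-overload scenario discussed later in this article.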
Below are 5 aspects you should look into to optimize your crawl budget and get Google to crawl some of the Discovered – currently not indexed pages on your site:
Low-quality content
If Googlebot can freely crawl low-quality pages, it may not have the resources to get to the valuable stuff on your website.
To stop search engine crawlers from crawling certain pages, apply the correct directives in the robots.txt file.
You should also ensure your website has a correctly optimized sitemap that helps Googlebot discover unique, indexable pages on your site and notice changes on them.
The sitemap should contain:
- URLs responding with 200 status codes,
- URLs without meta robots tags blocking them from being indexed, and
- Only the canonical versions of your pages.
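A minimal sitemap sketch meeting these requirements could look like this (the URLs are hypothetical):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- Only canonical, indexable URLs that respond with a 200 status code -->
  <url>
    <loc>https://example.com/products/blue-widget/</loc>
    <lastmod>2023-10-01</lastmod>
  </url>
  <url>
    <loc>https://example.com/blog/widget-buying-guide/</loc>
    <lastmod>2023-09-15</lastmod>
  </url>
</urlset>
```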
Poor internal linking structure
If Google doesn’t find enough links pointing to a URL, it may skip crawling it due to insufficient signals of its importance.
Follow my guidelines outlined in the “Internal linking issues” section above.
Mistakes in implementing redirects
Implementing redirects can be beneficial for your site – but only if done correctly. Whenever Googlebot encounters a redirected URL, it has to send an additional request to get to the destination URL, which requires more resources.
Be sure you stick to best practices for implementing redirects. You can redirect both users and bots from 404 error pages that have been linked from external sources to working pages, which will help you preserve ranking signals.
Make sure you don’t link to redirected pages, though – instead, update them so they point to correct pages. You also need to avoid redirect loops and chains.
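To illustrate why chains and loops waste crawl budget, here is a small sketch that traces a hypothetical redirect map (as you might extract one from your server config or a site crawl) and flags URLs that need multiple hops or never resolve:

```python
# Hypothetical redirect map: source path -> destination path.
REDIRECTS = {
    "/old-page": "/new-page",
    "/new-page": "/final-page",   # chain: reaching /final-page from /old-page takes two hops
    "/a": "/b",
    "/b": "/a",                   # loop: /a and /b redirect to each other
}

def trace_redirect(path, redirects, max_hops=10):
    """Follow redirects from `path`; return (final_path, hops, is_loop)."""
    seen = {path}
    hops = 0
    while path in redirects:
        path = redirects[path]
        hops += 1
        # A revisited path (or too many hops) means the redirect never resolves.
        if path in seen or hops > max_hops:
            return path, hops, True
        seen.add(path)
    return path, hops, False

print(trace_redirect("/old-page", REDIRECTS))  # ('/final-page', 2, False) – a chain to fix
print(trace_redirect("/a", REDIRECTS))         # loop detected
```

Any hop count above 1 means a link somewhere should be updated to point straight at the destination URL; a loop means Googlebot (and users) never reach a page at all.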
Overloaded servers
Google may limit crawling if your site appears to be overloaded. This happens because the crawl rate, which impacts the crawl budget, is adjusted to your server’s capabilities.
In a webinar on Rendering SEO, Martin Splitt discussed server issues concerning Google’s crawling of pages:
[…] One thing that I see happen quite often is that servers give intermittent errors – specifically, 500-something – and anything that your server responds to with a 500, 501, 502, 504, whatever, means your server says ‘Hold on, I have a problem here’ […], and it might fall over any moment, so we are backing off. Whenever we are backing off, and your server responds positively, we are usually ramping up slowly again. Imagine having a 500-something response every day.
We are seeing this, we are backing off a little bit, we are ramping back up – we’re seeing it again […]. You should look into if your server responds negatively.

Source: Martin Splitt
Check with your hosting provider if there are any server issues on your site.
Server issues can also be caused by poor web performance – find out more by reading our article on web performance and crawl budget.
Heavy websites
Crawling issues can be caused by some pages being too heavy – Google may simply have insufficient resources to crawl and render them.
Every resource that Googlebot needs to fetch to render your page counts toward your crawl budget. In this case, Google sees a page but pushes it further back in the crawl queue.
John Mueller on addressing Discovered – currently not indexed
During SEO Office Hours, John Mueller was asked about resolving the issue of around 99% of URLs on a website stuck in the Discovered – currently not indexed report section.
John’s recommendations revolved around three main steps:
[…] I would first of all perhaps look […] that you’re not accidentally generating URLs with differing URL patterns, […] things like the parameters that you have in your URL, upper lower case, all of these things can lead to essentially duplicate content. And if we’ve discovered a lot of these duplicate URLs, we might think we don’t actually need to crawl all of these duplicates because we have some variation of this page already in there […]. Make sure that from the internal linking, everything is ok. That we could crawl through all of these pages on your website and make it through the end. You can roughly test this by using a crawler tool or something like Screaming Frog or Deep Crawl. […] They will tell you essentially if they’re able to crawl through to your website and show you the URLs that were found during that crawling. If that crawling works, then I would strongly focus on the quality of these pages. If you’re talking about 20 million pages and 99% of them are not being indexed, then we’re only indexing a really small part of your website. […] Perhaps it makes sense to say, ‘Well, what if I reduce the number of pages by half or maybe even […] to 10% of the current count’. […] You can generally make the quality of the content there a little bit better by having more comprehensive content on these pages. And for our systems, it’s a bit easier to look at these pages and say, ‘Well, these pages […] actually look pretty good. We should go off and crawl and index a lot more’.

Source: John Mueller
Discovered – currently not indexed vs. Crawled – currently not indexed
These two statuses commonly get confused and, though they are connected, they mean different things.
In both cases, the URLs haven’t been indexed but, with Crawled – currently not indexed, Google has already visited the page. With Discovered – currently not indexed, the page has been found by Google but hasn’t been crawled.
Crawled – currently not indexed is often caused by an indexing delay, content quality issues, site architecture problems, or the page could have been deindexed.
We also have a detailed article that explains how to fix Crawled – currently not indexed.
Discovered – currently not indexed tends to be caused by page quality and crawl budget issues.
Fixing these issues – and helping Google efficiently and accurately crawl your pages in the future – may require you to go through many aspects of your pages and optimize them.
Here are a few main things that can help avoid problems with Discovered – currently not indexed pages:
- Use robots.txt to prevent Googlebot from crawling low-quality pages, focusing on duplicate content, e.g., pages generated by filters or search boxes on your site.
- Take time to create a proper sitemap that Google can use to discover your pages.
- Keep your site architecture intact and ensure your crucial pages are linked internally.
- Have an indexing strategy in place to prioritize the pages that are most valuable to you.
- Optimize with the crawl budget in mind.