“Indexed, though blocked by robots.txt” is a Google Search Console status. It means that Google didn’t crawl your URL but indexed it nonetheless.
This status indicates a serious SEO issue that you should immediately address.
What does indexing have to do with robots.txt?
The “Indexed, though blocked by robots.txt” status may be confusing. That’s because it’s a common misconception that robots.txt directives can be used to control indexing – this isn’t the case.
The status means that Google indexed the page even though you blocked Googlebot from crawling it, intentionally or by mistake.
Let me help you understand the relationship between robots.txt and the indexing process. It’ll make grasping the final solution easier.
How do discovery, crawling, and indexing work?
Before a page gets indexed, search engine crawlers must first discover and crawl it.
At the discovery stage, the crawler learns that a given URL exists. While crawling, Googlebot visits that URL and collects information about its contents. Only then does the URL go to the index, where it can be found among other search results.
Psst. The process isn’t always that smooth, but you can learn how to help it by reading our articles on:
- How to fix the “Discovered ‐ Currently Not Indexed” status in GSC, and
- How to fix the “Crawled – Currently Not Indexed” status in GSC.
What is robots.txt?
You can block specific URLs from crawling with robots.txt. It’s a file that you can use to control how Googlebot crawls your website. Whenever you put a Disallow directive in it, Googlebot knows that it cannot visit pages to which this directive applies.
But robots.txt doesn’t control indexing.
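To make the Disallow behavior concrete, here’s a minimal sketch that uses Python’s standard-library robots.txt parser. The robots.txt content and the example.com URLs are made up for illustration, and Python’s parser is not Google’s own, so treat the result as an approximation of how Googlebot reads the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths are invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /drafts/
"""


def is_crawl_allowed(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


blocked_page = is_crawl_allowed(ROBOTS_TXT, "https://example.com/drafts/new-post")
open_page = is_crawl_allowed(ROBOTS_TXT, "https://example.com/blog/")
print(blocked_page, open_page)
```

Note that the check only answers “may this URL be crawled?”. Nothing in it says whether the URL can end up in the index.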
Let’s explore what happens when Google receives conflicting signals from your website, and indexing gets messy.
The cause for “Indexed, though blocked by robots.txt”
Sometimes Google decides to index a discovered page despite being unable to crawl it and understand its content.
In this scenario, Google is usually motivated by the large number of links leading to the page blocked by robots.txt.
Links translate into a PageRank score. Google calculates it to assess whether a given page is important. The PageRank algorithm takes into account both internal and external links.
When there’s a mess in your links and Google sees that a disallowed page has a high PageRank value, it may think the page is significant enough to place it in the index.
However, the index will only store a blank URL with no content information because the content hasn’t been crawled.
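To illustrate how links alone can make a blocked page look important, here’s a simplified sketch of the textbook PageRank iteration. It is not Google’s production algorithm, and the site structure, damping factor, and iteration count are assumptions chosen for the example. The disallowed page is modeled as having no known outgoing links, since a crawler that can’t fetch it can’t see them.

```python
def pagerank(
    links: dict[str, list[str]],
    damping: float = 0.85,
    iterations: int = 50,
) -> dict[str, float]:
    """Textbook PageRank by power iteration over a small link graph."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # No known outgoing links: spread the rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical site: three pages all link to a page disallowed in robots.txt.
LINKS = {
    "/": ["/blog/", "/drafts/new-post"],
    "/blog/": ["/drafts/new-post", "/"],
    "/about/": ["/drafts/new-post"],
    "/drafts/new-post": [],
}

ranks = pagerank(LINKS)
top_page = max(ranks, key=ranks.get)
print(top_page)
```

In this toy graph, the disallowed page ends up with the highest score purely because of the links pointing at it, which is the kind of signal described above.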
Why is “Indexed, though blocked by robots.txt” bad for SEO?
The “Indexed, though blocked by robots.txt” status is a serious problem. It may seem relatively benign, but it may sabotage your SEO in two significant ways.
Poor search appearance
If you blocked a given page by mistake, “Indexed, though blocked by robots.txt” doesn’t mean you got lucky and Google corrected your error.
Pages that get indexed without crawling won’t look attractive when shown in search results. Google won’t be able to display:
- Title tag (instead, it will automatically generate a title from the URL or information provided by pages that link to your page),
- Meta description,
- Any additional information in the form of rich results.
Without those elements, users won’t know what to expect after entering the page and may choose competing websites, drastically lowering your CTR.
Here’s an example – one of Google’s own products:
Google Jamboard is blocked from crawling, but with nearly 20,000 links from other websites (according to Ahrefs), Google still indexed it.
While the page ranks, it’s displayed without any additional information. That’s because Google couldn’t crawl it and collect any information to display. It only shows the URL and a basic title based on what Google found on the other websites that link to Jamboard.
To see if your page has the same problem and is “Indexed, though blocked by robots.txt,” go to your Google Search Console and check it in the URL Inspection Tool.
The second problem affects pages you blocked on purpose. If you intentionally used the robots.txt Disallow directive for a given page, you don’t want users to find that page on Google. Let’s say, for example, you’re still working on that page’s content, and it’s not ready for public view.
But if the page gets indexed, users may be able to find it, enter it, and form a negative opinion about your website.
How to fix “Indexed, though blocked by robots.txt”?
You can find the “Indexed, though blocked by robots.txt” status at the bottom of the Page Indexing report in your Google Search Console.
There you may see the “Improve Search appearance” table.
After clicking on the status, you will see a list of affected URLs and a chart showing how their number has changed over time.
The list can be filtered by URL or URL path. When you have a lot of URLs affected by this problem and you only want to look at some parts of your website, use the filter icon (the inverted pyramid symbol) on the right side.
Before you start troubleshooting, consider if the URLs in the list really should be indexed. Do they contain content that may be of value to your visitors?
When you want the page indexed
If the page was disallowed in robots.txt by mistake, you need to modify the file.
After removing the Disallow directive blocking the crawling of your URL, Googlebot will likely crawl it the next time it visits your website.
For detailed instructions on appropriately modifying the file, see our robots.txt guide.
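Before you publish an edited robots.txt, it can help to check which URLs the change actually unblocks. Here’s a minimal sketch, again using Python’s standard-library parser as a stand-in for Googlebot. The old and new file contents, the URLs, and the helper names are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def newly_crawlable(old_robots: str, new_robots: str, urls: list[str]) -> list[str]:
    """List the URLs that were blocked by the old file but are allowed by the new one."""
    return [
        url
        for url in urls
        if not allowed(old_robots, url) and allowed(new_robots, url)
    ]


# Hypothetical files: the edit removes the accidental "/guides/" rule only.
OLD_ROBOTS = "User-agent: *\nDisallow: /guides/\nDisallow: /admin/\n"
NEW_ROBOTS = "User-agent: *\nDisallow: /admin/\n"

URLS = [
    "https://example.com/guides/robots-txt",
    "https://example.com/admin/login",
    "https://example.com/",
]

unblocked = newly_crawlable(OLD_ROBOTS, NEW_ROBOTS, URLS)
print(unblocked)
```

Comparing the two versions this way shows whether the edit frees the page you care about without accidentally opening up sections you still want to keep away from crawlers.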
When you want the page deindexed
If the page contains information you don’t want to show users visiting you via the search engine, you must indicate to Google that you don’t want the page to be indexed.
Robots.txt shouldn’t be used to control indexing. This file blocks Googlebot from crawling. Instead, use the noindex tag.
Google always respects ‘noindex’ when it finds it on a page. Using it, you can ensure Google won’t show your page in the search results.
You can find detailed instructions on implementing it on your pages in our noindex tag guide.
Remember that you need to let Google crawl your page in order to discover this HTML tag. It’s a part of the page’s content.
If you add the ‘noindex’ tag but keep the page blocked in robots.txt, Google won’t discover the tag. And the page will remain “Indexed, though blocked by robots.txt.”
When Google crawls the page and sees the ‘noindex’ tag, it will be dropped from the index. Google Search Console will display another indexing status when inspecting that URL.
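The interplay between ‘noindex’ and robots.txt can be sketched in a few lines. The example below looks for a robots meta tag in a page’s HTML and then checks whether the tag can take effect at all. The sample HTML and the function names are assumptions made for illustration, and the parser only covers the meta tag form of ‘noindex’.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Record whether a page carries a noindex robots meta tag."""

    def __init__(self) -> None:
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {key: (value or "") for key, value in attrs}
        # "robots" addresses all crawlers; "googlebot" addresses Google only.
        if attributes.get("name", "").lower() in ("robots", "googlebot"):
            content = attributes.get("content", "")
            directives = [item.strip().lower() for item in content.split(",")]
            if "noindex" in directives:
                self.noindex = True


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a noindex robots meta tag."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.noindex


def noindex_can_take_effect(has_noindex_tag: bool, crawl_allowed: bool) -> bool:
    """The tag only works if the crawler is allowed to fetch the page and read it."""
    return has_noindex_tag and crawl_allowed


# Hypothetical page that should stay out of the index.
PAGE = '<html><head><meta name="robots" content="noindex, follow"></head></html>'

tag_found = has_noindex(PAGE)
print(noindex_can_take_effect(tag_found, crawl_allowed=False))
print(noindex_can_take_effect(tag_found, crawl_allowed=True))
```

The first call models a page that has the tag but is still disallowed in robots.txt, so the tag is never seen. The second models the same page after the Disallow directive has been removed.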
Keep in mind that if you want to keep any page away from Google and its users, it’s always the safest choice to implement HTTP authentication on your server. That way, only the users who log in can access it. This is necessary if you want to protect sensitive data, for example.
When you need a long-term solution
The above solutions will help you remedy the “Indexed, though blocked by robots.txt” problem for a while. It’s possible, however, that it will appear for other pages in the future.
This status indicates that your website may need thorough internal linking improvements or a backlink audit.
Here’s what you can do now:
- Contact us.
- Receive a personalized plan from us to deal with your internal linking issues.
- Overcome the messiness that keeps your website from growing.
Indexed, though blocked by robots.txt VS Blocked by robots.txt
The “Indexed, though blocked by robots.txt” status applies to URLs that weren’t crawled but were indexed. There’s a similar status in the Page indexing report, Blocked by robots.txt, which applies to pages that were neither crawled nor indexed.
Let me again show you the table from the beginning to better outline this difference.
Blocked by robots.txt is usually less of an issue, whereas Indexed, though blocked by robots.txt should always be treated with high priority. However, if you want to take a closer look at the second status as well, you can check our article on Blocked by robots.txt.
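The difference between the two statuses boils down to a single question: did the blocked URL get indexed anyway? Here’s a small sketch of that decision. The function name and the wording returned for pages that aren’t blocked are my own placeholders, not Google Search Console output.

```python
def robots_related_status(blocked_by_robots: bool, indexed: bool) -> str:
    """Map the two facts about a URL to the status discussed in this article."""
    if not blocked_by_robots:
        # Placeholder: pages that aren't blocked fall under other statuses.
        return "Not blocked by robots.txt"
    if indexed:
        return "Indexed, though blocked by robots.txt"
    return "Blocked by robots.txt"


print(robots_related_status(blocked_by_robots=True, indexed=True))
print(robots_related_status(blocked_by_robots=True, indexed=False))
```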
- The Disallow directive in the robots.txt file blocks Google from crawling your page but not indexing it.
- Having pages that are both indexed and uncrawled is bad for your SEO.
- To fix “Indexed, though blocked by robots.txt,” you need to decide if affected pages should be visible on Search and then:
  - Modify your robots.txt file,
  - Use the noindex meta tag if necessary.
- The “Indexed, though blocked by robots.txt” status may be a sign of serious issues with your internal linking and backlink profile. Contact Onely to get your links optimized.