“Blocked by robots.txt” is a Google Search Console status. It indicates that Google didn’t crawl your URL because you blocked it with a Disallow directive in robots.txt. It also means that the URL wasn’t indexed.
Fixing this issue lies at the heart of creating a healthy crawling and indexing strategy for your website.
How to fix “Blocked by robots.txt”
Addressing this issue requires a different approach based on whether you blocked your page by mistake or on purpose.
Let me guide you on how to act in these two situations:
When you used the Disallow directive by mistake
In this case, if you want to fix “Blocked by robots.txt”, remove the Disallow directive blocking the crawling of a given page.
Thanks to that, Googlebot will likely crawl your URL the next time it crawls your website. Without further issues with that URL, Google will also index it.
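For example, if a rule like the one below (the /blog/ path is just an illustration) is blocking pages by mistake:

```
User-agent: *
Disallow: /blog/
```

deleting the `Disallow: /blog/` line lets Googlebot crawl those URLs again. Note that an empty `Disallow:` line is also valid and blocks nothing.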
If many URLs are affected by this issue, try filtering them in GSC. Click on the status, then click the inverted pyramid (filter) icon above the URL list.
You can filter all affected pages by URL (or only part of a URL path) and the last crawl date.
If you see “Blocked by robots.txt”, it may also indicate that you have intentionally blocked a whole directory but unintentionally included a page you want to get crawled. To troubleshoot this:
- Make your Disallow directives as specific as possible (include a longer URL path fragment) so they don’t accidentally match pages you want crawled, or
- Use the Allow directive to let bots crawl a specific URL within a disallowed directory.
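To illustrate the second option (the paths here are hypothetical), these directives block a whole directory while still allowing one page inside it to be crawled:

```
User-agent: *
Disallow: /archive/
Allow: /archive/popular-post/
```

Google resolves conflicts between matching rules by applying the most specific (longest) path, so the Allow directive wins for that URL.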
When modifying your robots.txt, I suggest validating your directives with the robots.txt Tester in Google Search Console. The tool downloads your site’s robots.txt file and helps you check whether it correctly blocks access to given URLs.
The robots.txt Tester also enables you to check how your directives influence a specific URL on the domain for a given User-agent, e.g., Googlebot. Thanks to that, you can experiment with applying different directives and see if the URL is blocked or accepted.
However, remember that the tool won’t change your robots.txt file automatically. When you finish testing the directives, you need to apply all the changes to your file manually.
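If you prefer to test directives locally, Python’s standard urllib.robotparser module gives a rough approximation. Keep in mind it applies rules in the order they appear rather than by longest match like Googlebot, so place an Allow line before the Disallow it overrides. The paths below are hypothetical:

```python
from urllib import robotparser

# Hypothetical directives to test; paste your own robots.txt contents here.
rules = """\
User-agent: *
Allow: /archive/popular-post/
Disallow: /archive/
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# Check how the directives affect specific URLs for Googlebot.
print(rp.can_fetch("Googlebot", "https://example.com/archive/popular-post/"))  # True
print(rp.can_fetch("Googlebot", "https://example.com/archive/old-post/"))      # False
print(rp.can_fetch("Googlebot", "https://example.com/blog/"))                  # True
```

This is only a sanity check, not a substitute for testing against Google’s own parser, but it catches obvious mistakes before you deploy a new robots.txt.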
Additionally, I recommend the Robots Exclusion Checker extension for Google Chrome. As you browse any domain, it flags pages blocked by robots.txt in real time, helping you spot and work on blocked URLs quickly.
Check out my Twitter thread to see how I use this tool.
What if you keep blocking your valuable pages in robots.txt? You may significantly harm your visibility in search results.
When you used the Disallow directive on purpose
You can ignore the “Blocked by robots.txt” status in Google Search Console as long as you aren’t disallowing any valuable URLs in your robots.txt file.
Remember that blocking bots from crawling your low-quality or duplicate content is perfectly normal.
And deciding which pages bots should and shouldn’t crawl is crucial to:
- Creating a crawling strategy for your website, and
- Optimizing and saving your crawl budget.
Here’s what you can do now:
- Contact us.
- Receive a personalized plan from us to deal with your issues.
- Unlock your website’s crawling potential!
Still unsure about dropping us a line? Reach out for our crawl budget optimization services to improve the crawling of your website.
“Blocked by robots.txt” vs. “Indexed, though blocked by robots.txt”
The difference between these two issues is that with “Blocked by robots.txt” your URL won’t appear on Google. In turn, with “Indexed, though blocked by robots.txt” you can see your URL in the search results.
Why may Google want to index your blocked URL? Because when many links point to a particular URL with descriptive anchor text, Google may consider it important enough to be indexed without getting crawled.
To find “Blocked by robots.txt”, head to the ‘Why pages aren’t indexed’ table below the chart in the Page indexing report.
In turn, “Indexed, though blocked by robots.txt” is part of the ‘Improve page appearance’ section that you may see beneath the ‘Why pages aren’t indexed’ table.
Remember that the Disallow directive in robots.txt only prevents Google from crawling your pages. It can’t and shouldn’t be used to control indexing. To keep a URL out of Google’s index, block its indexing with the noindex tag, and make sure the URL isn’t disallowed in robots.txt, because Googlebot must crawl the page to see the tag.
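For instance, to keep a page out of Google’s index, place this directive in the page’s head section (and leave the page crawlable in robots.txt so Googlebot can see it):

```
<meta name="robots" content="noindex">
```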
Learn the basics of indexing!
The “Blocked by robots.txt” status means that Googlebot found a Disallow directive applying to that URL in your robots.txt.
Remember that it’s normal to prevent Googlebot from crawling some URLs, especially as your website grows. Deciding which pages should and shouldn’t be crawled is a fundamental step in creating a sound indexing strategy for your website.
And while getting your crawling and indexing right is the foundation of SEO, a well-organized robots.txt file is just one part of it.
Contact us for a thorough technical SEO audit to navigate your issues.