Indexing bugs are not unheard of: Google has been having problems with indexing for quite some time now. These bugs can affect any website, regardless of its size and through no fault of its owner. Just last year, for example, there were indexing bugs involving mobile indexing and canonicalization.
A few months ago, I experienced an indexing bug personally when it turned out my Ultimate Guide to Indexing SEO wasn’t indexed.
After thorough research, I found out that Google indexed the wrong version of the URL for no apparent reason. You can learn more about this particular bug in my article My Ultimate Guide to Indexing SEO Isn’t Indexed.
Earlier this year, I found another indexing bug, indicating that Google might be losing track of the URLs in the indexing queue.
Let’s break it down step by step.
Forgotten URL in Google’s indexing queue
On October 6th, we published an article: Rendering SEO: How Google Digests Your Content. The article was a transcript of a conversation between Bartosz Góralewicz from Onely, Martin Splitt from Google, and Jason Barnard from Kalicube.
Unfortunately, in the three weeks after publication, the article didn’t bring in any traffic from Google.
I found it weird. Another interesting article not indexed by Google? Was Google suffering from another indexing bug?
Since I strive to understand the ins and outs of Google’s indexing process, I decided to conduct a little investigation.
I checked what Google Search Console had to say about this URL.
GSC stated that this URL was “Discovered – currently not indexed.”
When you look into Google’s documentation, you’ll find the following explanation of the status:
Discovered – currently not indexed: The page was found by Google, but not crawled yet. (Source: Google)
The status of the URL seemed highly improbable. I couldn’t believe Google didn’t crawl this page within three weeks after publication on a relatively small website.
So, I checked our server logs.
Server logs allow you to examine the traffic coming to your website. They contain information about each request, including its time and date, user-agent string, IP address, etc. Thanks to this information, I could see if (and when) Googlebot was on this page.
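To illustrate this kind of check, here is a minimal Python sketch that scans access log lines for Googlebot requests to a single path. It assumes the server writes the common "combined" log format (the default for many Apache and Nginx setups); the `googlebot_visits` helper, the example path, and the sample lines are all made up for illustration and are not taken from our real logs.

```python
import re
from datetime import datetime

# Assumption: the server uses the "combined" access log format.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_visits(lines, path):
    """Return the timestamps of requests to `path` whose user agent mentions Googlebot."""
    visits = []
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        if match.group("path") != path:
            continue
        if "Googlebot" not in match.group("agent"):
            continue
        visits.append(datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z"))
    return visits

# Made-up sample lines (hypothetical path, IPs, and timestamps).
sample_log = [
    '66.249.66.1 - - [06/Oct/2021:09:15:02 +0000] "GET /blog/example-article/ HTTP/1.1" '
    '200 51234 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [06/Oct/2021:09:20:44 +0000] "GET /blog/example-article/ HTTP/1.1" '
    '200 51234 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]

for visit in googlebot_visits(sample_log, "/blog/example-article/"):
    print(visit.isoformat())
```

Keep in mind that the user-agent string can be spoofed, so a match like this is only a first filter. It does not prove on its own that the request came from Google.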
Surprisingly, I found that Googlebot visited the page the day we published the article!
At this point, I had two crucial pieces of information:
- Google Search Console’s claim that Googlebot hadn’t visited the page yet was false. Server logs proved that Googlebot visited the URL on the day the article was published.
- It was not just a reporting bug from Google Search Console. The page wasn’t getting any organic traffic, so there were clearly more significant problems than just mistakes in the report.
More websites suffer from Google’s indexing bug
I wanted to know more about this bug and its scale, so I researched a larger sample of websites to draw actionable conclusions.
I collected server logs from four other websites and dug into the data.
It turned out that all four websites I examined suffered from this very issue. There were multiple URLs visited by Googlebot but wrongly classified by Google Search Console as either:
- Discovered – currently not indexed, or
- URL is unknown to Google.
In the case of the Unknown status, Google seems to state that it never visited the page and has no memory of even discovering the URL.
I discovered that the issue was still present on one of the tested pages more than seven months after Googlebot visited it. According to server logs, the last visit was on March 7th, but on October 27th, the status was still Unknown.
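The cross-check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `find_forgotten_urls` helper, the URLs, and all of the data are hypothetical, the statuses are assumed to be copied by hand from Google Search Console, and the year in the dates is assumed (the text above gives only the day and month).

```python
from datetime import date

# Statuses that imply Google has not crawled the URL yet.
NOT_CRAWLED = {
    "Discovered – currently not indexed",
    "URL is unknown to Google",
}

def find_forgotten_urls(gsc_status, last_googlebot_visit, checked_on):
    """List URLs that GSC reports as not crawled although logs show a Googlebot visit."""
    forgotten = []
    for url, status in gsc_status.items():
        visit = last_googlebot_visit.get(url)  # None if the logs show no visit
        if status in NOT_CRAWLED and visit is not None and visit <= checked_on:
            days_since_visit = (checked_on - visit).days
            forgotten.append((url, status, days_since_visit))
    # Longest-forgotten URLs first.
    return sorted(forgotten, key=lambda row: row[2], reverse=True)

# Hypothetical data: statuses from GSC, last visits from server logs.
gsc_status = {
    "/example-a/": "URL is unknown to Google",
    "/example-b/": "Discovered – currently not indexed",
    "/example-c/": "Submitted and indexed",
    "/example-d/": "Discovered – currently not indexed",
}
last_googlebot_visit = {
    "/example-a/": date(2021, 3, 7),    # year is an assumption
    "/example-b/": date(2021, 10, 6),
    "/example-c/": date(2021, 10, 20),
}

forgotten = find_forgotten_urls(gsc_status, last_googlebot_visit, date(2021, 10, 27))
for url, status, days in forgotten:
    print(f"{url}: '{status}', yet Googlebot visited {days} days earlier")
```

In this made-up data, `/example-d/` is not flagged because the logs contain no Googlebot visit for it, so its status may simply be accurate.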
It seems like Google occasionally forgets about URLs at some point in the indexing pipeline. It’s unclear if the search engine is just losing track of some URLs or deliberately omitting them.
Either way, the consequences are severe. The forgotten pages don’t get any organic traffic.
A possible solution to the bug
Dan Shure shared an interesting case related to the forgotten URL bug.
Could "Discovered – but currently not indexed" put a URL in some sort of 'blacklist'?
Thought I'd share something strange and interesting that happened w/a few blog posts of a client..
(1/5) (I hate doing threads but this needs a little detail) 👇🏻
— Dan Shure (@dan_shure) November 8, 2021
It seems like changing the URL was enough to solve the problem.
Dan Shure wasn’t the only one who tested this solution. Frank Olivo got almost ⅓ of his articles indexed by changing their URLs!
This worked for around 12 of the 38 articles we tried it on. All indexed on the same day we republished. The remaining articles are still "discovered" almost a month later.
— Frank Olivo (@FrancoOlivo) December 7, 2021
It’s possible these URLs fell under patterns of low-quality URLs, so Google wasn’t crawling them and thus classified them as “Discovered – currently not indexed” in Google Search Console.
By changing the URL, you might convince Google to treat the page as a new one and crawl it again. This might help get the page indexed, but it’s only a workaround: it doesn’t prevent the issue from happening again. Google should address the problem and fix the bug permanently.
Hope you found this article helpful. If you need more detailed guidance or advice for your website, you can use Onely’s technical SEO consulting.
As described in the article, there is a severe problem with indexing. It’s not as apparent and spectacular as previous indexing bugs (e.g., connected to canonicalization), but it can still negatively impact any website.
If you’re a Google employee and want to investigate the problem, I can share some sample URLs that suffered from this problem.
Did you notice this bug or a similar indexing bug on your site? Let me know!