Why Google Cached Pages Won’t Tell You Anything About Indexed Content
Google Cache is an extremely useful feature. Simply put, it is a copy of a page saved the last time search engine robots accessed it. It is safe to say that most internet users have benefited from the cache when a website they were trying to view was down. Many SEO experts also depend heavily on the cache when trying to detect possible indexing problems. However, it is not meant to be used for that purpose. Keep reading to find out why.
How to access Google Cache
The cache is accessible in two ways:
By clicking on the green arrow next to the URL in a search engine result page and then choosing the “Cached” option:
By adding the “cache:” string at the beginning of the URL that you see in your browser:
You can also find Chrome extensions that let you open the cached version of a page with a single click (Web Cache Viewer, for example).
I can’t see the cache of my website. Is this bad?
Note that there is a robots meta tag directive that excludes pages on your website from being cached: <meta name="robots" content="noarchive">
If you can’t access the cached version of your website, make sure there is no content=“noarchive” attribute in its source code. Of course, when the page itself is non-indexable or blocked from crawling in the robots.txt file, you shouldn’t expect it to be cached either. Also, if a particular page is new and has only just been indexed, it may take a while for the cache to become available. Just be patient and give it a few extra hours to show up.
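The check described above can be automated. Below is a minimal sketch in Python, using only the standard library, that scans a page's HTML source for a noarchive directive. The class and function names (RobotsMetaParser, has_noarchive) are made up for this illustration, and the sketch assumes you already have the HTML as a string. It only looks at meta tags, so it will not notice a directive sent in an HTTP header.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # attrs is a list of (name, value) pairs; value may be None
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() not in ("robots", "googlebot"):
            return
        # A content attribute can hold several comma-separated directives
        for directive in attr_map.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def has_noarchive(html_source: str) -> bool:
    """Return True if the page source asks search engines not to cache it."""
    parser = RobotsMetaParser()
    parser.feed(html_source)
    return "noarchive" in parser.directives
```

Calling has_noarchive(page_source) on the source of a page that has no cached copy is a quick first step before looking at indexability or robots.txt rules.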
The important point here is that not having your website in Google Cache shouldn’t negatively affect its online visibility. Tests of cache removal using noarchive have already been run, and no harm was observed to either rankings or traffic. In fact, some would say that disabling the cache gives you more control over what users see on your website, as they won’t be able to access an outdated version.
But why is that so? Let’s get to the bottom of it.
What exactly does Google Cache show you?
The cached version of a website is not an exact copy. Actually, it is much less than that. Let’s see what John Mueller, Google’s Webmaster Trends Analyst, has to say about it:
We can find even more details on this topic in two of the Google Hangouts (they are fairly up-to-date):
20.09.2016 – 46:50
26.08.2016 – 11:29
In general, the cache is not always representative of what we have actually indexed for a page. So that’s something to kind of keep in mind. That’s not really where you need to go to double check that we’ve actually indexed this content. What I might do instead is to do something like a site: query for that URL or for some keywords that you’ve changed, you’ve modified on that page, to see that we can actually pick that content up properly.
So, what we can see in the cache is only the part of a page that is written in plain HTML. A look at the source code also shows that all the resources are served from the original URL. Therefore, if the page is down, they are available only to a limited extent.
However, when you go to the text-only version of the cache, the content is gone:
How does Googlebot see my website?
You can get final confirmation of whether the content was indexed by using the following query:
site:example.com “This is an example fragment of the content I want to verify the indexation status for.”
If something shows up in the SERP, then you can be sure the content is properly seen by Googlebot.
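If you need to verify many fragments, building the queries by hand gets tedious. Here is a small Python sketch that assembles the site: query shown above and turns it into a search URL you can open in a browser. The function names are invented for this example, and the sketch only builds strings. It does not fetch or parse search results.

```python
from urllib.parse import quote_plus


def build_site_query(domain: str, fragment: str) -> str:
    """Build a query like: site:example.com "exact fragment of the content"."""
    # Drop double quotes from the fragment so they cannot break the exact-match phrase
    phrase = fragment.strip().replace('"', "")
    return f'site:{domain} "{phrase}"'


def build_search_url(query: str) -> str:
    """URL-encode the query so it can be pasted into the browser address bar."""
    return "https://www.google.com/search?q=" + quote_plus(query)


query = build_site_query("example.com", "This is an example fragment")
print(query)
print(build_search_url(query))
```

Pick a fragment that is distinctive and that you recently changed on the page, as John Mueller suggests in the quote above.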
There are cases which prove that sometimes what you see in the cache is completely different from what will be indexed eventually. Let’s take a look at Hulu’s homepage in Google Cache:
As you can see, the cache itself shows the search results for the query “cache:hulu.com”. At the same time, the result for “hulu” in Google looks perfectly normal:
Here’s what probably happened. Hulu uses the same string in its URLs for internal search queries as the one generated by Google Cache:
What else can I use the cache for?
There is one more situation in which you may find the cache helpful. You often need to see when a particular page was last visited by Googlebot, but you don’t necessarily have access to the server logs. In the past, Google Cache showed the date on which the content of a page was last successfully fetched. This means that if the page returned a 304 status code, telling Googlebot that the content hadn’t been updated, the date in the cache remained unchanged.
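To make the 304 case more concrete: a crawler can send an If-Modified-Since header with the date of its previous fetch, and the server answers 304 (Not Modified) without a body if nothing has changed since then. The Python sketch below illustrates that decision on the server side. It is a simplified illustration, not a description of how any particular server or Googlebot is implemented, and the function name response_status is hypothetical. It assumes last_modified is a timezone-aware datetime.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def response_status(last_modified: datetime, if_modified_since: Optional[str]) -> int:
    """Return 304 if the page is unchanged since the date sent by the client, else 200."""
    if if_modified_since is None:
        return 200  # unconditional request: send the full page
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparsable header: ignore it and send the full page
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds before comparing
    if last_modified.replace(microsecond=0) <= since:
        return 304  # content not updated: no body is sent
    return 200


page_changed = datetime(2016, 9, 1, 12, 0, 0, tzinfo=timezone.utc)
print(response_status(page_changed, "Tue, 20 Sep 2016 10:00:00 GMT"))  # visit after the change
print(response_status(page_changed, "Fri, 26 Aug 2016 10:00:00 GMT"))  # visit before the change
```

Under the old behavior described above, only the visits that ended with a 200 response would have moved the date shown in the cache.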
What many people might not know is that as of September 2006, Google Cache shows the date on which a page was last visited, regardless of whether the content was fetched. Therefore, in this case, you can rely on the cache simply by taking a look at the heading: