In this article, I will guide you through why it’s happening and how to fix it. You’ll learn:
I will provide tons of additional tips and recommendations, too. We have a lot to cover, so get yourself a cup of coffee (or two) and let’s get started.
For instance, it’s used by Forex and CFD trading platforms to continually update the exchange rates in real-time.
- HTML defines the actual content of a page (body/frame of a car).
- CSS defines the look of the page (colors, style).
- Internal links
- Top products
- Main content (rarely)
1. Use WWJD
Simply go to WWJD and type the URL of your website into the console.
2. Use a browser plugin
Important: “view source” is not enough when auditing JS websites
The initial HTML is the raw file the browser receives and parses to build the page. It contains markup representing paragraphs, images, and links, plus references to JS and CSS files.
You can see the initial HTML of your page by simply right-clicking -> View page source.
Here’s how I would describe the difference between the initial HTML and the DOM:
- The initial HTML (right-click -> View page source) is just a cooking recipe. It provides information about what ingredients you should use to bake a cake. It contains a set of instructions. But it’s not the actual cake.
- The DOM (right-click -> Inspect element) is the actual cake. It starts out as just the recipe (an HTML document), gradually takes shape, and is finally baked (the page is fully loaded).
Note: If Google can’t fully render your page, it can still index just the initial HTML (which doesn’t contain dynamically updated content).
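To make the recipe-versus-cake difference concrete, here is a minimal Python sketch that checks whether a piece of text is already present in the initial HTML, before any JS runs. The two sample pages and the helper names (`VisibleTextExtractor`, `initial_html_contains`) are my own illustrations and not part of any Google tool.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text from raw HTML, skipping the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only text that sits outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def initial_html_contains(raw_html, fragment):
    """Return True if `fragment` appears as text in the initial HTML."""
    parser = VisibleTextExtractor()
    parser.feed(raw_html)
    parser.close()
    return fragment.lower() in " ".join(parser.chunks).lower()


# Hypothetical pages: one injects its content with JS, one ships it in the HTML.
JS_INJECTED_PAGE = (
    '<html><body><div id="app"></div>'
    '<script>document.getElementById("app").textContent = "Fresh chocolate cake";</script>'
    "</body></html>"
)
PLAIN_HTML_PAGE = '<html><body><div id="app"><h1>Fresh chocolate cake</h1></div></body></html>'

print(initial_html_contains(JS_INJECTED_PAGE, "Fresh chocolate cake"))
print(initial_html_contains(PLAIN_HTML_PAGE, "Fresh chocolate cake"))
```

If the function returns False for a fragment you can clearly see in your browser, that content is most likely injected by JS and exists only in the DOM.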
As a leading technical SEO agency, we’re constantly doing research to look for Google’s strengths and weaknesses.
That’s one out of four times.
Here are some examples of the tested websites:
On the other hand, some of the websites we tested did very well:
In the case of crawling traditional HTML websites, everything is easy and straightforward, and the whole process is lightning fast:
- Googlebot downloads an HTML file.
- Googlebot extracts the links from the source code and can visit them simultaneously.
- Googlebot downloads the CSS files.
- Googlebot sends all the downloaded resources to Google’s Indexer (Caffeine).
- The indexer (Caffeine) indexes the page.
In the case of a JS-based website, the process has more steps:
- Googlebot downloads an HTML file.
- Googlebot downloads the CSS and JS files.
- WRS fetches the data from external APIs, from the database, etc.
- The indexer can index the content.
- Only now can Google discover new links and add them to Googlebot’s crawl queue. In the case of an HTML website, that already happens in the second step.
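The difference in when links get discovered can be sketched in a few lines of Python. This is only a toy model: the sample markup is made up, and the “rendered” version of the JS page is written by hand rather than produced by actually executing a script.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


BASE = "https://example.com/"  # placeholder domain

# Hypothetical traditional page: links are already in the source code.
HTML_PAGE = '<a href="/products/1">One</a> <a href="/products/2">Two</a>'

# Hypothetical JS page: the source is an empty shell, links exist only after rendering.
JS_PAGE_SOURCE = '<div id="app"></div><script src="/app.js"></script>'
JS_PAGE_RENDERED = '<div id="app"><a href="/products/1">One</a></div>'

crawl_queue = deque()
crawl_queue.extend(extract_links(HTML_PAGE, BASE))         # found right after download
crawl_queue.extend(extract_links(JS_PAGE_SOURCE, BASE))    # nothing to add yet
crawl_queue.extend(extract_links(JS_PAGE_RENDERED, BASE))  # found only after rendering
print(list(crawl_queue))
```

For the JS page, the queue only grows once the rendered markup is available, which is why link discovery moves to the end of the process.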
II. Googlebot Doesn’t Act Like a Real Browser
It’s time to go deeper into the topic of the Web Rendering Service.
As you may know, Googlebot is based on the newest version of Chrome, which means it renders pages with a current browser engine. But it’s not exactly the same as the browser you use.
Googlebot visits web pages just like a user would when using a browser. However, Googlebot is not a typical Chrome browser:
- Googlebot declines user permission requests (e.g. Googlebot will deny video auto-play requests).
- Cookies, local storage, and session storage are cleared across page loads. If your content relies on cookies or other stored data, Google won’t pick it up.
- Browsers always download all the resources – Googlebot may choose not to.
When you surf the internet, your browser (Chrome, Firefox, Opera, whatever) downloads all the resources (such as images, scripts, stylesheets) that a website consists of and puts it all together for you.
Googlebot, however, acts differently than your browser: its purpose is to crawl the entire internet and grab valuable resources.
The World Wide Web is huge though, so Google optimizes its crawlers for performance. This is why Googlebot sometimes doesn’t load all the resources from the server. Not only that, Googlebot doesn’t even visit all the pages that it encounters.
Google’s algorithms try to detect if a given resource is necessary from a rendering point of view. If it isn’t, it may not be fetched by Googlebot. Google warns webmasters about this in the official documentation.
Googlebot and its Web Rendering Service (WRS) component continuously analyze and identify resources that don’t contribute essential page content and may not fetch such resources.
source: Google's Official Documentation
Additionally, as confirmed by Martin Splitt, a Webmaster Trends Analyst at Google, Google might decide that a page doesn’t change much after rendering (after executing JS) so they won’t render it in the future.
If your content requires Google to click, scroll, or perform any other action in order for it to appear, it won’t be indexed.
Last but not least: Google’s renderer has timeouts. If it takes too long to render your script, Google may simply skip it.
There are three factors at play here:
1) crawlability (Google should be able to crawl your website with a proper structure and discover all the valuable resources);
2) renderability (Google should be able to render your website);
3) crawl budget (how much time it will take for Google to crawl and render your website).
I. Check if Google can technically render your website.
Inspect the screenshot and ask yourself the following questions:
- Is the main content visible?
- Can Google access areas like similar articles and products?
- Can Google see other crucial elements of your page?
If you want to dive deeper, you can also take a look at the HTML tab within the generated report.
Here, you can see the DOM – the rendered code, which represents the state of your page after rendering.
What if Google cannot render your page properly?
It may happen that Google renders your page in an unexpected way.
Looking at the image above, you can see that there’s a significant difference between how the page looks to the user compared to how Google renders it.
There are a few possible reasons for that:
- Google encountered timeouts while rendering.
- Some errors occurred while rendering.
Important note: making sure Google can properly render your website is a necessity.
However, it doesn’t guarantee your content will be indexed. Which brings us to the second point.
II. Check if your content is indexed in Google.
- Using the “site” command – the quickest method.
- Checking Google Search Console – the most accurate method.
The “Site” Command
In 2020, one of the best options for checking if your content is indexed by Google is the “site” command. You can do it in two simple steps.
1. Check if the page itself is in Google’s index.
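First, you have to ensure that the URL itself is in Google’s index. To do that, just type “site:URL” in Google (where URL is the address of the page you want to check).
Now that you know the URL is in fact in Google’s database, you can check whether its content is indexed as well: run the “site” command again, this time together with a fragment of text copied from the page.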
If a snippet with your fragment shows up, that means your content is indexed in Google.
I encourage you to check the “site” command across various types of JS-generated content.
My personal recommendation: perform a “site:” query with a fragment in incognito mode.
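If you run these checks often, a tiny helper can build the queries for you. The sketch below is my own illustration: the function names are made up, the example page and fragment are placeholders, and it only assembles the query string and a search URL, it does not contact Google.

```python
from urllib.parse import quote_plus


def site_query(url, fragment=None):
    """Build a "site:" query, optionally with a quoted text fragment."""
    query = f"site:{url}"
    if fragment:
        query += f' "{fragment}"'
    return query


def google_search_url(query):
    """Turn a query into a search URL you can paste into an incognito window."""
    return "https://www.google.com/search?q=" + quote_plus(query)


# Step 1: is the URL itself indexed?
print(site_query("example.com/page"))

# Step 2: is a fragment of the JS-generated content indexed?
query = site_query("example.com/page", "hello world")
print(query)
print(google_search_url(query))
```

Pick a fragment that is long and distinctive enough to be unique to the page, otherwise a match tells you very little.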
Google Search Console
A more precise, albeit slower, method of checking if your content is indexed by Google is using Google Search Console.
Type the URL in question into the URL Inspection Tool.
Then click View crawled page. This will show you the code of your page that is indexed in Google.
I recommend repeating this process for a random sample of URLs to see if Google properly indexed your content. Don’t stop at just one page; check a reasonable number of pages.
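One simple way to get such a sample is to pull the URLs from your XML sitemap and pick a few at random. Here is a rough Python sketch, assuming a standard sitemap that uses the sitemaps.org namespace. The sample sitemap and the function names are hypothetical.

```python
import random
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap content; in practice you would load your own file.
SAMPLE_SITEMAP = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/category/shoes</loc></url>
  <url><loc>https://example.com/product/123</loc></url>
</urlset>
""".strip()


def urls_from_sitemap(sitemap_xml):
    """Return every <loc> value found in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text]


def pick_sample(urls, size, seed=None):
    """Pick up to `size` URLs at random; pass a seed to make the pick repeatable."""
    rng = random.Random(seed)
    return rng.sample(urls, min(size, len(urls)))


for url in pick_sample(urls_from_sitemap(SAMPLE_SITEMAP), size=2, seed=42):
    print(url)
```

On a large site, try to include URLs from each page template (category, product, article), since rendering problems tend to affect whole templates rather than single pages.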
- Google encounters timeouts. Are you sure you aren’t forcing Googlebot and users to wait many seconds until they are able to see the content?
- Google had rendering issues. Did you check the URL Inspection tool to see if Google can render it?
- Google decided the content is of low quality.
- Google simply wasn’t able to discover this page. Are you sure it’s accessible via the sitemap and the internal structure?
There are several ways of serving your web pages to both users and search engines.
What’s right for your website: Client-side rendering (CSR), Server-side rendering (SSR), or perhaps something more complex? In this chapter, we’ll make sure you know which solution suits your needs.
Remember our baking analogy? It’s valid here as well:
- Client-side rendering is like a cooking recipe. Google gets the recipe for a cake that still needs to be baked.
- Server-side rendering – Google gets the cake ready to consume. No need for baking.
On the web, you’ll see a mix of these two approaches.
1. Server-Side Rendering
There is one problem though: a lot of developers struggle with implementing server-side rendering (however, the situation is getting better and better!).
There are some tools that can make implementing SSR faster:
My tip for developers: If you want your website to be server-side rendered, you should avoid using functions that operate directly on the DOM. Let me quote Wassim Chegham, a developer expert at Google: “One of THE MOST IMPORTANT best practices I’d recommend following is: Never touch the DOM.”
2. Dynamic rendering
Another viable solution is called dynamic rendering.
Dynamic rendering is an approach officially supported by Google.
You can use these tools/services to implement dynamic rendering on your website:
Google also provides a handy guide explaining how to successfully implement dynamic rendering.
As of 2020, Google recommends using dynamic rendering in two cases:
- For indexable JS-generated content that changes rapidly.
- For content that uses JS features that aren’t supported by crawlers.
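In practice, dynamic rendering means detecting crawler requests and serving them a pre-rendered, static HTML version of the page, while regular users still get the client-side rendered version. The core decision can be sketched like this in Python. The list of user-agent markers is illustrative and far from complete, and the function names are my own.

```python
# Substrings that appear in the user-agent strings of some well-known crawlers.
# Illustrative only; a real setup would maintain a longer, verified list.
BOT_MARKERS = ("googlebot", "bingbot", "twitterbot", "facebookexternalhit")


def is_crawler(user_agent):
    """Very rough crawler detection based on the User-Agent header."""
    ua = (user_agent or "").lower()
    return any(marker in ua for marker in BOT_MARKERS)


def choose_response(user_agent, prerendered_html, client_side_shell):
    """Serve static HTML to crawlers and the JS app shell to everyone else."""
    return prerendered_html if is_crawler(user_agent) else client_side_shell


# Hypothetical versions of the same page.
PRERENDERED = "<html><body><h1>Product 123</h1><p>In stock</p></body></html>"
APP_SHELL = '<html><body><div id="app"></div><script src="/app.js"></script></body></html>'

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/80.0 Safari/537.36"

print(choose_response(GOOGLEBOT_UA, PRERENDERED, APP_SHELL))
print(choose_response(BROWSER_UA, PRERENDERED, APP_SHELL))
```

Both versions should carry the same content. The pre-rendered one just delivers it without requiring the crawler to execute JS.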
What does that mean for you?
You must include Twitter Cards, as well as Facebook Open Graph markup in the initial HTML. Otherwise, when people share your content on social media, it won’t be properly displayed.
Let’s see how links to Angular.io and Vue.js look when you share them on Twitter:
Would you click on that link? Probably not.
Now contrast that with a link to Vue.js – the Twitter card looks much better with the custom image and an informative description!
Takeaway: If you care about traffic from social media, make sure that you place the Twitter card and Facebook Open Graph markup in the initial HTML!
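A quick way to verify this is to scan the initial HTML (not the DOM) for the relevant meta tags. Below is a minimal Python sketch. The set of “required” tags is my own assumption of a sensible minimum, and the sample markup is hypothetical.

```python
from html.parser import HTMLParser

# Assumed minimum set of tags; adjust it to what your pages actually need.
REQUIRED_TAGS = ("og:title", "og:description", "og:image", "twitter:card")


class MetaTagCollector(HTMLParser):
    """Collects <meta> tags keyed by their `property` or `name` attribute."""

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        key = attr.get("property") or attr.get("name")
        if key and attr.get("content"):
            self.found[key.lower()] = attr["content"]


def missing_social_tags(initial_html, required=REQUIRED_TAGS):
    """Return the required tags that are absent from the initial HTML."""
    collector = MetaTagCollector()
    collector.feed(initial_html)
    collector.close()
    return [name for name in required if name not in collector.found]


# Hypothetical initial HTML with only part of the markup in place.
PARTIAL_HEAD = (
    "<html><head>"
    '<meta property="og:title" content="The Ultimate Guide">'
    '<meta name="twitter:card" content="summary_large_image" />'
    "</head><body></body></html>"
)

print(missing_social_tags(PARTIAL_HEAD))
```

Run it against what “View page source” shows. If the tags only appear after JS has run, this check will report them as missing, which is exactly the situation you want to catch.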
Common Pitfalls with JS Websites
BLOCKING JS AND CSS FILES FOR GOOGLEBOT
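Googlebot needs to fetch your JS and CSS files to render a page, so a robots.txt rule that disallows them can leave Google with a broken or empty rendering. Here is a rough Python sketch that flags blocked resources using the standard library’s `urllib.robotparser`. The robots.txt content and the resource URLs are hypothetical, and Python’s parser may not match Google’s own rule matching in every edge case, so treat the result as a first check only.

```python
from urllib.robotparser import RobotFileParser


def blocked_resources(robots_txt, resource_urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt disallows for `user_agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in resource_urls if not parser.can_fetch(user_agent, url)]


# Hypothetical robots.txt that blocks the asset directories.
ROBOTS_TXT = """
User-agent: *
Disallow: /assets/js/
Disallow: /assets/css/
""".strip()

# Hypothetical resources referenced by a page.
RESOURCES = [
    "https://example.com/index.html",
    "https://example.com/assets/js/app.js",
    "https://example.com/assets/css/main.css",
]

for url in blocked_resources(ROBOTS_TXT, RESOURCES):
    print("Blocked for Googlebot:", url)
```

Any script or stylesheet that shows up in this list is worth a closer look in the URL Inspection tool.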
IMPLEMENT PAGINATION CORRECTLY
Many popular websites use pagination as a way of fragmenting large amounts of content. Unfortunately, it’s very common that these websites only allow Googlebot to visit the first page of pagination.
As a result, Google isn’t able to easily discover large amounts of valuable URLs.
For example, when it comes to eCommerce websites, Googlebot would only be able to reach 20-30 products per category on paginated category pages.
As a consequence, Googlebot most likely cannot access all the product pages.
How does this happen?
Many websites improperly implement pagination by not using a proper <a href> link. Instead, they use pagination that depends on a user action – a click.
In other words, Googlebot would have to click on a button (View more items) to get to the subsequent pages.
Unfortunately, Googlebot doesn’t scroll or click buttons. The only way to let Google see the second page of pagination is to use proper <a href> links.
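To spot this problem in your own markup, you can separate real links from elements that only work on click. The Python sketch below is a simplified heuristic of my own, not how Googlebot actually decides. It treats an `<a>` with a usable `href` as crawlable and flags `<a>` tags without one, plus any element with an `onclick` handler.

```python
from html.parser import HTMLParser


class PaginationAudit(HTMLParser):
    """Splits elements into crawlable links and click-dependent controls."""

    def __init__(self):
        super().__init__()
        self.crawlable = []   # href values of proper <a href="..."> links
        self.click_only = []  # tag names of elements that need a click

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        href = (attr.get("href") or "").strip()
        usable = bool(href) and not href.lower().startswith(("#", "javascript:"))
        if tag == "a" and usable:
            self.crawlable.append(href)
        elif tag == "a" or "onclick" in attr:
            self.click_only.append(tag)


def audit_pagination(html):
    audit = PaginationAudit()
    audit.feed(html)
    audit.close()
    return audit.crawlable, audit.click_only


# Hypothetical pagination markup.
GOOD = '<a href="/shoes?page=2">Next page</a>'
BAD = '<button onclick="loadMore()">View more items</button><a onclick="goTo(2)">2</a>'

print(audit_pagination(GOOD))
print(audit_pagination(BAD))
```

If your pagination ends up only in the second list, Googlebot has no URL to follow to reach page two.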
If you still are not sure if Google can pick up your links, check out this slide from the Google I/O conference in 2018:
Hiding links in <link rel="next"> doesn’t help either. Google announced in March 2019 that they no longer use this markup:
As we evaluated our indexing signals, we decided to retire rel=prev/next.
Studies show that users love single-page content, aim for that when possible, but multi-part is also fine for Google Search. Know and do what's best for *your* users! #springiscoming
source: Google Webmasters (@googlewmc), March 21, 2019
To sum up, ALWAYS USE PROPER LINKS!
USING HASHES IN URLs
- Bad URL: example.com/#/crisis-center/
- Bad URL: example.com#crisis-center
- Good URL: example.com/crisis-center/
You may think that a single additional character in the URL can’t do you any harm. On the contrary, it can be very damaging.
For us, if we see the kind of a hash there, then that means the rest there is probably irrelevant. For the most part, we will drop that when we try to index the content (…). When you want to make that content actually visible in search, it’s important that you use the more static-looking URLs.
source: John Mueller
That’s why you need to remember to always make sure your URL doesn’t look like this: example.com/resource#dsfsd
Angular 1 uses hash-based URLs by default, so be careful if your website is built using that framework! You can fix it by configuring $locationProvider (here is a tutorial on how to do that!). Fortunately, the newer versions of the Angular framework use Google-friendly URLs by default.
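You can check a list of URLs for this issue with a few lines of Python. The sketch below uses the standard library to find the fragment (everything after the #) and to show what is left of the URL once it is dropped, in line with John Mueller’s comment above. It only illustrates the idea and is not Google’s actual logic.

```python
from urllib.parse import urldefrag, urlsplit


def has_fragment(url):
    """Return True if the URL carries a #fragment."""
    return bool(urlsplit(url).fragment)


def url_without_fragment(url):
    """Return the URL with the #fragment removed."""
    return urldefrag(url).url


URLS = [
    "example.com/#/crisis-center/",
    "example.com#crisis-center",
    "example.com/crisis-center/",
]

for url in URLS:
    status = "bad" if has_fragment(url) else "good"
    print(f"{status}: {url} -> {url_without_fragment(url)}")
```

Notice that both bad URLs collapse to the homepage once the fragment is removed, so every “page” of a hash-based site ends up looking like the same URL.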
As we’ve reached the end of the article, I want to take a moment and address a problem that could affect even the best SEOs.
Here are some other takeaways:
- Google’s rendering service is based on the most recent version of Chrome. However, it has many limitations (it may not fetch all the resources, and some features are disabled). Google’s algorithms try to detect if a resource is necessary from a rendering point of view. If not, it probably won’t be fetched by Googlebot.
- Usually, it’s not enough to analyze just the page source (the initial HTML) of your website. Instead, you should analyze the DOM (right-click -> Inspect element).
- If Google can’t render your page, it can pick up the raw HTML for indexing. This can break your Single-Page Application (SPA), because Google may index a blank page with no actual content.