
Rendering SEO manifesto – why JavaScript SEO is not enough

25 Jun 2020

 

Let’s face some hard truths first.

If your content isn’t ultimately better than the alternatives, Google may choose not to index it. To make matters worse, we now know that you can get kicked out of Google’s index over time. We have hard data showing how common this problem is.

On average, 10-15% of your pages are not indexed by Google. Moreover, if your content relies on JavaScript (and it most likely does, even if you’re using WordPress, Shopify, or similar platforms), the risk goes up: 40% of an average website’s JavaScript-powered content is not indexed.

This article will guide you through this problem and the factors behind it, and of course, I will show you how to diagnose and fix it.

The weight of code

Over the last few years, there has been an explosion of new web technologies. While we all appreciate their functionality, we rarely think about the cost of this ongoing revolution.

In 2011, the median weight of JavaScript code served by the mobile version of a website was around 50KB. Today, it is over 400KB – roughly an eightfold increase in the weight of code!

However, this is not the key problem. 

Script dependencies

In 2011, it was rare for a website to rely on JavaScript execution to serve content. To put it simply: to a robot, a website would look more or less the same with and without JavaScript.

Around 2015, we started seeing websites that would fully rely on JavaScript (e.g. the famous hulu.com case), and today, most of the popular websites don’t work well without JavaScript. This means that Google most likely didn’t (or, at least, didn’t need to) execute ANY script before ~2015 to see the website’s content. This has now changed.

The cost goes up. 20 times

All of a sudden, Google not only has to render your website to see the layout (process your code to see how your content is organized) – it needs to execute JavaScript as well.

This raises the cost of crawling (and, eventually, indexing) your pages 20 times.


This cost is measured in the resources needed to crawl, fetch, and render those increasingly heavy pages. It's a game-changer for both Google and SEOs. Search engines can't afford the cost of processing everything we throw at them.

Looking for the solution

If you want to launch a successful website that relies on JavaScript, you will most likely go through trial and error. There are no tools or documentation that will allow you to assess with full confidence that your new website will be properly rendered and fully indexed. The only thing you can do is to follow the best practices and hope for the best. 

Over the last few years, we've conducted research on ways to diagnose and fix JavaScript rendering problems. We were looking for a repeatable, quantifiable metric that would precisely measure your potential rendering problems and guide you toward fixing them without hundreds of hours of work.

This turned out to be harder than we hoped, even though we didn’t expect it to be easy. 

Googlers were extremely helpful with all our questions about JavaScript, with one exception: they wouldn't give us anything precise when we asked why some websites get rendered and some don't. Apart from some obvious things to avoid, there was no clear definition or framework to predict whether a website will struggle with rendering.

Throughout 2018 and a good part of 2019, we created 20+ experiments testing different theories to find the “sweet spot” for JavaScript-powered websites.


In mid-2019, we realized that we weren't getting closer to the solution. Each experiment yielded interesting results, but none gave us a final answer.

We concluded that if we want to get to the bottom of the JavaScript SEO issues, we need to go big. Real websites, real data, and a timeline tracking how Google’s “limits” are shifting.

That's why we created our JavaScript SEO toolset, and in September 2019, we started gathering data about Google's indexing of both HTML and JavaScript. This is the first and only tool that shows the scale of the problem.

The results were beyond what we expected.

Findings we didn’t expect

Popular brands struggle to fully get into Google’s index


The list of such brands goes on and on. Our database is filled with hundreds of the world's most popular eCommerce websites, publishers, SaaS tools, and all sorts of companies struggling to get their (indexable) pages indexed in Google.

Indexing your content is getting harder

This is not a temporary problem, even though anecdotal evidence may make you think otherwise.

Indexing your content is not a given anymore – you need to earn your place in Google’s index. And it seems to be getting harder over time.

With 9+ months' worth of data, we can see the trend is negative. In other words, Google is indexing less and less content every month.

Indexing trends fluctuate during Google updates

Some Google updates seem to influence the indexing trends we are observing. Google's May 2020 Core Update is one such case.

The SEO community wrote tens if not hundreds of articles trying to understand what happened in May. I briefly went through some of them and as usual, there are many hypotheses based on various SEO aspects ranging from backlinks to E-A-T. 

While we will probably never know exactly what was targeted by this update, thanks to TGIF (The Google Indexing Forecast, our dataset described later in this article), we can shed a bit more light on Google's May 2020 Core Update.

Barry Schwartz recently reported that “Googlers are saying, it might not be worth Google’s time to index those pages.” Well, this is probably true, but let’s see if our data reflects that.


Our data makes it clear that JavaScript indexing improved in May. Google is currently indexing content that depends on JavaScript more eagerly than before.

However, if we've learned anything from observing indexing trends, it's that Google has limited resources, and nothing happens in a vacuum.


Our data also clearly shows that the increase in JavaScript indexing came at a price. HTML indexing declined, and this is why so many webmasters reported seeing a decline in the number of their pages indexed.


Google’s index selection

Based on Google’s patents, let’s go through some of the basics of index selection. 

  1. There is a line of URLs trying to get into Google’s index.
  2. Google has limited resources.
  3. Depending on the availability of these resources, Google may adjust its threshold of URL quality, letting less attractive content into the index when more resources are available. 
  4. When Google’s resources are lower, your content may be removed from the index to make room for higher quality content.

As you can see, the quality of your content is essential, which is nothing new. However, it may be surprising to you how much that quality is determined by how your page is rendered.
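
The selection logic in points 1–4 can be sketched as a toy model. To be clear, this is purely our illustrative simplification of the patents, not anything Google has published as code; the `capacity` parameter is a hypothetical stand-in for Google's available resources:

```python
def select_for_index(candidates, capacity):
    """Toy model of resource-dependent index selection.

    candidates: (url, quality) pairs waiting in line for the index.
    capacity: how many URLs the simulated index can hold right now
              (a stand-in for Google's available resources).

    With more resources, the effective quality threshold drops and
    less attractive content gets in; when resources shrink, the
    weakest URLs fall out to make room for better content.
    """
    # Higher-quality URLs are admitted first.
    ranked = sorted(candidates, key=lambda page: page[1], reverse=True)
    return [url for url, quality in ranked[:capacity]]


pages = [("/strong", 0.9), ("/weak", 0.2), ("/decent", 0.6)]

# Plenty of resources: even the weak page is indexed.
print(select_for_index(pages, capacity=3))
# Resources drop: the weakest page falls out of the index.
print(select_for_index(pages, capacity=2))
```

Note how nothing about the weak page itself changed between the two calls – only the available resources did, which is exactly why content can be "kicked out" of the index over time.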

Rendering – a search engine’s perspective

Statement

All the insights shown below are based on our research, on our tools gathering indexing and rendering data daily, and on the documentation we've managed to find online (Google's statements, patents submitted by Google, Google's presentations and announcements, and everything else we could find to get more insight into how rendering works at Google).

This article is a summary of our research and findings, and of our subjective interpretation of how Google approaches rendering. We will be updating this article with new findings as our research progresses.

Rendering is essential for Google and other search engines to see and understand our website’s content and layout. Without rendering, your content doesn’t exist online. We are way past the times when you could see your content by simply looking into the HTML code of the website.

At the same time, this process is the most expensive part of the indexing pipeline. 

Let’s dive into how Google is optimizing the cost of rendering on their end. 

An intro to batch-optimized rendering and fetch architecture (BOR)


First of all – search engines are looking at your pages from a completely different perspective. They don’t care about a lot of elements that are solely focused on real users’ browsing experience.


BOR removes all elements that are non-essential for generating your website's layout. These include:

  • Tracking scripts (Google Analytics, Hotjar, etc.)
  • Ads
  • Images (BOR uses mock images to generate the page's layout)

After removing all those unnecessary elements, BOR sets a value on the Virtual Clock (which we'll talk about later).


The final step of this process is when the time on the Virtual Clock runs out and the website’s layout is generated. This is when things get both complex and exciting. 


Use this information to rank better

First, you need to understand the two key concepts of the Batch-Optimized Rendering architecture: the Virtual Clock and the Layout generated after the time on the Virtual Clock runs out.

Let's start with the Virtual Clock. While researching this topic over the last few months, we were shocked to find that it has never been covered before, apart from what we found in Google's patents.


What’s a Virtual Clock?

Let's start with the simplest possible definition and work our way down. The Virtual Clock measures the time spent rendering your layout. It doesn't advance while waiting for resources (scripts, CSS files, image dimensions, etc.).


This leads us to the conclusion that if your website is heavy with a lot of JavaScript/CSS to render – you need more time on that Virtual Clock. Will you be granted that time? I guess you already know what the answer is 🙂
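
To make the idea concrete, here is a toy simulation of how we read the Virtual Clock in the patents – our interpretation, not Google's actual implementation. Waiting for network resources is free, but CPU work burns the budget, and rendering is cut off when the budget runs out:

```python
def render_with_virtual_clock(steps, budget):
    """Toy simulation of BOR's Virtual Clock (our reading of the
    patents, not Google's real mechanism).

    steps: (kind, seconds) pairs, where kind is "cpu" for script
           execution/layout work or "wait" for network fetches.
    budget: seconds available on the Virtual Clock.
    """
    clock = 0.0
    completed = []
    for kind, seconds in steps:
        if kind == "cpu":
            clock += seconds   # only CPU work advances the clock
            if clock > budget:
                break          # budget exhausted: rendering is interrupted
        completed.append((kind, seconds))
    return completed


steps = [
    ("wait", 3.0),   # fetching scripts: free on the Virtual Clock
    ("cpu", 1.5),    # executing them: counts against the budget
    ("wait", 5.0),   # fetching more resources: still free
    ("cpu", 4.0),    # a heavy script: this is what gets you cut off
]

# With a 4-second budget, the heavy script pushes the clock past
# the limit and never completes.
done = render_with_virtual_clock(steps, budget=4.0)
```

The point of the sketch: slow networks don't exhaust the clock – heavy computation does.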

There is a limit to how much (and for how long) BOR will render. Let's investigate it.

Where is the limit of batch-optimized rendering?

According to Google's patents, BOR sets the Virtual Clock to an unknown number of seconds. Exactly how many seconds are on the Virtual Clock is not relevant: for this value to be actionable for us, we would also need to know exactly how many resources BOR assigns to each rendering thread, and a few other specific metrics that we will never get.

However, we can still work with the data we already have. 

Knowing that the main factor limiting rendering is computing power, we can see how much your website relies on computing power to render. However, just like for web performance, we don’t have a precise score or a “goal” here. It is a moving target. We are aiming to be faster, lighter, better, and more efficient than the competition. 

Statement

This is not the detailed walkthrough/analysis required to fully understand your rendering issues, but it is the first step toward understanding how much your website relies on computing power to fully render.

There are hundreds of ways to measure how much our layout relies on computing power. I’ll go through two of them: a quicker, easier way to diagnose problems on the spot (less precise), and a more complex method giving us a better understanding of how our website’s layout is generated.

Measuring the rendering cost – TL;DR

Go to www.onely.com/tools/tldr/ (TLDR = Too Long; Didn’t Render) and see how much computing power is needed to render your website on a scale from 0 to 100. 


Anything below 20–30 points is OK. The results you'll see in TL;DR are not very precise, but they can give you a ballpark figure for the scale of the problem on your pages.

If you score high, you need to investigate a bit more. If you score low, the page you are testing is most likely not going to struggle with rendering problems.

Measuring the rendering cost – Chrome Developer Tools

To measure your rendering cost more accurately, you need to remove all the unnecessary scripts, ads, and images from your page.


To do so, follow the steps from this video walkthrough I recorded and see how rendering with less computing resources affects your page. 


In the test I ran, throttling the computer's processor by a factor of 6 increased every component of the load time considerably. The duration of the rendering process grew from under 400ms to almost 10 seconds – roughly a 25x increase.

This shows us that the page I tested requires a lot of time on the Virtual Clock. Probably more than it is going to get.
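
A quick way to quantify such a test is to compare the observed slowdown against the throttle factor you applied. This is a hypothetical helper of ours, not an established metric: roughly linear scaling (a result near 1.0) is expected for any page, while a slowdown far above the throttle factor suggests the rendering cost compounds under constrained CPU:

```python
def throttle_sensitivity(baseline_ms, throttled_ms, throttle_factor):
    """Observed slowdown relative to the CPU throttle applied.

    A result around 1.0 means rendering time scales roughly linearly
    with CPU speed; values well above 1.0 mean the cost compounds,
    i.e. the page needs a lot of time on the Virtual Clock.
    """
    slowdown = throttled_ms / baseline_ms
    return slowdown / throttle_factor


# Numbers from the test above: ~400 ms unthrottled, ~10 s at 6x throttling.
sensitivity = throttle_sensitivity(400, 10_000, 6)
print(f"{sensitivity:.1f}x worse than linear scaling")
```

Run this with your own before/after timings from the DevTools Performance panel to compare pages against each other.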

Once we run out of time on the Virtual Clock, we enter a new, exciting, and very complex world of rendering and layout. 


Rendering SEO and the layout of your page

Understanding how and why Google is looking at your website’s layout is the key to finding the missing link between crawling, rendering, and indexing. Those three elements are interconnected. Finding and unblocking bottlenecks between them has more potential than any other part of SEO.

Why layout matters for technical SEO

We've known for quite some time that content location matters. As SEOs, we had a general idea that Google looks at content placement. We would advise our clients not to put too many ads above the fold, and we would talk about links at the top of the page being more "valuable" than those below the fold.


What I want to share with you today is a bit more actionable, based both on Google's documentation and patents and on more than 9 months' worth of detailed data on how Google indexes content and which parts of the layout it seems to pick up more eagerly than others.

Google’s patent on “Scheduling resource crawls” from 2011 is full of valuable information showing how Google is assigning different levels of prominence to different sections within the rendered layout.


When I started to understand Google’s logic around creating that queue of URLs to be crawled, it became clear to me that JavaScript SEO is merely a part of the problem. JavaScript SEO was mostly focused on IF Google will be able to see our content. This is important but it’s just the tip of the iceberg. 

Rendering SEO is a brand new territory with dozens of different aspects.

The Google Indexing Forecast (TGIF) dataset

Around May 2019, when we already knew that layout matters, we sat down with Tomek and went through all the possible repercussions. We started to wonder whether all the partial indexing problems we were seeing were related to JavaScript SEO. In other words, we wanted to find out if the cost of JavaScript is the main reason why a page is only partially indexed.

Our best hypothesis was that maybe it’s not JavaScript, but instead, Google is picking parts of the layout that are more important so they can save resources otherwise spent scripting and rendering just to see the sections they are not interested in. 

To even start analyzing this problem on a large scale, we had to build a database of websites with partial indexing issues. After a few months of intense work, the TGIF (The Google Indexing Forecast) toolset was released during my SEOktoberfest 2019 talk.

How TGIF works

First of all, we picked a few hundred of the most popular domains and started tracking their indexing delays (based on newly added URLs from their sitemaps). Once we had information about when these URLs were indexed, we were able to start checking whether different elements of the layout were indexed.

This part turned out to be one of the most shocking data points within our database. It helped us understand much more about how partial indexing works and which parts of your website’s layout are more important than others. 

We already knew that Google is assigning different values to internal links based on their location and attributes. This leads us to believe that Google has a good understanding of which parts of the website’s layout are more important than others.


What we realized when looking at the data gathered in TGIF is that there is a good chance that Google is using similar logic when deciding which parts of your website's layout should be rendered and indexed. Going through all the partially indexed pages in our database, we found that:

  1. Google seems to be prioritizing the indexing of the “main content” of the page’s layout
  2. Some elements are skipped by Google more often than others – most notably product carousels underneath the main content, like "you may also be interested in" or "related items" sections. 

Partial indexing is more serious than it looks. It creates a vicious circle of crawling, rendering, and indexing problems.

In turn, these issues affect the whole website’s link graph and crawl budget. The same websites that are struggling with partial indexing also struggle with getting their URLs indexed. 

How to address partial indexing

There are two main problems to address when solving partial indexing problems.

1. Your website’s CPU consumption

Google openly said that it will interrupt scripts. I assume this is Google’s way of explaining how the Virtual Clock works. 


And this is when things get interesting again. Looking into our dataset, we can clearly see that the layout elements that often go unindexed are far from random.


Google's statements and patents imply that rendering will be interrupted regardless of which part of the layout a script belongs to. However, based on our findings, some parts of the layout seem to be at higher risk than others, so we would assume that Google's heuristics pick and choose which scripts to execute.

This brings us to the second problem that you need to address.

2. Understanding our website’s layout elements and their dependencies on different scripts

Now to make it actionable, we need to clearly understand why Google decides not to index specific parts of our layout. There are multiple scenarios to consider here. Just to explain the most high-level approach to this problem, let’s have a look at one possible scenario. 

Example:

My product name, images, and description are all indexed by Google; however, the product reviews, user-submitted images, related products from the same collection (product carousel), and the social media feed below them are not being picked up by Google.

Diagnostics path

Before we jump into the technical aspects, let me state something really important. Rendering SEO is an important step, but you shouldn't look into rendering issues before addressing the following aspects first.

  • Information architecture

If I were to choose the most important and the most underappreciated part of SEO, creating a clear information architecture would be my first choice. No questions asked. If your information architecture is not spotless, search engines will struggle to find and rank proper pages within your structure.

  • Indexing strategy

If your website is not lean and Google is crawling thousands of URLs that are neither valuable nor indexable, you will struggle with indexing. The best solution is to crawl your whole website and compare the number of pages crawled with the number of unique, indexable, and valuable pages within your website. In 2020, wasting crawl budget is not an option. Avoid relying on URL canonicalization and noindex tags – these still require Googlebot to visit the URLs in question. In a perfect scenario, some 95% of Google-facing URLs should be unique, valuable, and indexable. Obviously, this is not always possible, but aim for it.

  • Crawl budget optimization

This last point is fairly obvious after looking at the two points above. Make sure to regularly look into your server logs, crawls, etc., to confirm that Googlebot is only visiting the pages you want it to. The quicker you spot anomalies and problems, the safer your rankings are.
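
As a minimal sketch of the crawled-vs-indexable comparison described above (the URLs are hypothetical; in practice the two sets would come from your server logs and a site crawl, respectively):

```python
def crawl_efficiency(crawled, indexable):
    """Percentage of crawled URLs that are also unique and indexable.
    The rule of thumb discussed above: aim for roughly 95%."""
    return 100 * len(crawled & indexable) / len(crawled)


# Hypothetical data: what Googlebot hits vs. what you want indexed.
crawled = {"/shoes", "/shoes?sort=price", "/cameras", "/old-promo", "/about"}
indexable = {"/shoes", "/cameras", "/about"}

print(f"{crawl_efficiency(crawled, indexable):.0f}% of crawled URLs are worth crawling")
```

Here, parameterized and stale URLs drag the ratio far below the ideal – exactly the kind of crawl-budget waste to eliminate before worrying about rendering.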

Now that we have this out of the way, let's dive into our rendering diagnostics path.

Step 1

If Google is skipping a part of your page's layout during rendering and indexing, the first thing to do is go through the JavaScript SEO basics: check whether your scripts are blocked in robots.txt, and whether all elements are crawlable and indexable for bots.

During this process, you’ll need a deep understanding of the search engine’s limitations (e.g. Googlebot doesn’t scroll, doesn’t click, etc.) and of course, you need decent technical SEO knowledge. Misdiagnosing rendering problems during this step may lead to unnecessary and expensive code changes. 
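
For the robots.txt part of this check, Python's standard library is enough. The robots rules and resource URLs below are hypothetical; substitute your own domain's robots.txt and the script/CSS URLs your pages actually load:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt; in practice, fetch the live file from
# https://yourdomain.com/robots.txt
robots_txt = """\
User-agent: *
Disallow: /private/
Disallow: /assets/js/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Resources your pages depend on (hypothetical examples).
resources = [
    "https://example.com/assets/js/reviews-widget.js",
    "https://example.com/static/main.css",
]

for url in resources:
    if not parser.can_fetch("Googlebot", url):
        # A blocked script means Google can't render whatever it builds.
        print("BLOCKED for Googlebot:", url)
```

In this example, the reviews widget script is disallowed – a plausible reason why product reviews would be missing from the index while the rest of the page is fine.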

Step 2

Once you establish that your content is in fact fully accessible and indexable for Google, it is time to start looking into the specific scripts that are not being executed by Google. This is a broad topic, but let me just mention some areas to investigate.

The cost of your website’s scripting and rendering. 

  • Does your web development team have web performance budgets set? If so, are you within those budgets? How are they set?
  • Do you actually need that much script executed just to generate your product page's layout? Maybe some of your scripts power features that could be served only to users, not to search engines?
  • Are you optimizing your critical rendering path? Is your website benefiting from progressive rendering? 

How is your code structured throughout the website?

  • Do you use the same large JS/CSS bundle for every page within your website?
  • How much of your code is unnecessary? Getting familiar with the Coverage tab in Chrome DevTools is definitely a good idea. Optimizing CSS/JS takes a lot of development work, but it is worth the effort. 
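
The Coverage tab lets you export its results as JSON, which makes the "how much code is unnecessary" question scriptable. Treat the entry shape below as an assumption about that export (file text plus the byte ranges that executed) and adjust to what your export actually contains:

```python
def unused_share(entry):
    """Percentage of a file's bytes that never executed on the page.

    entry is assumed to look like one item of a Chrome DevTools
    Coverage export: {"url": ..., "ranges": [{"start": ..., "end": ...}],
    "text": ...} where ranges mark the bytes that actually ran.
    """
    total = len(entry["text"])
    used = sum(r["end"] - r["start"] for r in entry["ranges"])
    return 100 * (total - used) / total


# Hypothetical entry: a 1,000-byte bundle of which only two slices ran.
entry = {
    "url": "https://example.com/bundle.js",
    "ranges": [{"start": 0, "end": 150}, {"start": 600, "end": 700}],
    "text": "x" * 1000,
}

print(f"{unused_share(entry):.0f}% of bundle.js never ran on this page")
```

Numbers like this, per page template, tell you where splitting the shared bundle would save the most rendering cost.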

Is the content that’s not indexed relevant to the main content of your page?

  • If you are using a section of your layout to do “PageRank sculpting” or as a solution to your poor information architecture or internal link structure, Google may start ignoring it.
  • Is your main content/product page about running shoes and your “related items” section is full of digital cameras? Martin Splitt from Google recently confirmed that if some part of your page is not “supporting” your main content, Google may also skip indexing it. 

Is your main content visible without JavaScript? 

  • If your main content is visible without JavaScript processing, and “extra” elements of your page require a lot of computing power to render, there is a good chance that search engines will only index your main content. This makes a lot of sense from the search engine’s point of view. Why spend so many resources just to index boilerplate content or content that is not relevant to the user’s intent?

The list above is by no means definitive. However, diagnosing and fixing partial indexing problems becomes easier over time. As with most technical SEO tasks, our goal is to make the search engine’s job easy. This one is no exception. 

Breaking the vicious cycle

Partial indexing may cause a vicious cycle of crawling and indexing issues

Most of the time, partial indexing is a symptom of bigger-picture technical SEO issues. When we first started looking into partial indexing a few years back, we didn’t take this issue seriously enough. 

Partial indexing means that you lose both control and understanding of your website's link graph. I have yet to see a website where partial indexing isn't connected to other technical SEO issues. The example above is the most popular one judging from our data, but the options are unlimited.

Getting started with Rendering SEO

Rendering SEO focuses on three key elements:

  1. The layout of your page.
  2. The relationship between your layout and parts of your code.
  3. How search engines understand and process both the layout and your code.

JavaScript SEO was mostly focused on "if Google can see our content". We can now see that this point of view was extremely limited, for two reasons: 

  1. Google will often skip indexing parts of the content even for pages that are 100% HTML/CSS based.
  2. Rendering and indexing (and problems related to both) are a challenge with and without JavaScript dependencies. 

When we started to analyze our dataset of partially indexed pages with LAYOUT in mind, we finally understood how important it is. 

Just to go through one example, content prominence seems to be a very strong metric, and one we didn’t pay enough attention to. Google will go through heavy JS files just to make sure that your main, most prominent content is indexed. 

At the same time, search engines will happily skip indexing of lightweight HTML/CSS content if it is not relevant to the most prominent part of the layout. Understanding how Google is assessing different parts of your layout is one of the most important things when it comes to getting your website ranking well in Google. 

Wrapping up

Getting your content rendered and indexed by Google is not a given. If anything, we can assume that getting your content into Google’s index will become harder and harder. 

With billions of websites flooding the web while shipping code that is increasingly expensive to parse and render, search engines will have to pick the content that is worth the resources spent on indexing it.

I truly hope that I’ve managed to get you interested in the topic of Rendering SEO. While it’s one of the biggest challenges we are facing as technical SEOs, it comes with tremendous potential to make your website grow.
