Anyway, if we look at this very page it has 500 referring domains, 2,000 backlinks, like a tremendous amount of reads, claps and whatever. So it’s a very popular piece of content that we would assume that Google is gonna crawl and render and index pretty often.
Long story short, we found that thousands of domains are not fully indexed, even months after publishing the content. So even if you publish an amazing article today, it may happen that the URL is gonna be indexed but the content is not gonna be indexed for a few months. Or someone else is gonna overrank you for your own content.
Let’s talk about what Googlers told me in Zurich which was actually very, very interesting.
So I asked them how rendering works with Google and I was saying, “Okay, you’re looking at the difference between the initial HTML and the rendered HTML?” And they said, okay, they render the content and they see if there’s any change. So Google is going to look at the HTML version and they’re going to compare that with the rendered version.
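To make that comparison concrete, here is a minimal sketch in Python of checking whether rendering adds any visible text to a page. It is only an illustration of the idea described above, not how Google does it. The function names (`visible_words`, `rendered_only_words`, `rendering_changes_content`) are made up for this example, and it assumes you already have both versions of the page as strings, for instance the raw server response and the DOM serialized by a headless browser.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text and skips the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only keep text that is not inside a script or style element.
        if self._skip_depth == 0:
            self.words.extend(data.split())


def visible_words(html):
    """Return the visible words of an HTML string, in document order."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.words


def rendered_only_words(initial_html, rendered_html):
    """Words that appear after rendering but not in the initial HTML."""
    initial = set(visible_words(initial_html))
    return [word for word in visible_words(rendered_html) if word not in initial]


def rendering_changes_content(initial_html, rendered_html):
    """True if rendering adds at least one word of visible text."""
    return bool(rendered_only_words(initial_html, rendered_html))
```

If `rendering_changes_content` returns False for a page, the rendered version adds no new text on top of the initial HTML, which is the "no difference" case discussed here. A word-level comparison like this is deliberately crude: it ignores word order, links, and removed content.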
Now so if we now think about Medium, let’s say that you’re going to publish an amazing article on Medium, Google is gonna render, like compare the version of this article with the rendered article, there’s gonna be no difference because there are no comments. So this is where we actually start to see a massive issue with the heuristics.
And I can just imagine, Martin said that he hasn’t fully grasped what triggers the heuristics, it’s not because Martin is not good with that – he’s an amazing guy. It’s just basically, I’m guessing that those heuristics are somehow relying on machine learning and things that are just not human readable. So they would see, okay there’s certain heuristics that if they see after a while – they look at the difference between the rendered page and not rendered page.
Again, the Medium example. But I would say that those heuristics are still in the infant stage. They’re still pretty new, they’re still playing and optimizing them, like the Google algorithm in 2006. You probably remember how easy those times were.
And those heuristics are far from perfect. And what Martin actually said is that all new websites get rendered and this is extremely interesting because from my point of view, okay, what’s a new website?
And the second problem I had, okay, all of our experiments that we did at Onely were based on new domains, new IPs and so on so, most of our experiments were kind of useless from this point of view.
What’s a new website? So if you’re going to relaunch your CMS, if you’re gonna publish a new [version of your website], is it going to be a new website or does it have to be a new domain? Or what if a new website doesn’t have some kind of content that’s user-generated?
We started playing with that and we’re like, okay, with a lot of clients you would actually advise to do an experiment on staging before publishing a new CMS. This was dumb, looking at how it’s structured: you can’t really test a new CMS because you’re most likely gonna use a new domain. And you can’t really index and play with that within your actual domain.
Actually, this page is still not indexed after two years. So we repeated that experiment, just with a lot of different domains like jscrawling.party or HTMLcrawling.wine. We went all crazy on the new TLDs. And like jscrawling.pizza is one of my favorites.
After four hours, 29 out of 30 pages were indexed, and after eight hours all the test domains were indexed completely. So this actually turned out to be a massive win for Google as well.
We couldn’t somehow force Google to fail with indexing. Something that wasn’t possible two years ago at all. So as I said, this is quite a change that’s not somehow visible in the industry.
We don’t give up easily. We figured we’re gonna create one more experiment. We’re gonna relaunch this experiment from 2017 that was massively popular. We did. Long story short, I’m guessing you know where it goes – again Google didn’t choke on any of the scripts, any of the frameworks, any of the setups, inline, external, doesn’t matter.
Again Google won. So Martin Splitt was completely right about all the new websites. This is something that they designed well and it actually works as they designed it, again, for new websites.
But what about popular websites?
This was one of our most complex experiments that kind of dragged for a few weeks. It got a little bit out of control because we spent way too much time on it after seeing some of the changes.
But not every website is lucky enough. This is enough of the positive examples here. We’re gonna go into the most interesting zone of things that don’t work, which is usually what SEOs love the most.
The Guardian has 66% of content not indexed after two weeks. And you would assume that a newspaper... and this is not like a tiny bit of the page, this is like a massive part. I think it’s “You might also be interested in,” or like everything that they do for internal linking, or maybe not everything, but a good bunch of the links they use for internal linking for new content is not indexed.
Secondly, I wonder how Google is going to deal with pages where they would have to render everything, like Medium, like The Guardian. So I wonder how granular it is going to be and at which point they’re gonna... we’re rendering every single page online. So what to do?
We had some big news a few days ago and I want to explain that a little bit.
And you can see also like after one week or after two weeks if anything is going to change.
It’s quite a lot of work to do that manually, but I think that the database now is growing to around 100 to 200 websites, big websites, and for each website we take quite a lot of URLs. It’s constantly growing. Last time I talked to our research and development team, they were adding like tens of pages per day. And we manually footprint that. There is no way to automate that, actually.
This is BBC, so you would expect that to go a little bit better. This may be somehow influenced by how we test that, because we still can’t believe that they would do it like that, but even if it is this is something to look into. But these problems are definitely for BBC to fix.
Which one matters in the end?
And Too Long; Didn’t Render, so TL;DR, is the last part of our amazing toolset, where you can actually see the cost of rendering your page. You can see it’s based on CPU and memory. And there is one winner in our case; I had to run it a few times just to get to the green zone. But there is one page... okay, BBC kind of got crazy with how they are. So why do you need that?
I should lead with that. You need that because, in the case of the BBC, if your users have cheaper mobile devices (this is a topic I have spoken about quite a lot of times already), BBC is gonna choke on them.
Motorola’s G4, cheaper Android devices, older iPhones are not gonna deal well with such a load of rendering.
There is one page at number one, and this page is still our number one. This is SEOktoberfest, a page made by Marcus [Tandler].
There’s zero CSS and the score is 2. So this is the most amazing score we had because the cost of rendering is almost zero and it’s only because of one image. I guess if you would remove that image it would go down to 1, so maybe you should go backward in development.
But you can see that some other – like what we see quite often if your content relies on Google it is very easy to overrank you with content that’s rendered. So, yeah, that’s more or less it.
You can see that this problem also comes up in mobile-friendly tester somehow. And there are quite a lot of tools we’re actually building to launch, but you can see that this is quite useful to play with your domains. It’s completely free.
Let’s talk about HTML. Let’s go old-school for a second.
Let’s see how quickly Google is gonna index HTML content from The Guardian. Looking at 1300 URLs, Google indexed 98% of the HTML content from The Guardian. Pretty decent, I wouldn’t complain, but it doesn’t look as good for other brands. So The Guardian is fairly good, and this is just HTML. So if we just look at whether the URLs are indexed, The Guardian is very good.
Because if we look at, for example, Eventbrite, Eventbrite has 55 or 56 percent of their content indexed after two weeks. So you can imagine how much of a problem that is. Because they will optimize those pages. They’ll ask, “Okay, why don’t we rank well?” when actually half of their pages are not indexed, half of their domain. So HTML seems to be very problematic . . .
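The percentages quoted here are simply the share of checked URLs that were found in the index. As a minimal sketch in Python, assuming the per-URL results were already collected by hand as true or false values (the function name `indexed_share` and the sample URLs are hypothetical, and nothing here queries Google):

```python
def indexed_share(check_results):
    """Return the percentage of checked URLs that were found in the index.

    check_results maps each URL to True (found) or False (not found).
    """
    if not check_results:
        raise ValueError("no URLs were checked")
    found = sum(1 for is_indexed in check_results.values() if is_indexed)
    return 100.0 * found / len(check_results)


# Hypothetical sample: four URLs checked by hand, two of them indexed.
sample = {
    "https://example.com/article-1": True,
    "https://example.com/article-2": False,
    "https://example.com/article-3": True,
    "https://example.com/article-4": False,
}
print(f"{indexed_share(sample):.0f}% indexed")
```

Running the same calculation on checks taken after one week and again after two weeks shows whether the share is moving at all.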
Just one last slide. This is an example that I’ve used quite a few times at a lot of conferences to actually show that this is dangerous. This is something to either play with or avoid, depending on the side of SEO you are on.
We created quite a lot of pages with content that is somehow sensitive to a lot of people, like gun control, Trump versus Hillary, or Peppa Pig, if you saw the Peppa Pig drama with some of the violent content.
A lot of people are playing with that to somehow inject quite a lot of content for search engines, but not for users. So they would say, okay, you would see – what we actually saw is – you see just a listing page with just a photo of let’s say a car, and in the description, but Google is seeing like a massive spreadsheet of data and everything, and this still works very well.
This is something that Google can’t fix for, I’m guessing, technological reasons. Or maybe the scale is not big enough for them to worry about it. So this is something I actually showed to Martin Splitt as well. So maybe they will somehow address that, but this is a massive issue. So again, depending on the side of SEO you’re on, you’ll avoid that or play with it.
Thank you so much. This is it. More data and more tools are coming soon.