SEO Office Hours, February 18th, 2022


This is a summary of the most interesting questions and answers from the Google SEO Office Hours with John Mueller on February 18th, 2022.

Types of websites affected by the Product Reviews update

4:03 “[…] My question is about the product reviews update […]. I wanted to understand how Google identifies whether a page or site is related to product reviews. […] For example, there’s an eCommerce site […] and they also have a blog where they review their own products. They do write about pros and cons of their products, compare different products. […] Will Google say that […] this is also product reviews and can be analyzed by product reviews update? […]”

As John explained, “[…] The recommendations we have for product reviews […] would be relevant for any kind of product review. So I wouldn’t necessarily try to see, does Google think my site is a product review site or not […]. But rather, if you think these good practices would apply to your content, then just do those good practices […]”.

The use of the Indexing API

6:53 “[…] [Google’s documentation] mentions that the Indexing API should be used for pages like job posting or broadcasting events. Is it possible that we can try this API for different type[s] of content, like some news articles or blog content?”

John responded: “People try it. But essentially, what we have documented is what we use the API for. If you don’t have content that falls into those categories, then the API isn’t going to help you there”.
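For the content types Google does document (job postings and livestream broadcast events), a publish call to the Indexing API is a simple JSON POST. The sketch below only builds the notification body; the endpoint and field names follow Google's public documentation, while the example URL is made up and the OAuth access token (from a service account with the indexing scope) is assumed to exist elsewhere.

```python
import json

# Publish endpoint from Google's Indexing API documentation.
INDEXING_ENDPOINT = "https://indexing.googleapis.com/v3/urlNotifications:publish"

def build_notification(url: str, deleted: bool = False) -> dict:
    """Build the JSON body for a publish call: URL_UPDATED for new or
    changed pages, URL_DELETED for removed ones."""
    return {"url": url, "type": "URL_DELETED" if deleted else "URL_UPDATED"}

# Hypothetical job-posting URL -- the API is only intended for the
# documented content types, as John notes above.
payload = build_notification("https://example.com/jobs/backend-engineer")
body = json.dumps(payload)
```

The request itself would be an authenticated POST of `body` to `INDEXING_ENDPoint`-style endpoint above; sending it for ordinary articles or blog posts would not help, per John's answer.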

E-A-T and Google’s algorithms 

10:54 “[…] E-A-T is mentioned in [Quality Rater Guidelines], but I want to know if real algorithms also [include] E-A-T factors like author’s expertise?”

John said: “I would assume that there is some indirect work done to try to do similar things. […] We put this in the guidelines so that we can guide the quality testers to double-check these things. And if we think that it’s something important, then I would assume that folks on the search quality side also work to try to understand that in a more algorithmic way. 

But I wouldn’t see […] [that there would be] an E-A-T score, and you have to get five or something like that on it. It’s more trying to understand the context of the content on the web”.

Unlinked brand mentions and user-generated content

12:01 “[…] I see people are speaking about unlinked brand mention[s] […]. Do you think it’s also important for [Google’s] algorithms […]?”

By unlinked brand mentions, the person was referring to situations where other sites mention your brand but don’t include a link to your website. 

John said: “[…] I think that’s kind of tricky, because we don’t really know what the context is. I don’t think it’s a bad thing […] for users because if they can find your website through that mention, then that’s always a good thing. But I wouldn’t assume that there’s some […] SEO factor that is trying to figure out where someone is mentioning your website name”.

12:58 “[…] What about user reviews or comments? Do you think it’s also a ranking factor for an article or product?”

John responded that “[…] Oftentimes, people will write about the page in their own words and that gives us a little bit more information on how we can show this page in the search results. From that point of view, I think comments are a good thing on a page. Obviously, finding a way to maintain them in a reasonable way is sometimes tricky because people also spam those comments […]. If you can find a way to maintain comments on a web page, that gives you a little bit more context, and helps people who are searching in different ways to also find your content”.

Googlebot and infinite scrolling

24:00 “[…] Do you know if Googlebot is advanced enough to handle infinite scrolling yet, or at least something where content keeps building onto something?”

John said: “A little bit […]. 

What happens when we render a page is, we use a fairly high viewport, like if you have a really long screen, and we render the page to see what the page would show there. Usually, that would trigger some amount of infinite scrolling in whatever JavaScript methods you’re using to trigger the infinite scrolling. Whatever ends up being loaded there, that would be what we would be able to index. 

[…] Depending on how you implement infinite scroll, it can happen that we have this longer page in the index. It might not be that we have everything that would fit into that page. Because depending on how you trigger infinite scroll, it might be that you’re just loading the next page. Then we might have two or three of these pages loaded on one page with infinite scroll, but not everything. […] I would recommend testing that with the [URL] Inspection tool and just seeing how much Google would pick up”.
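The behavior John describes can be pictured with a toy model: rendering with a tall viewport triggers only the first few infinite-scroll loads, so only those batches end up in the indexable HTML. The `viewport_batches` parameter below is a made-up stand-in for however many loads a given page's JavaScript fires during rendering, not a real Googlebot setting.

```python
def visible_after_render(batches, viewport_batches=3):
    """Toy model of indexing an infinite-scroll page: only the batches
    loaded while rendering with a tall viewport become indexable."""
    return [item for batch in batches[:viewport_batches] for item in batch]

# Four batches exist on the server, but only the first three load
# during rendering -- d1 and d2 never make it into the indexed page.
all_batches = [["a1", "a2"], ["b1", "b2"], ["c1", "c2"], ["d1", "d2"]]
indexable = visible_after_render(all_batches)
```

Checking the rendered HTML in the URL Inspection tool, as John suggests, is the practical way to see where this cutoff falls for a real page.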

Refresh and discovery data in the Crawl Stats report

33:32 “In the Search Console [Crawl Stats] report, 97% of the crawler requests are refresh, and only 3% is discovery. How to optimize this and let Google discover more pages?”

John responded: “[…] It’s normal for […] an older, more established website to have a lot of refresh crawl because we will look at the amount of pages that we know about that grows over time. And the amount of new pages that comes in tends to be fairly stable. It’s pretty common, especially for a website that is kind of established and just slowly growing, to have a balance like this, that most of the crawling is on the refresh crawling and not so much on the discovery crawling. 

I think it would be different if you had a website […] where you have a lot of new articles that come in, and the old content becomes irrelevant very quickly. Then I think we would tend to focus more on discovery. […] If you have something like an eCommerce site, where you’re just growing the amount of content that you have slowly, and most of the old content remains valid, […] the amount of refresh crawling [is] probably going to be a bit higher”.
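The balance John describes follows from simple arithmetic: on an established site, the pool of known URLs needing refresh dwarfs the stable trickle of new URLs. The figures below are hypothetical, chosen to mirror the 97%/3% split from the question.

```python
def crawl_shares(refresh_requests, discovery_requests):
    """Split crawl requests into refresh and discovery shares."""
    total = refresh_requests + discovery_requests
    return refresh_requests / total, discovery_requests / total

# Hypothetical established site: 9,700 recrawls of known URLs and
# 300 newly discovered URLs in the same period.
refresh_share, discovery_share = crawl_shares(9_700, 300)  # 0.97, 0.03
```

As the known-page pool keeps growing while new-page volume stays flat, the refresh share naturally drifts upward; that is why the ratio in the question is normal rather than something to optimize away.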

Reduced crawling of a website

35:09 “During the last few weeks, I’ve noticed a huge drop in crawl stats, from 700 to 50 per day. Is there a way to understand from the Search Console report what could be the cause of this drop? Could it be source page load? How can I correctly read the crawl request breakdown?”

John provided a detailed explanation of how Google crawls websites and what factors affect crawling: “[…] There are a few things that go into the amount of crawling that we do. 

[…] We try to figure out how much we need to crawl from a website to keep things fresh and useful in our search results. And that relies on understanding the quality of your website, how things change on your website. We call that the crawl demand.

On the other hand, there [are] the limitations that we see from your server, […] website, […] network infrastructure with regards to how much we can crawl on a website. We try to balance those two.

And the restrictions tend to be tied to two main things: […] the overall response time to requests to the website, and […] the number of […] server errors that we see during crawling. If we see a lot of server errors, then we will slow down crawling […]. If we see that your server is getting slower, then we will also slow down crawling […].

The difficulty with the speed aspect is that we have two […] different ways of looking at speed. Sometimes that gets confusing when you look at the crawl rate. Specifically for the crawl rate, we just look at, how quickly can we request a URL from your server? 

And the other aspect of speed that you probably run into is everything around Core Web Vitals and how quickly a page loads in a browser. The speed that it takes in a browser tends not to be related directly to the speed that it takes for us to fetch an individual URL on a website. Because in a browser, you have to process the JavaScript, pull in all of these external files, render the content, recalculate the positions of all of the elements on the page. And that takes a different amount of time than just fetching that URL.

[…] If you’re trying to diagnose a change in crawl rate, then don’t look at how long it takes for a page to render. […] Look at purely how long it takes to fetch that URL from the server. 

The other thing […] is that […] we try to understand where the website is hosted […]. If we recognize that a website is changing hosting from one server to a different server – that could be to a different hosting provider, […] moving to a CDN, or changing CDNs […] – then our systems will automatically go back to some safe rate where we know that we’re not going to cause any problems and then, step by step, increase again.

Anytime you make a bigger change on your website’s hosting, I would assume that the crawl rate will drop. And then over the next couple of weeks, it’ll go back up to whatever we think we can safely crawl on your website. That might be something that you’re seeing here.

The other thing is that, from time to time, our algorithms to determine how we classify websites and servers […] can update as well. […] Even if you don’t change anything with your hosting infrastructure, our algorithms will try to figure out [that] this website is hosted on this server, and this server is one that is frequently overloaded. We should be more cautious with crawling this website so that we don’t cause any problems. That’s something that also settles down automatically over time, usually over a couple of weeks […].

[…] In [Google] Search Console, you can specify a crawl rate […] and that helps us to understand that you have specific settings […] for your website and we’ll try to take that into account. The difficulty with the crawl rate setting is that it’s a maximum setting. It’s not a sign that we should crawl as much as that, but rather that we should crawl at most what you specify there. Usually, that setting is more useful for times when you need to reduce the amount of crawling, not when you want to increase the amount of crawling.
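The "maximum, not target" behavior John describes amounts to taking the minimum of Google's own planned rate and the configured ceiling. This is an illustrative model of that description, not Google's actual scheduling logic; the function name and numbers are invented.

```python
def effective_crawl_rate(planned_rate, user_max=None):
    """Model of the Search Console crawl-rate setting as a ceiling:
    it can only lower the rate Google already planned, never raise it."""
    if user_max is None:
        return planned_rate
    return min(planned_rate, user_max)

# Setting a high maximum does not speed up a slow crawl...
effective_crawl_rate(50, user_max=700)   # still 50
# ...but a low maximum does throttle a fast one.
effective_crawl_rate(700, user_max=50)   # capped at 50
```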

[…] One thing that you can also do is, in the Help Center for Search Console, we have a link to reporting problems with Googlebot. If you notice that the crawling of your website is way out of range for what you would expect it to be, then you can report problems with Googlebot through that link […]”.

How Google identifies countries targeted by pages

56:25 “[…] As for geotargeting, besides using hreflang, how does Google figure out what [country] you’re targeting [with] this specific website or the specific subdirectory?”

John’s response was: “We try to group URLs by clear patterns that we can recognize […], for example, by subdomain or by subdirectory. If you have the country in the subdirectory in a higher place in a path, then it’s a lot easier for us to say, everything under this path is for this country, everything under this other path is for another country. 

You can also verify individual paths in Search Console […], which makes it a little bit easier for us. In practice, I don’t hear a lot of feedback from people saying that this makes a big difference. 

[…] I would try to make it […] as clear as possible which country is relevant for the individual URLs, with a clear path in the URL. I think there was a question someone submitted as well about using the country as a URL parameter at the end. Theoretically, you can do that […]. For our systems, it makes it a lot harder to recognize which URLs belong to which country […]. If you’re using hreflang, then that’s less of an issue there, because you can do that on a per-URL basis”.
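Why a country high in the URL path is easier to group than a URL parameter can be seen in a small sketch: `/de/...` gives every German page a shared prefix, while `?country=de` leaves the paths with nothing in common, so each URL has to be read individually. The country list and URLs below are illustrative only.

```python
from urllib.parse import urlsplit, parse_qs

COUNTRY_CODES = {"de", "fr", "uk", "us"}  # illustrative set

def country_from_url(url):
    """Detect the targeted country from a URL, preferring a clear
    path prefix over a per-URL query parameter."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    # Clear pattern: all '/de/...' URLs group under one prefix.
    if segments and segments[0].lower() in COUNTRY_CODES:
        return segments[0].lower()
    # Parameter-based targeting offers no shared prefix to group by,
    # but can still be read out per URL.
    values = parse_qs(parts.query).get("country")
    return values[0].lower() if values else None

country_from_url("https://example.com/de/products/shoes")        # "de"
country_from_url("https://example.com/products/shoes?country=fr")  # "fr"
```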

Large numbers of URLs marked as Discovered – currently not indexed

58:25 “[…] We are a huge eCommerce site and as we checked our crawl report, we found that there are huge amounts of URLs in the [Discovered – currently not indexed section] […]. Is this an indication of [a] problem [on our site] […]?”

John said: “I think it depends on what those pages are and how you use them within your website. […] We find all kinds of URLs across the web and a lot of those URLs don’t need to be crawled and indexed, because maybe they’re just variations of URLs we already know, or […] some random forum or scraper script has copied URLs from your website and included them in a broken way. […] It’s very normal to have a lot of these URLs that are either crawled and not indexed or discovered and not crawled, just because there are so many different sources of URLs across the web. 

[…] Try to download […] a sample of those, so that you can look at individual examples, and […] classify which of those URLs are ones that you care about and which […] are ones that you can ignore.

[…] The ones that you do care about, that’s something where I would try to figure out what you could do to better tie these in in your website with regards to things like internal linking. So if these are individual products or categories that are not being found, try to figure out what you can do in a systematic way to make sure that all of these URLs are better linked between each other. […] Especially with a larger eCommerce site, it can get tricky, because you can’t look at every URL individually all the time.

But sometimes, there are tricks that you can do where you say: anything that is first level category, I link to it from my home page. And I make sure that my first level category has at most […] maybe 100 items or 200 items, so that you have a little bit of a forcing function in terms of what you give Google to crawl and index. Based on that, you can build it out a little bit more systematically. 
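The capped-category trick above can be sketched as a simple chunking of the product list: every first-level category is linked from the home page and holds at most a fixed number of items, so every product sits a bounded number of links from the home page. The product IDs and cap are hypothetical.

```python
def build_first_level_categories(product_ids, max_per_category=200):
    """Split products into first-level categories capped at a fixed
    size -- the 'forcing function' on what is given to Google to crawl."""
    return [
        product_ids[i:i + max_per_category]
        for i in range(0, len(product_ids), max_per_category)
    ]

# 450 hypothetical products become 3 categories (200 + 200 + 50),
# each just one link from the home page.
categories = build_first_level_categories([f"p{i}" for i in range(450)])
```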

[…] To some extent, I would just accept that Google can’t crawl and index everything. […] If you recognize, for example, that […] individual products are not being crawled and indexed, make sure that at least the category page for those products is crawled and indexed. Because that way, people can still find some content for those individual products on your website […].

See if you can crawl your website yourself so that you have a little bit more direct data of how a website like yours can be crawled. There are various crawling tools out there. […] By crawling the website yourself, you can see which of these URLs are linked very far away from the home page, and which of these are linked closer to your home page. And based on that, sometimes you can tweak the site’s structure a bit to make sure that things are reasonably close or reasonably stable, with regards to the distance from your home page”.
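The distance-from-home-page data that a self-crawl surfaces is just a breadth-first search over the internal-link graph. The sketch below runs on a toy in-memory site; a real crawler would fetch and parse each page to build the same structure.

```python
from collections import deque

def click_depths(links, home="/"):
    """BFS over an internal-link graph, returning each URL's minimum
    click distance from the home page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        url = queue.popleft()
        for target in links.get(url, []):
            if target not in depths:
                depths[target] = depths[url] + 1
                queue.append(target)
    return depths

# Toy site: product-3 is buried under two category levels.
site = {
    "/": ["/category-a", "/category-b"],
    "/category-a": ["/product-1", "/product-2"],
    "/category-b": ["/category-c"],
    "/category-c": ["/product-3"],
}
depths = click_depths(site)  # "/product-3" is 3 clicks deep
```

URLs that come back with a large depth (or never appear at all) are the ones to pull closer to the home page through internal linking, per John's suggestion.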