Paywalled content and cloaking
00:49 “In regards to paywalled data with paywall content. […] We have a website. We did a lot of articles, and everything is accessible to Google. And we would like to add a paywall there, but […] only […] show the paywalled content to Google with the structured data snippets you have. Is it considered cloaking?
So, I check whether it’s Googlebot, and only [then] show […] the structured data – […] the paywalled data. But then to the regular user […], I don’t show the structured data, is that fine?”
John did not see the problem with this solution: “That’s fine. It, technically, would still be considered cloaking, because you’re showing something different, but from our policies, that’s acceptable. Because users would, […] if they go through the paywall, […] see the content that you’re showing Googlebot.”
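As a rough illustration of the structured data the question refers to, here is a minimal sketch that builds paywalled-content markup with the schema.org properties `isAccessibleForFree`, `hasPart`, and `cssSelector`. The function name, the headline, and the `.paywall` selector are assumptions made for this example and are not taken from the session.

```python
import json


def paywalled_article_jsonld(headline: str, paywall_selector: str) -> str:
    """Return a JSON-LD string that marks part of an article as paywalled."""
    data = {
        "@context": "https://schema.org",
        "@type": "NewsArticle",
        "headline": headline,
        # The article as a whole is not free to read.
        "isAccessibleForFree": False,
        "hasPart": {
            "@type": "WebPageElement",
            "isAccessibleForFree": False,
            # Hypothetical CSS selector of the element wrapping the paywalled text.
            "cssSelector": paywall_selector,
        },
    }
    return json.dumps(data, indent=2)


print(paywalled_article_jsonld("Example headline", ".paywall"))
```

The resulting string would typically be placed inside a `<script type="application/ld+json">` element on the article page.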
Potential indexing issues
03:38 “I publish high-quality content, I submitted a sitemap, and sometimes request indexing from Google Search Console. But I still have a problem indexing new content, or it’s indexed [with a delay]. […] It’s a bug from Google, or it’s a new algorithm update?”
John replied: “There is no bug on our side in that regard. […] We just don’t index all the content, and some websites generate a lot of content. And if we don’t index everything […], that can be OK. But maybe you want everything indexed, and we can’t do everything all the time.
The tricky part […] is that, in the past, […] a lot of websites were technically not that great. It was a little bit clearer which kind of content did not get indexed. Nowadays, websites are technically OK, and it’s […] like the quality bar is a little bit higher […]. Anyone can publish something that, theoretically, could get indexed, but […] we have to make sure that we’re indexing the right things that are actually useful and relevant for users. So we sometimes have to leave some things unindexed.”
Product reviews update – affected languages and countries
14:01 “About the product reviews update. […] Even if the update only affects English-speaking websites, I was seeing some movements in German Search as well. I was wondering if there could also be an effect on websites in other languages by this product reviews update or any kind […]?”
As John said, “My assumption was this was global and across all languages […]. But usually, we try to push the engineering team to make a decision on that, so that we can document it properly in the blog post. I don’t know if that happened with the product reviews update. […] It seems like something that we could be doing in multiple languages and wouldn’t be tied to just English. And even if it were English initially, it feels like something that is relevant across the board, and we should try to find ways to roll that out to other languages over time as well. So I’m not particularly surprised that you see the changes in Germany […].”
After learning that the Google blog post only mentioned the update as affecting English-language websites, John elaborated further:
“With this kind of updates, we try to get started with one language or one location and see what we need to tweak, and then we expand from there. […] With something that is more content-related, usually it takes a bit longer to expand to different languages […].”
Localizing pages for English-speaking countries
17:53 “Do you know any other ways to localize the same set of pages for different English-speaking countries? […] We have several subdomains with .jo top-level domain, like maybe from Australia, New Zealand subdomains, and we have set the country in the [Search Console] backend and also use hreflang on [the] page level. […] We couldn’t figure out some other ways to help us to localize these subdomains. Do you have any good methods or some ways that we can improve?”
Here is how John discussed this topic:
“I think you covered the main ones. That’s geotargeting in Search Console and the hreflang settings.
Geotargeting works on a subdirectory or a subdomain level, it’s all pages in there.
Hreflang is on a per-page basis. If you have a home page for one country and different product pages for the same country, then each of those pages would need to be cross-linked with hreflang.
If these pages are essentially the same, it can happen that we treat one of these pages as the canonical version. For example, if you have a page for New Zealand and Australia, and the whole content is the same, the only thing that’s slightly different is the currency on the page, then […] we fold those pages together and pick one of them as a canonical, and use that as the basis for Search.
If you have a hreflang, on those pages too, we will still use the hreflang to show the right version of the URL. But the indexed content will be just from the canonical version, and all of the reporting in Search Console will be for the canonical version. That makes it sometimes a bit tricky, especially if you have a larger website with […] the same content for different countries.”
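To make the per-page nature of hreflang concrete, here is a minimal sketch that generates the `<link rel="alternate">` tags for one page that exists in several country versions. The subdomains, the path, and the function name are hypothetical.

```python
from html import escape


def hreflang_links(variants: dict[str, str]) -> list[str]:
    """Build one <link rel="alternate"> tag per language-region variant of a page."""
    return [
        f'<link rel="alternate" hreflang="{escape(code)}" href="{escape(url)}" />'
        for code, url in sorted(variants.items())
    ]


# Hypothetical country subdomains serving the same product page.
product_page = {
    "en-au": "https://au.example.com/widgets/blue",
    "en-nz": "https://nz.example.com/widgets/blue",
}

for tag in hreflang_links(product_page):
    print(tag)
```

The same set of tags would go on every variant of that page, including a reference to the page itself, and the exercise is repeated for each page that has country versions.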
Adding dynamic content to pages
25:0 “My website has millions of pages, like category, subcategory, and product, e-commerce […] pages. We have added dynamic content, because [with] millions of pages […] [it’s] difficult to add separate content or […] unique content on each page. We have added […] template-based content on category pages, subcategory page, and product pages. […] That would be good for our website performance or not, or should we update the content for each page? […]”.
Here is how John responded:
“Dynamically adding relevant content to a page […] can make sense because […] [it] is essentially just doing […] a database lookup and adding content based on that. […] It really depends on how you have that set up.
The main thing I would avoid is that you run into a situation where you’re artificially adding content to a page just in the hope that this page ranks better for the keywords that you artificially add. […] When users go there, they’ll be like ‘Why are these random keywords on this page?’ […] Making sure that you actually have good, relevant content for those key keywords, that’s more what I would focus on […].”
When additionally asked whether it was necessary to write relevant content for each page for Google to see pages as providing value, John said:
“It should be something on the page that is relevant. And if it’s a category page, then the products that you have listed there are very relevant […] and usually, you have a description of that category. […] It’s not that you have to write a Wikipedia article on the bottom about all of these products and where they come from […] but a little bit of information that is relevant to the page, that does matter.”
Indexing URLs generated through search within a website
30:11 “We have already added a search box in our website, so the user come on our website and search over there, and it generates a unique URL for every search. These URLs should be indexable or not?”
As John said, “Usually not. […] There are two main reasons for that.
On the one hand, it’s very easy to end up in a situation where you have another million URLs that are just different searches, which doesn’t provide any value to you. We call it an infinite space […]. That’s something you want to avoid.
The other thing you want to avoid is that people do spammy things in the search box and try to get those things indexed, which could be something like searching for their phone number, and […] their business type […]. Suddenly, your website’s search page ranks for that kind of business and shows their phone number, even if you don’t have any content that matches those queries, […] they do this to try to be visible in the search results. I would block this kind of search pages with robots.txt. That way you can be sure that we won’t be able to index any of the content.”
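A minimal sketch of what blocking internal search pages with robots.txt could look like, checked with Python's standard `urllib.robotparser`. The `/search` path and the example URLs are assumptions; a real site would use whatever path its search results live under.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: internal search results are served under /search.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def is_crawlable(url: str, user_agent: str = "Googlebot") -> bool:
    """Check a URL against the robots.txt rules above."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# A spammy internal search URL is blocked, a normal category page is not.
print(is_crawlable("https://www.example.com/search?q=plumber+555-0100"))
print(is_crawlable("https://www.example.com/category/shoes"))
```

Note that `urllib.robotparser` does simple prefix matching, so it is only a rough stand-in for how Googlebot evaluates rules; it is enough to sanity-check a plain `Disallow` path like this one.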
SEO sites as YMYL
According to John, “[…] I don’t think SEO websites are that critical to people’s lives. Obviously, if you work at an SEO company, then you’re tied to that, but it’s not that the website itself is a Your Money or Your Life type website. […] Not every website that sells something is falling into this category.
What I would recommend here is, rather than blindly trying to see ‘Is this type of website falling into this specific category?’, […] read up on where this category came from, namely the Quality Rater Guidelines, and understand a bit more what Google is trying to do with understanding these different types of websites. […] That will give you a little bit more background information on what is actually happening […].”
Breadcrumb structured data
39:56 “When it comes to breadcrumb structured data, does it have to be exactly the same as the breadcrumbs that a visitor would see on a page? I sometimes see a condensed version of breadcrumbs on the page, while the structured data is a complete breadcrumb path. Are both acceptable options?”
As John said, “[…] We try to recognize if the structured data is visible on a page or not. And if it’s not […], we have to figure out ‘Does it still make sense to show this in the search results?’
If you’re doing something like showing a shorter version of a breadcrumb on a page, and we can’t match that, it might be a bit hit and miss, if we actually pick up that breadcrumb markup and use that.
If you’re taking individual crumbs or […] the individual items in the breadcrumb list, and you’re just showing some of those but not all of them, it might be that we just pick up those. It might be that we still pick up the rest because we see […] a lot of the breadcrumb matches.
It’s not guaranteed that we will be able to pick up and use the full breadcrumb markup that you have if you’re not showing that on the page, and that’s similar to other kinds of structured data.
I think the main exception […] is […] the FAQ markup, where you have questions and answers, where […] the important part is that the question is actually visible, and the answer can be something like a collapsed section on a page, but […] at least has to be visible.”
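For reference, the breadcrumb markup under discussion is a schema.org `BreadcrumbList`. The following is a minimal sketch that builds one from a list of (name, URL) pairs; the trail and the function name are hypothetical.

```python
import json


def breadcrumb_jsonld(trail: list[tuple[str, str]]) -> str:
    """Build BreadcrumbList JSON-LD from an ordered list of (name, url) crumbs."""
    items = [
        {"@type": "ListItem", "position": position, "name": name, "item": url}
        for position, (name, url) in enumerate(trail, start=1)
    ]
    data = {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": items,
    }
    return json.dumps(data, indent=2)


# Hypothetical full breadcrumb path for a product category page.
print(breadcrumb_jsonld([
    ("Home", "https://www.example.com/"),
    ("Shoes", "https://www.example.com/shoes/"),
    ("Running", "https://www.example.com/shoes/running/"),
]))
```

Following John's point, the safest input for `trail` is the same list of crumbs that is actually rendered on the page, since markup that is not visible may not be picked up.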
Translating only some pages on a website
44:00 “We run a site with under 300 index pages all in English. We’re looking to translate about half of these pages in Spanish which will be placed in the subdirectory on the same domain, like /ES, and tagged as alternate language versions of the English content. Is it OK to translate only some of the page’s content, or should we translate everything to exactly mirror the English website and stand the best chance of ranking in other locations?”
John said: “It’s fine to just translate some pages on a website. We look at the language of pages individually. If you have some pages in Spanish, we just look at those Spanish pages, when someone is searching in Spanish. It’s not the case that we would say: ‘There are a lot more English pages than Spanish pages here. Therefore, the Spanish site is less important.’ […] These are Spanish pages, and they can rank well in Spanish. […] For users, sometimes, it makes sense to have as much content as possible translated. But usually, this is something that you incrementally improve over time, where you start with some pages, you localize them well, and add more pages […].
The hreflang annotations are also on a per-page basis. If you have some pages in English and in Spanish, and you link those, that’s perfectly fine. If you have some pages just in Spanish, that’s fine – you don’t need hreflang. Some pages just in English, that’s also fine. From that point of view, this seems like a reasonable way to start.”
Crawl budget and automatically generated URLs
46:12 “The website I’m talking about is a WordPress website. It automatically generates multiple unwanted URLs. […] is there a way where I can stop the crawler to find out these URLs? I know I can ‘noindex’ it, and those are all no indexed URLs. But then, I can see them on the Search Console under the Excluded part. […] It’s a news website, we have thousands of URLs. […] Is it going to affect the crawling budget?”
John inquired about the size of the website and was told that it was between 5,000 and 10,000 URLs.
Given that, John said: “I would not worry about the crawling budget. […] We can crawl that many pages fairly quickly, usually within days. The other thing […] is the ‘noindex’ is a meta tag on the page. We have to crawl the page to see the meta tag, which means you can’t avoid that we check the ‘noindex’ pages. […] If we see that there’s a ‘noindex’ on the page, then usually over time, we crawl those pages less often. We will still double-check every now and then, but we won’t check as much as a normal page that is otherwise indexed. The other approach is to use robots.txt. With the robots.txt file, you can block the crawling of those pages completely. The disadvantage is that sometimes the URL itself can be indexed in the search results, not the content on the page […].”
John also gave the following example:
“If you […] have a football news website, and you have some articles that are blocked and some articles that are allowed for crawling, then if someone is searching for football news, they will find the indexable versions of your pages, and it won’t matter that there are other pages that are blocked by robots.txt. However, if someone explicitly does a site query for those blocked pages, then you would be able to see those URLs in search […]. In a situation like yours, […] I would not worry about the crawl budget.”
John also added: “From a practical point of view, both the ‘noindex’ and the robots.txt would be kind of equivalent. […] This content would probably not appear in the search results, and we would still need to crawl it if there’s ‘noindex’, but the numbers are so small that they don’t really matter. We might still index it with a URL if they’re blocked by robots.txt […]”.
Concerning the preferred method, John said: “I would choose the one that is easier to implement on your side. If […] you have WordPress and you can just have a checkbox on the post that says ‘This page’s noindex’, maybe that’s the easiest approach […].”
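Because ‘noindex’ lives in the page's HTML, a crawler has to fetch the page before it can see the directive. Here is a minimal sketch of that check using Python's standard `html.parser`; the class name, the function name, and the sample HTML are made up for illustration, and this is not how Googlebot itself is implemented.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        if name in ("robots", "googlebot"):
            content = attr.get("content") or ""
            # Directives are comma-separated, e.g. "noindex, follow".
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def has_noindex(html: str) -> bool:
    """Return True if the already-fetched HTML carries a noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


sample = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(sample))
```

A URL blocked in robots.txt would never reach this step, which is why a ‘noindex’ on a blocked page cannot be seen.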
Crawling URLs with parameters
54:25 “We see in our log files, and also proving that it’s Googlebot via IP, a lot of crawling from the organic bot to UTM parameter URLs, Google Display, and universal app campaigns. […] We don’t see any links coming from anywhere to those URLs. […] Do you have any idea of where or why this might be happening?”
John responded: “The one place where with Googlebot we also crawl pages that you list in ads campaigns […] is for product search. If you have a product search feed or Merchant Center feed […] set up, then we would also crawl those pages for Googlebot to make sure that we can pick them up for the Merchant Center. If you have tagged URLs in there, […] we will keep those tagged URLs and reprocess them.
It might also be that other people are able to submit this kind of products, […] it might not necessarily be you who’s submitting them, but maybe someone who’s working on your behalf or has the permission to do that as well.