The size of the robots.txt file
00:45 “Are there any negative SEO effects that can result from a huge robots.txt?”
John answered that there are “no direct negative SEO issues with that. But it makes it a lot harder to maintain. And it makes it a lot easier to accidentally push something that does cause issues. Just because it’s a large file doesn’t mean it’s a problem, but it makes it easier for you to create problems.” […]
04:35 “Apart from radically shortening [the robots.txt file], is there any guideline for building [it]?”
John: “No, it’s essentially up to you. Some sites have big files. Some sites have small files. They should all just work. We have [open-sourced] the robots.txt parser that we use. So what you can also do is get your developers to run that parser for you or set it up so that you can test it. And then, check the URLs on your website with that parser to see which URLs would get blocked and what that would change. And that way, you can test things before you make them live.”
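Google's actual parser is an open-source C++ library (google/robotstxt), but the pre-launch check John describes can be sketched with Python's standard-library robots.txt parser. Note that the stdlib parser can differ from Google's in edge cases (for example, rule-precedence details), and the rules and URLs below are made up for illustration:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules to test against.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /checkout/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Check which URLs would be blocked for Googlebot before going live.
urls = [
    "https://example.com/products/widget",
    "https://example.com/search?q=widget",
    "https://example.com/checkout/cart",
]
for url in urls:
    allowed = parser.can_fetch("Googlebot", url)
    print(url, "->", "allowed" if allowed else "blocked")
```

Running a list of representative URLs through a check like this on every robots.txt change is an easy way to catch an accidental blanket `Disallow` before it is pushed live.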
You can also find more information on the robots.txt file in our Ultimate Guide to Robots.txt for SEO.
Migrating a product category to a new domain
08:56 “We are planning to move one product category of our multi-vendor [site] to a new domain or subdomain. […] How [do we] rank the new domain? The current domain ranks well on Google and other search engines and provides us [with] good, organic search traffic. How much time will it take [for] the new domain [to] receive the amount of traffic we are receiving now?”
John replied, “I don’t think there is any fixed time for that change because it sounds [like] you’re not moving from one domain to another. You’re moving from one infrastructure to a different infrastructure. And often, that means the content will be different. The structure of the pages, maybe even the URLs, will be different. All of that can change. And all of these things are elements that take time to be processed. It depends on the website with regards to how long that would take. And it’s not that you can have a specific timeline for that.
I think the other part to also keep in mind is these changes can have overall positive or negative effects for a website. So it’s possible to take this kind of migration and say, we will also work on SEO and improve the interlinking of our pages, the URL structure, and the HTML format of our pages. All of these can have very positive effects on your website.
But at the same time, if you don’t watch out for these things and suddenly you have a big mess of URLs, and the HTML is not easily understandable by search engines, then that can have a negative effect. So it’s something where you shouldn’t assume that if you’re migrating an eCommerce shop from one platform to another, it will be the same on the other platform after a certain period of time. It can be similar [or] much better [but] it can also be much worse. So you need to watch out for all of those details and think about what is the final structure that you want, and what SEO elements do you want to include with that migration.”
11:45 “What negative aspects will we face when we’re moving that infrastructure to a new domain?”
According to John, “[…] Usually, what happens in a situation when everything is very well lined up, you will see some fluctuations over the time [from] when we learn about the new website to when we shift everything over. And that’s something where I would assume you will see [somewhat] less visibility in Search. But it depends on all of the changes that you make along there, where it can take a lot longer. It can also be something where the final result is much worse or even much better than it was before.”
Redirecting users from web pages to an app
20:38 “Do you know what the risks of redirecting users from a site page to [a] tool like an app are? Does it have a negative impact on traffic from an SEO perspective? […] Our app has a higher conversion rate [than our mobile version], so we are thinking that maybe we can redirect some users when they get on some product pages or category pages […] to the app or app store. Maybe [it can] contribute to a higher conversion?”
John’s response was: “I think overall you can do that. The aspect I would generally watch out for there is that you do it in a way that lets users go to the app if they want to. I don’t know all of the details at the moment with regards to the connection between the apps and the web pages, but I believe there is a way to do a smart banner, where if you can recognize that the user has the app installed, it’s very easy for them to move to the app experience from there. But I don’t know the specific details for Android and iPhone. […]
In general, from a Search point of view, if we can index the individual mobile pages, the desktop pages, as well, or whatever you have available there, that’s perfectly fine. And if people from your pages end up going to the app, that’s also perfectly fine from our point of view.”
22:57 “You are talking about a top banner on the site page. Maybe if we’re [forcing] them to redirect, will that be bad for SEO or the site?”
John stated, “I think probably that would be okay. There are two things I have in the back of my head, which might be something to watch out for.
Since Googlebot also uses an Android user agent, you need to make sure that you don’t redirect Googlebot to the app store or the app because we won’t install the app. So that’s one thing. The other thing is with regards to, specifically, the metrics around Core Web Vitals. If you always redirect mobile users directly through the app, then you won’t have a lot of data for the Core Web Vitals. And depending on your site, […] it’s also something to keep in mind there. But I think, […] there is nothing negative from an SEO point of view if you redirect users to an app. From a usability point of view, making it optional is a lot nicer. But ultimately, that’s between you and your users.”
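The Googlebot check John mentions could look something like this sketch. The app-store URL, function name, and the simple substring matching are assumptions for illustration, not a production-grade device detector:

```python
# Hypothetical app-store URL; replace with your app's real listing.
APP_STORE_URL = "https://play.google.com/store/apps/details?id=com.example.app"

def redirect_target(user_agent: str, page_url: str) -> str:
    """Return the URL a visitor should get: the page itself for
    crawlers and desktop users, the app store for mobile users."""
    ua = user_agent.lower()
    # Smartphone Googlebot crawls with an Android user agent, so check
    # for "googlebot" first: the crawler must always see the page.
    if "googlebot" in ua:
        return page_url
    if "android" in ua or "iphone" in ua:
        return APP_STORE_URL  # or, more user-friendly: show a banner
    return page_url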
Can Google assess the similarity of pages?
26:28 “How does Google measure the similarity of pages?”
John said, “I think we don’t. I think we use the hreflang to understand which of these URLs are equivalent from your point of view. And we will swap those out. […]
We would only do that for things like the rel=”canonical” to understand what the canonical URL is. But for hreflang, I think it’s impossible for us to understand that this specific content is equivalent for another country or another language. There are so many local differences that are always possible.”
27:22 “We are a big eCommerce site, and there are millions of backlinks. We have a standard procedure to check some spam backlinks every month or several months. We just noticed that the upper limit of the Google Disavow list is only 2 MB. I wonder if our file has exceeded the limit, then how to deal with those spam backlinks. […] Currently, most spam links we found [are] targeted to our site into our search pages, which is super weird for me.”
John replied, “Usually, I would recommend, on the one hand, trying to use the domain directive as much as possible ‒ that saves you multiple entries from the same site and also not to focus too much on trying to clean up all the links because that’s always impossible. I would focus on using the Disavow for links, where you look at them, you think, if someone from the website team were to look at this, they would be 100% certain that you bought them or that there was some exchange happening here. But for all of these kinds of random links that a website gets, and even from spammy or copy pages or random forum posts, those are not things that you need to put in the Disavow file. […]
I don’t know if this is the case in your situation, but I’ve seen that before that [these links] target search results pages with a specific query that includes things like a phone number or a URL in the hopes that that phone number shows up in the search results. And if you noindex your search results pages or search results pages that have maybe a longer query in them, then they automatically don’t get indexed.”
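The domain directive tip from the first part of the answer can be sketched as follows: collapse individual spam link URLs into one `domain:` entry per host, which deduplicates the list and helps keep the disavow file under the 2 MB limit. The spam URLs below are made up:

```python
from urllib.parse import urlparse

# Hypothetical spam link URLs collected during a backlink review.
spam_links = [
    "http://spam-example-one.com/forum/post-123",
    "http://spam-example-one.com/forum/post-456",
    "https://spam-example-two.net/page?ref=example",
]

# One domain: directive per host replaces many per-URL entries.
domains = sorted({urlparse(link).hostname for link in spam_links})
disavow_lines = [f"domain:{d}" for d in domains]
print("\n".join(disavow_lines))
```

Here three URL entries shrink to two `domain:` lines; on a file with millions of backlinks from a smaller set of hosts, the savings are much larger.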
Traffic drops after removing AMP pages
30:43 “Should we expect a drop in traffic if we remove AMP?”
John: “I assume this is a setup, where you have traditional HTML pages and AMP pages, and you link between them. I think there are three things that come together when you remove AMP pages like this.
On the one hand, there are some search features that are limited to AMP-only pages. […] I would have to double-check, but I don’t think there are any search features at the moment that are only available to AMP pages. So from that point of view, you wouldn’t be losing anything there.
The other thing is that AMP pages tend to be very fast, or it’s easier to make very fast AMP pages. And since we do use Speed and Page Experience as a ranking factor, it is something where if you have a lot of very fast pages in AMP and you switch over to slower pages that are non-AMP, then you might see an effect there. You can, of course, make very fast pages that are not AMP, as well. It’s not limited to AMP. So that’s something where I would double-check to see how things with regards to Speed apply there.
And I think the third one is […] this assumption that AMP pages somehow rank better. And that’s not the case. AMP is not a ranking factor. So it shouldn’t be something where you would see a change in ranking just because you have AMP pages or don’t have AMP pages. […]
If you can make sure that your normal pages are fast and equivalent and you have all of the structured data that you need in those normal pages, then probably you can turn off AMP. And it’ll be essentially very similar. What you’ll probably see is a transitional period of some AMP pages still being in the AMP cache and taking a while to bubble out. But in general, it’s possible to turn these off. We have a Help Center article about turning off AMP pages, so I would double-check that as well.”
Knowledge Panels on mobile vs. desktop
35:17 “In recent months, I noticed Google [serving] Knowledge Panels for certain name searches very consistently [on] mobile and not at all on desktop for the same query. […] Is it possible to understand why a Knowledge Panel is considered appropriate to serve mobile users but not desktop users in this situation? And is Wikipedia the critical factor for Google when deciding whether or not to show a Knowledge Panel?”
John said, “I’m not aware of specific things that we do differently on mobile and desktop with regards to a Knowledge Panel. But it is very common across the different search features that, depending on the device type [and] the real estate that we have available, we’ll turn some features on and some features off to try to make sure that we’re showing something that is useful to the user based on the query that they were using. From that point of view, I wouldn’t be surprised if you see different Knowledge Panels on desktop and mobile. But I also don’t think there [is] any particular factor that we’d say, this is why you’re seeing this Knowledge Panel at this time and not at another time.
Sometimes, with regards to [these kinds] of queries, where you are seeing this change, it might be that it’s just on the border of, let’s show a Knowledge Panel or not. And then maybe the device type [tips] it over to a yes or a no in the end. But that’s something where I don’t think there’s one specific factor that is involved with showing these or not showing these. We do use a variety of different sources for Knowledge Panels. And some of that you’ll see in the Knowledge Panel directly. So that’s one thing you can follow up on a little bit.
Another tip that I would give with regards to these things is there are some people externally from Google who spent a lot of time looking into the Knowledge Panels and how things are shown when Google picks things up. […] Jason Barnard is one of the people I know who does this well. He’s posting on Twitter all the time around Knowledge Panels. And maybe that gives you some ideas, as well, of what you could be looking at there.”
The number of questions to include in an FAQ list
40:41 “I have 15 to 20 FAQs on my web page. Should I include all the questions in the FAQ schema or just the questions that I consider important?”
According to John, “When it comes to structured data, we want to see the structured data visible on the page, but not all visible content has to be marked up with structured data. If you have individual pieces of content on your page that you want to give structured data for, then go ahead and do that. You don’t have to do that for every piece of content on your page. So if you have 20 FAQs and you mark up five of them, that’s totally up to you. You can even use the data-nosnippet to completely block some of these other items from appearing in a snippet if that’s something that you’d like to do.”
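As a sketch of marking up only a subset of the FAQs, the FAQPage JSON-LD for the chosen questions might be generated like this. The questions and answers are placeholders; the emitted JSON would go inside a `<script type="application/ld+json">` tag, and the marked-up Q&As must remain visible on the page:

```python
import json

# Placeholder (question, answer) pairs standing in for a page's FAQs.
faqs = [
    ("What is your return policy?", "You can return items within 30 days."),
    ("Do you ship internationally?", "Yes, to most countries."),
    ("How long does delivery take?", "Usually 3 to 5 business days."),
]

def faq_jsonld(pairs):
    """Build FAQPage JSON-LD for the given (question, answer) pairs."""
    return json.dumps({
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": q,
                "acceptedAnswer": {"@type": "Answer", "text": a},
            }
            for q, a in pairs
        ],
    }, indent=2)

# Mark up only the first two questions; the rest stay as plain HTML.
print(faq_jsonld(faqs[:2]))
```

Passing a slice of the full list mirrors John's point: which questions get structured data is entirely up to you.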
Index Coverage issues
52:00 “One [issue] is Crawled ‒ currently not indexed and [the other] is Discovered ‒ currently not indexed. And in both cases, the pages are not indexed. […] I know that Google does not index the whole content. […] What should I do to [get] these pages [indexed] faster, like linking from the home page or linking from the pages which are already ranking for some of the queries of my particular website? […] Could it be bringing more backlinks?”
John said, “I think all of those things kind of help. And it sounds [like] you’re on the right track, and you know a little bit what to expect.
From our point of view, it’s the case that we don’t index content on all websites, and that’s expected from our side. So if you’re seeing a big part of your content already being indexed, I think you’re on the right approach. But it doesn’t mean that everything is perfect. And things like internal linking, making sure that the overall quality of the website is really good – those help a lot.
Sometimes, it might also make sense to look at the website overall and say, well, I have submitted 500 pages in my [sitemap] file, and 200 of them are being indexed. What is the value of those 300 pages that are not getting indexed? And is there something that maybe I can do [to] go from having 500 random pages on a website [down to] 300 really good pages, to concentrate the value into fewer pages? So that as those fewer pages get indexed, you get a lot of the value of those pages back, which could be that they rank for different keywords, or they work for the users that you care about most, as a way of prioritizing on your side before you hand everything over to Google.
So that would be my approach there – on the one hand, making sure that you have everything lined up properly with internal linking and the overall website quality. And on the other hand, if you’re seeing that lots of your pages are not being indexed, trying to find a way to make it clear to Google which pages they should be prioritizing, which could be removing some pages that you don’t care about or that are not critical for your site.”
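The prioritization step John describes can be sketched as a simple set difference: compare the URLs submitted in a sitemap against the URLs reported as indexed (for example, from a Search Console export) to get the list of pages worth improving or pruning. Both URL lists below are made up:

```python
# Hypothetical URLs submitted in a sitemap.
sitemap_urls = {
    "https://example.com/",
    "https://example.com/products/a",
    "https://example.com/products/b",
    "https://example.com/tag/misc-1",
    "https://example.com/tag/misc-2",
}

# Hypothetical URLs reported as indexed (e.g. a Search Console export).
indexed_urls = {
    "https://example.com/",
    "https://example.com/products/a",
}

# Pages that were submitted but not indexed: candidates to improve,
# consolidate, or remove to concentrate value into fewer pages.
not_indexed = sorted(sitemap_urls - indexed_urls)
print(f"{len(indexed_urls)}/{len(sitemap_urls)} submitted pages indexed")
for url in not_indexed:
    print("review:", url)
```

Reviewing the resulting list regularly makes it clear whether the non-indexed pages carry real value or are the "random pages" worth cutting.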