00:44 “Is there a different value of internal links in a header, footer, or in content?”
According to John, “It’s pretty similar. I don’t think there’s anything quantifiably different about internal links in different parts of the page. I think it’s different when it comes to the content in different parts of the page where we try to figure out what is unique to a page. But with regards to links, I don’t think that’s anything.”
03:33 “After November Google [Core] update my website [has some issues with] crawling. Some links [have been crawled, and] some are not. […] How can I fix [it]?”
John said, “I think there are two possibilities. One is that maybe there’s a technical issue. I don’t think that’s necessarily the case […] because it sounds like some of the pages are being crawled normally.
The other is that we don’t crawl everything all the time. We don’t index everything on the web, and sometimes we have to prioritize things. […] We try to understand what is the overall value of a website in terms of how much resources we should spend on the website. And that is reflected in how much we crawl as well. That might be something that you’re seeing where our algorithms are not sure of the overall quality of the website. Helping to improve the quality of the website usually ends up making it so that we crawl more of the website as well.”
Drop in the number of indexed pages
05:47 “Over the last year, we’ve been making a lot of technical improvements to the site, and our customers seem to be happy with the site. However, since the end of October, the number of pages indexed by Google has dropped dramatically by 25 percent [which is] about 500,000 pages. The ones that we’ve submitted […] have dropped by over 50 percent. […] The thing that we found is that […] if there are no reviews on the product page, the schema validator is unhappy because there’s no review mentioned. […] Is there something that we’re missing […] or is that actually enough to have been the core cause of it?”
John answered, “Just because structured data is not completely valid on a page wouldn’t mean that we would drop it from indexing, so that seems unrelated to me. I imagine the report in Search Console shows all of these errors. You look at them, and you say, well, I don’t care about the markup there. And that’s fine. It’s not a sign that we think your website is bad because the structured data isn’t valid. It’s just we want to let you know, in case you wanted to use this structured data, it’s not working. But that wouldn’t affect crawling, indexing, or ranking.
It’s hard to say, offhand, what might be causing that. It could be […] that our systems are unsure about the quality, overall, of your website. When it comes to such a large website, where you are looking at the mass of numbers there, one thing I would also do is try to look at some samples and try to see, is the number really reflective of an actual issue? Or is the number of indexed pages essentially reflective of something technical that is being cleaned up?
For example, sometimes, we index pages with different parameters attached to them, like Analytics tracking parameters. It can easily happen that we suddenly index 100,000 of those pages. They’re all indexed. And in the graph, it looks like that’s a big thing. But if we were to drop all of those pages, it wouldn’t change anything for your website because these are accidentally indexed pages. So in the graph, that could look very dramatic, and that it goes up, and all of these things are indexed, and then it goes down. […] But it might just be that our systems are fixing an issue with regards to indexing that doesn’t affect the rest of your website. What I’d try to do is figure out which of these issues are affecting the traffic or the visibility of your website. Then maybe the indexing issue is something that falls into that, but I would try to separate that out.”
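One way to sanity-check a sample of indexed URLs along these lines is to strip tracking parameters and count how many URLs collapse into the same page. The sketch below is only an illustration: the parameter list and the `summarize` helper are assumptions, not something John or Search Console provides.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed set of tracking parameters; adjust it to whatever your analytics setup appends.
TRACKING_PARAMS = {
    "utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content",
    "gclid", "fbclid",
}


def strip_tracking(url: str) -> str:
    """Return the URL without tracking parameters (the fragment is dropped too)."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def summarize(urls):
    """Count how many sampled URLs are only parameter variants of another URL."""
    canonical = Counter(strip_tracking(url) for url in urls)
    return {
        "total": len(urls),
        "unique_after_stripping": len(canonical),
        "parameter_duplicates": sum(count - 1 for count in canonical.values()),
    }
```

If most of the drop in a sample turns out to be parameter duplicates, the graph is probably showing cleanup rather than a loss of real pages.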
09:13 “One thing that we noticed, it was the first time we’ve ever seen Crawled [– currently] not indexed. […] We feel this is telling us something, but we’re not quite sure how to interpret it.”
John: “I don’t think there’s much you can kind of pull out from that. The two statuses, Crawled [– currently] not indexed and Discovered [– currently] not indexed, they’re essentially equivalent in that we know about the URL. We confirmed that we’ve heard about it, but we decided not to index it. That’s something where we’re looking with the indexing teams to figure out, is this a general problem? Because we hear more and more reports about this. Or is it essentially more visible than it used to be? Because even in the past, we would always only index a portion of the website. But we never showed people that in Search Console. We focus on the traffic that you’re getting, not why we’re not indexing individual pages.”
Deindexed pages vs. special characters in URLs
23:56 “We just found that from January 13th, our indexed pages dropped over 90 percent. […] Can you give us some recommendations on which aspects we can figure out to identify the problem? […] When we checked the samples, we noticed that the URLs [that Google crawled] have some unusual marks like question marks [and] some plus marks in the URL, but our actual URLs don’t have these marks. That’s one thing unusual being spotted.”
John’s response was, “I think the one aspect that you probably also want to check is whether or not we can crawl them properly. I imagine you already looked into that, but it’s always good to double-check there.”
When it comes to the special characters in the URLs, John added, “What always happens is we discover a lot of URLs for websites. If we don’t think that they’re important, we will keep them on our list, and at some point, we’ll try to crawl them. I suspect these are just random URLs that we discovered over time. We try to crawl them from time to time to see if there’s anything that we’re missing, but it’s not a sign of a problem of a website if we also crawl some random URLs.”
And referring to the technical aspects that might result in such a situation, John said, “Usually, the main issue is about the overall quality of a website which goes into the decision of whether or not to index individual URLs. That’s something that can also change over time. Not so much that the quality of your website changes, but our perception of the quality of the website can change over time. And that’s usually the main element that comes into play there.
If you see these indexing changes happening over a short period of time, then it could be that our systems have just changed the way that we evaluate quality for your website, and, suddenly, everything is in a slightly different bucket. Whereas if you see them over a longer period of time, then […] over time our systems are less and less confident about the website.”
GSC properties and index pages without a trailing slash
33:18 “We’ve been trying to create GSC properties for some of our country-specific folders to better monitor their performance. We don’t use trailing slashes on our URLs. So when a new folder property is added to GSC, the trailing slash is automatically added to the address, and no data is captured and reported for the non-trailing version of the index page. Is there any way to add a folder as a GSC property and capture the stats for the non-trailing indexed page as well?”
John: “No, currently not. From our point of view, a page without a slash at the end is just a page. If it has a slash, then it’s a folder, that’s the model that we’ve used for Search Console. So if you have the home page of one section of your website and it doesn’t have a trailing slash, then we would see that as a page within the higher-level site. On the domain level probably, you would see all of these. If you want the data visible independently, you have to pull that out from the higher-level property in Search Console.”
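Pulling a section out of the higher-level property can be done on exported data. The following is a minimal sketch that assumes you have already exported page-level rows as dictionaries with `page`, `clicks`, and `impressions` keys; those field names and both helper functions are hypothetical, not part of any Search Console API.

```python
def in_section(page_url: str, section: str) -> bool:
    """True for the section's index page (no trailing slash) and everything beneath it."""
    base = section.rstrip("/")
    return page_url == base or page_url.startswith(base + "/")


def section_totals(rows, section):
    """Sum clicks and impressions for one section of a higher-level property."""
    totals = {"clicks": 0, "impressions": 0}
    for row in rows:
        if in_section(row["page"], section):
            totals["clicks"] += row["clicks"]
            totals["impressions"] += row["impressions"]
    return totals
```

Matching on the exact URL or on the prefix followed by a slash keeps `/de` and `/de/produkte` together while leaving out an unrelated path such as `/denmark`.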
Recovering from website downtime
34:29 “My website, averaging around 200,000 sessions a day, was hit by a technical problem. The site was down for 14-15 hours just two days ago. While yesterday’s traffic was roughly normal, today, a lot of our pages have gone missing from Google searches. The site has been stable for the last 8 years, and we’ve never had a problem like this before. What do you recommend?”
John said, “Usually, if you have this kind of technical issue for a short period of time, it can happen that these pages will drop out of our index, and usually they will pop back in fairly quickly as well. What usually happens is the pages that we crawl more often probably get picked up first and get noticed during this technical issue. Maybe we drop them during that time. So you probably see that reflected in your traffic as well, but the good news is that these pages also tend to be recrawled fairly frequently, so they [also] should pop back in fairly frequently.
The best way to protect against this issue is to make sure that you have some system in place that can serve a 503 result code when things go wrong. It might be that it doesn’t trigger automatically, but even if you can manually turn this 503 result code on, essentially what happens then is, when we crawl the pages during that time and see the 503, then we will say there’s a problem here. We will ignore it and come back later to double-check.
Essentially if you can serve a 503 result code, for a period of a day or two, then we will see that as a temporary glitch, and we will not drop these pages from our index because we think that they still exist. Whereas if you serve a 404, or if you serve an empty page or just an error page, directly, then we might assume that this page has gone, and we’ll drop it from the index.
That would be my recommendation. Oftentimes, you can’t just jump in when things go down and suddenly figure out how to do a 503. So I would prepare that system ahead of time so that you can switch over as quickly as possible. […] If you can serve a 503 for a day or two, then you should not see any changes in your search indexing at all. If it’s longer, then obviously you could still, but at least for those one day or two ‒ you’re protected.
In the case that you can’t do that as you did here, I would assume that this will come back automatically. I don’t think there’s anything manual that you need to do. We will recrawl these pages. We’ll notice there’s good content there again. We’ll index them again, […] pick up the signals that we had before. It should essentially be indexed and ranked similarly to before. There should not be any long-term issue here.”
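A manual 503 switch of the kind John describes can be prepared ahead of time. The sketch below assumes a Python site served through WSGI; the flag-file path, the `maintenance_middleware` name, and the `Retry-After` value are all illustrative assumptions rather than anything Google prescribes.

```python
import os

# Hypothetical flag file: create it to switch the site into maintenance mode.
MAINTENANCE_FLAG = "/tmp/maintenance.on"
# Assumed hint for clients about when to retry, in seconds.
RETRY_AFTER_SECONDS = "3600"


def maintenance_middleware(app, flag_path=MAINTENANCE_FLAG):
    """Wrap a WSGI app so every request gets a 503 while the flag file exists."""

    def wrapped(environ, start_response):
        if os.path.exists(flag_path):
            body = b"Service temporarily unavailable. Please try again later.\n"
            start_response("503 Service Unavailable", [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(body))),
                ("Retry-After", RETRY_AFTER_SECONDS),
            ])
            return [body]
        # Normal operation: hand the request to the real application.
        return app(environ, start_response)

    return wrapped


def demo_app(environ, start_response):
    """Stand-in for the real site, used here only to show the wrapper."""
    body = b"Hello\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Creating the flag file switches every response to a 503, and deleting it restores normal serving, so the switch needs no deploy. The same idea applies at the web server or CDN level if the application itself is what failed.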
38:37 “We want to migrate the content of one website to two separate domains and split it. What should we do in the old domains’ GSC? Which domain should we point to as the recipient? How to notify Google about that?”
John said, “In a case like this, where you’re splitting or merging websites, you can’t use the Change of Address tool in Search Console, because it relies on the fact that the move is a one-to-one move from one domain to another domain. As soon as you’re splitting or merging websites, then that’s not a one-to-one move anymore, that’s essentially something that has to be processed on a per URL basis. So for these things, essentially, what you want to do is just set up redirects properly. Follow the normal guidelines that we have for site moves and keep in mind that the Search Console setting for Change of Address is probably not suitable there.
Also, the Search Console setting will try to test some sample pages on your site for that redirect. It might be that it looks like everything is okay, but I think it would still be wrong to use that setting if you’re splitting a website up. Just because it could potentially mess up signals a little bit, I doubt that it would cause problems, but I don’t think you would have any advantage of using that Change of Address tool if you’re not moving from one domain to another.”
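Because a split has to be handled per URL, it can help to write the redirect rules down as data and check them against the old site's URL list before launch. This is a rough sketch under assumed rules: the path prefixes, the two target origins, and both function names are hypothetical.

```python
# Hypothetical split: each section of the old site moves to a different new domain.
SPLIT_RULES = [
    ("/shop/", "https://shop.example"),
    ("/blog/", "https://blog.example"),
]


def redirect_target(path_and_query: str):
    """Return the new URL for an old path, or None if no rule covers it."""
    for prefix, new_origin in SPLIT_RULES:
        if path_and_query.startswith(prefix):
            # The path and query string are kept unchanged on the new domain.
            return new_origin + path_and_query
    return None


def unmapped_paths(paths):
    """List old paths that would be left without a redirect."""
    return [path for path in paths if redirect_target(path) is None]
```

Running `unmapped_paths` over the old sitemap before the move shows which pages still need a decision about where they should redirect.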
51:16 “Does it make sense to look at the internal links from important pages of a website to see if they have links from other internal important pages and […] to [remove] links to less important pages so that the links to the important pages have more weight?”
John answered, “It’s something you can do. It’s a bit tricky because we try to be smart with how we process internal links. Especially some very common pages that get a lot of links, like an About Us page or Terms of Service, are linked from across the whole website. But at the same time, we understand this is a pattern that is normal, and it doesn’t mean that we should rank the Terms of Service page for anyone who’s searching for the company name. It’s something where, on the one hand, the internal linking is something that you can control. But I wouldn’t go overboard and say, well, I remove links to pages that I don’t think are critical. Because that’s especially something that happened when we introduced the nofollow that people would say, oh, my Terms of Service [page] ‒ all links will be nofollow to it. That doesn’t change anything. It’s a lot of work, and you have to maintain it forever, but it doesn’t change anything for your website, so it’s like wasted work.
But I would still recommend going through your website and trying to create a graph of how things are linked. I think some or probably most SEO tools have some capability for doing that for crawling the website and creating this graph […] to show the structure of the website. And when you look at that, sometimes you can tell at first glance is there a clean structure or is it completely messy? If it’s completely messy, then I think there is room to clean that up and to make it clear what the structure should be.
By making a clearer structure, you are helping us to understand which pages you think are more important, so that’s something I would try to find ways to clean up. It’s not that I’m saying your website will rank better if you have a clean structure, but it’s more if we understand your website should rank in this range [and] which of these pages are the most important ones. That’s something that you’re telling us there […] and that gives you value and that you’re sending people to the pages that you care about. That’s certainly something I would look into doing.”
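If a crawler export of internal links is available, the structure can also be inspected with a few lines of code. The sketch below assumes the export is a list of `(source, target)` URL pairs; the function names are illustrative, and the numbers describe only your own link graph, not how Google evaluates it.

```python
from collections import deque


def build_graph(edges):
    """Build {page: set of pages it links to} from (source, target) pairs."""
    graph = {}
    for source, target in edges:
        graph.setdefault(source, set())
        graph.setdefault(target, set())
        if source != target:  # ignore self-links
            graph[source].add(target)
    return graph


def inbound_counts(graph):
    """Number of distinct internal pages linking to each page."""
    counts = {page: 0 for page in graph}
    for targets in graph.values():
        for target in targets:
            counts[target] += 1
    return counts


def click_depths(graph, start):
    """Breadth-first search: minimum number of clicks from the start page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in sorted(graph[page]):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths
```

Pages you consider important that sit many clicks deep, or that have very few inbound links, are candidates for the kind of cleanup described above. Pages missing from the `click_depths` result are not reachable from the start page at all.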
55:03 “What about the internal PageRank, which is quite easy to calculate? Would you recommend doing that to see which pages have the most weight from internal links, or would you say this is something that’s not necessary?”
John replied, “[…] The aspect that you can’t model in there is that individual pages will get different external links, and that essentially affects the internal PageRank as well. If everyone is linking to your Terms of Service page, then it’s suddenly something that has a lot of PageRank. And PageRank is something that we use in our systems, but we use lots of other things. It’s an interesting gadget from a technical point of view, but I wouldn’t see it as something that is supercritical from a practical point of view. It’s more you’d like to mess with numbers and play with graphs ‒ sure you can calculate this. I wouldn’t see it as something that is reflected one-to-one at Google.”
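For those who do want to play with the numbers, an internal PageRank can be approximated with plain power iteration. This is a textbook-style sketch, not Google's implementation: the damping factor of 0.85 and the iteration count are conventional assumptions, and, as John points out, it cannot account for external links.

```python
def internal_pagerank(graph, damping=0.85, iterations=50):
    """Approximate PageRank over {page: set of linked pages} by power iteration.

    Every linked page must also appear as a key in `graph`.
    """
    pages = list(graph)
    if not pages:
        return {}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread evenly over all pages.
        dangling = sum(rank[page] for page in pages if not graph[page])
        new_rank = {page: (1 - damping) / n + damping * dangling / n for page in pages}
        for page in pages:
            for target in graph[page]:
                new_rank[target] += damping * rank[page] / len(graph[page])
        rank = new_rank
    return rank
```

On a small graph where the home page and a category page both link to a Terms of Service page, the Terms page ends up with a high score, which illustrates why the raw number should not be read as a measure of importance.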