Website downtime vs. ranking
04:17 “One of my client’s websites will be down for a week or two. […] How can I tell Google that this is a temporary situation? […] Can I tell Google that this website is currently down, but it will be just going back to life again within two weeks or a week? But there shouldn’t be any ranking loss, or there could be a minimum ranking loss that I could get?”
John replied, “I don’t think you’ll be able to do it for that time, regardless of whatever you set up. So for an outage of maybe a day or so, using a 503 result code is a great way to tell us that we should check back. But after a couple of days, we think this is a permanent result code, and we think your pages are just gone, and we will drop them from the index. And when the pages come back, we will crawl them again, and we will try to index them again. But […] during that time, we will probably drop a lot of the pages from the website from our index. And there’s a pretty good chance that it’ll come back in a similar way, but it’s not always guaranteed.
So any time you have a longer outage, I’m thinking more than a couple of days, I would assume that at least temporarily, you will have really strong fluctuations, and it’s going to take a little bit of time to get back in. It’s not impossible because these things happen sometimes. But if there’s anything that you can do to avoid this kind of outage, I will try to do that. And that could be something like setting up a static version of the website somewhere and just showing that to users for the time being. But especially if you’re doing this in a planned way, I would try to find ways to reduce the outage to less than a day if at all possible.”
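To make John’s advice concrete, here is a minimal sketch of a maintenance-mode response in a WSGI-style Python app. The `MAINTENANCE` flag and the one-hour `Retry-After` value are illustrative choices, not anything John specified; the key points from his answer are the 503 status and keeping the outage under a day.

```python
# Minimal maintenance-mode sketch (WSGI-style). MAINTENANCE and the
# Retry-After value are hypothetical; adapt them to your deployment.
MAINTENANCE = True

def app(environ, start_response):
    """Serve a 503 with Retry-After while the site is down for maintenance."""
    if MAINTENANCE:
        start_response(
            "503 Service Unavailable",
            [
                ("Content-Type", "text/html; charset=utf-8"),
                # Ask crawlers to check back in an hour; per John's answer,
                # a 503 only signals "temporary" for roughly a day.
                ("Retry-After", "3600"),
            ],
        )
        return [b"<h1>Down for maintenance - back soon</h1>"]
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<h1>Welcome back</h1>"]
```

Serving a static copy of the site, as John suggests, avoids the 503 entirely and is the safer option for any planned multi-day migration.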
304 response code vs. crawling
11:48 “Do you think the 304 response code affects crawling? Because logically, if Googlebot checks a URL with the same content and it returns a 304 code the first time, maybe there’s a possibility that Googlebot may reduce the crawling for the same URL because it returns the 304 code.”
John said, “I think there are two things. So the 304 is, I think, in response to the “If-Modified-Since” requests where Googlebot tries to see if this page has changed. And my understanding is that a 304 response code would not apply to the crawl budget side of things. So that means for us, we can reuse that request and crawl something else on a website. So there’s that aspect.
And the other aspect with regards to crawling that specific URL less, I don’t think that would be the case. But we do try to figure out how often pages change, and we try to recrawl pages based on the assumed page frequency or update frequency that we have. So it’s not so much that particular URL would get crawled less frequently. It’s more that we understand a bit better how often these pages change. And then based on that, we can update or refresh crawling a little bit.”
13:29 “So if most of the pages on the site return a 304, maybe it’s a signal for Googlebot that the site has no new updated content, [and to] reduce the crawling rate?”
John: “No, I don’t think so. I don’t think we would reduce the crawling rate. We would try to focus more on maybe the parts where we do see updates happening. So I would not artificially hide the 304s in the hope that it improves the crawling.”
Crawl requests from mobile vs. desktop
14:06 “Since our crawling rate is back to normal, we noticed that our crawl requests from smartphones are recovering much faster than desktop. Could you shed some light on it?”
John replied, “I don’t know. It sounds like what would be expected with mobile-first indexing, that we crawl a bit more with mobile. I don’t know if your specific site has already moved to mobile-first indexing, but then that would be normal, that we crawl more with mobile, and then you would see any changes faster there.”
Discovered/Crawled ‒ currently not indexed report
26:12 “We’re getting Discovered ‒ currently not indexed rather than Crawled ‒ currently not indexed for 99% [of pages]. Should we differentiate between those two? Because our site is not that big and this can’t be a crawl budget issue in my view. In that case, are those two designations pretty much the same in that it’s just a quality issue?”
John answered, “I don’t know your website, so it’s hard for me to say offhand. But if it’s something where you’re seeing the clean URLs being listed in the Discovered, [‒ currently] not indexed report, essentially the URLs that you do want to have indexed, then that sounds like it’s less a matter of Google can’t go off and crawl that many URLs. Because, again, with 25,000 pages, most servers that are reasonably sized can easily allow that crawling on a regular basis. And it’s probably really more a matter of our understanding of the overall website quality.
And with larger websites or if in the Discovered, [‒ currently] not indexed report, you see that there are lots of different variations of URLs, like with parameters or with upper or lowercase, […] that can be a sign that the internal linking is messy and that we’re having trouble finding the right URLs to crawl. But if we’re showing the right URLs in the Discovered, [‒ currently] not indexed report and it’s a reasonably small website, then to me that more points in the direction of the overall site quality.”
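The URL variations John mentions ‒ parameters, upper and lowercase, and similar duplicates ‒ are usually easiest to fix at the source, by linking to one consistent URL per page. As a rough illustration (the tracking-parameter list below is hypothetical and site-specific), a normalization step like this can collapse those variations:

```python
# Sketch of collapsing common URL variations so internal links point at
# one consistent URL per page. TRACKING_PARAMS is a hypothetical list;
# adjust it to your own site's URL patterns.
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid"}

def canonicalize(url):
    """Lowercase the scheme and host, drop tracking parameters and fragments.

    The path's case is left alone, since paths can be case-sensitive.
    """
    parts = urlsplit(url)
    query = urlencode(
        [(k, v) for k, v in parse_qsl(parts.query) if k not in TRACKING_PARAMS]
    )
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path, query, "")
    )
```

Running internal links through a step like this won’t fix a site-quality problem, but it removes the “messy internal linking” signal John describes, so the report reflects the real issue.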
Resolve your doubts and contact Onely for thorough internal linking optimization.
27:54 “So do you think we should try to add text in there? What we show is a directory of companies, and we show what a stock price means in terms of the future growth of that company. So it’s a number, but there’s not a whole lot of readable text that goes with it. […] We do have a description, but it’s common to all those companies, and we’ve got to figure out what to do if we were to squeeze in unique text per each company. But should we head in that direction, do you think?”
John: “I don’t think the text will affect how we index the pages. So from that point of view, it’s something where if you see the text affecting how users look at your pages and are able to interact with your pages, then sure. But that’s more a matter of trying to figure out what users are looking for and where you can provide unique value to your users. But just adding text to pages ‒ I don’t think [that] would affect how we crawl and index those pages.
If it’s something where you’re providing numbers, like the stock numbers there, that’s something where I would also try to figure out what you can do to make sure that what you’re providing is unique and provides value to users. Do something maybe along the lines of a user study to figure out, what is it that we can do to make our website such that users recommend it to other people as well? And that it builds up almost like, I don’t know, trust or something from the user point of view. And a lot of times, those are not purely technical things that you change on a website, where you change a design, or you convert some of the numbers into text, for example. It’s a matter of the overall setup of the website.”
Indexing of m-dot websites
30:20 “Does Google have any troubles with indexing sites that have mobile versions on a subdomain? For example, example.com and m.example.com?”
John said, “From our point of view, at least as far as I know, we don’t have any problems with m-dot domains in general, in the sense that this is one of the supported formats that we have for mobile websites. We don’t recommend the m-dot setup. So if you’re setting up a new website, I would try to avoid that as much as possible and instead use a responsive setup, but it is something that can work.
So if you’re seeing this regularly with your website that we’re not able to index your mobile content properly, then to me, that would point more at an issue on your website [where], when mobile Googlebot is trying to crawl, it’s not able to access everything as expected. So that’s the direction I would head there to try to clean that up.
The one thing that throws people off sometimes with m-dot domains is with mobile-first indexing, we switch to the m-dot version as the canonical URL, and it can happen that we show the m-dot version in the desktop search results as well. So you also need to watch out for not only redirecting mobile users from the desktop to the mobile version but also redirecting desktop users from the mobile to the desktop version.
And again, […] if you have a responsive design setup, you don’t have to worry about that. So it’s another reason to go responsive if possible.”
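The bidirectional redirects John describes ‒ mobile users off the desktop host, desktop users off the m-dot host ‒ can be sketched as a small routing rule. The hostnames and the user-agent regex below are illustrative only; production sites typically use a proper device-detection library and also keep the `rel="alternate"`/`rel="canonical"` annotations between the two versions in place.

```python
# Sketch of the two-way desktop/m-dot redirect John describes.
# Hostnames and the UA pattern are hypothetical simplifications.
import re

DESKTOP_HOST = "example.com"
MOBILE_HOST = "m.example.com"

# Very rough mobile sniff for the sketch only.
MOBILE_UA = re.compile(r"Mobile|Android|iPhone", re.IGNORECASE)

def redirect_target(host, path, user_agent):
    """Return a redirect URL if host and device type disagree, else None."""
    is_mobile = bool(MOBILE_UA.search(user_agent or ""))
    if host == DESKTOP_HOST and is_mobile:
        return f"https://{MOBILE_HOST}{path}"
    if host == MOBILE_HOST and not is_mobile:
        return f"https://{DESKTOP_HOST}{path}"
    return None  # host matches device type: serve the page as-is
```

With a responsive setup, none of this routing exists in the first place ‒ which is exactly John’s point.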
Dealing with outdated blogs
41:03 “We have about 450 blogs, some of which are four to five years old and therefore out of date and have almost no traffic on them. Do you recommend deleting them because they hurt our general search rankings? What’s the best way: delete all without traffic at once and request index deletion at Google, or do you recommend [a] step-by-step approach?”
John replied, “I think with blogs, you probably mean blog posts, so individual pages, not whole sets of pages. Because I think if you have so many different sets of pages, it’s probably a bigger change. But with 450 pages, […] where you’re saying, well, these don’t get a lot of traffic, should I delete them or not? From my point of view, probably that’s something where you can make that call on your own. I don’t see that as being something where from an SEO point of view, you would see a significant change unless these are terrible blog posts.
The main thing, however, I would watch out for is that just because something doesn’t have a lot of traffic doesn’t mean that it’s a bad piece of content. It can mean it’s something that gets traffic very rarely, maybe once a year, maybe it’s very seasonal. Overall when you look at it from a website point of view, it’s not very relevant, but it’s relevant maybe right before Christmas, for example. So from that point of view, I would say it’s fine to go through a website and figure out which parts you want to keep and which parts you want to clean out. But just purely looking at traffic for figuring out which parts you want to clean out, I think that’s too simplified.
But again, from an SEO point of view removing 450 pages from a larger website, that’s a tiny change, and I wouldn’t worry about when you do that and how exactly you do that. Delete them whenever you recognize that they’re no longer valuable. Delete them all at once, that’s also an option.
With regards to submitting them with the removal tool as well in Search Console, that probably wouldn’t change anything because the removal tool in Search Console hides the page in the search results; it doesn’t remove anything from indexing. So that’s one thing you don’t have to do. But again, otherwise, I would think about which pages you want to keep, which ones you want to remove, and go through it like that.”
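Since the removal tool only hides pages, the deletion itself has to happen on the server. A common pattern ‒ sketched below with hypothetical slugs and a hypothetical `status_for` helper ‒ is to serve a deliberate 410 for posts you removed on purpose, so crawlers treat the removal as intentional rather than an accident:

```python
# Sketch of status handling for deleted blog posts.
# REMOVED_SLUGS and status_for() are illustrative, not a real API.
REMOVED_SLUGS = {"old-post-2019", "outdated-guide"}

def status_for(slug, known_slugs):
    """Pick an HTTP status for a blog-post request.

    410 Gone for posts deleted on purpose, 404 for unknown URLs,
    200 for posts that still exist.
    """
    if slug in REMOVED_SLUGS:
        return 410  # Gone: signals the removal is deliberate
    if slug not in known_slugs:
        return 404
    return 200
```

Either 404 or 410 will eventually get the pages dropped from the index; the explicit 410 just makes the intent unambiguous.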
Are your pages “Blocked by page removal tool” in Google Search Console?
Read my article to learn how the URL removal tool works in the wild and how to fix this issue.
Indexing issues of new articles
54:55 “I have a small website, and it’s a couple of hundred URLs only. […] And it has been going well for a long time. And suddenly, in November, the published pieces are not indexed anymore, not all of them. […] We’re sitting there and seeing that Google is crawling them or discovering them […]‒ but [they are] not indexed. So I tried everything: I looked at the technical issues, the linking is good. So my question is, is there a paradigm shift that Google is saying, well, thanks for publishing these articles, but we don’t want it right now? Is this something new that has changed recently?”
According to John, “Not really, at least not that I know of. I mean, I think what I see a lot with regards to the indexing questions that I get nowadays is from a technical point of view, it’s very easy for websites to make websites that just work. You set up WordPress, and then essentially all of the SEO is done for you. And from our point of view, this means that it’s less often a case that there is a technical problem with a page that doesn’t get indexed. That means all of the content that we get is essentially technically OK, and our systems have to be a lot more critical with regards to the overall quality of the website, the quality of the pieces of content that we get.
And then it’s something where additionally in Search Console, we give you all of the information on things like Discovered, [‒ currently] not indexed or Crawled, [‒ currently] not indexed. And then suddenly you see all of these problems, and it seems like something that people have to fix. From that point of view, it feels like it’s normal for us to get a lot more of these indexing questions just because, well, a lot of content is OK, and we still can’t index everything on the web, so we have to make a cut somewhere.”