A soft 404 is not an official HTTP status code. It’s the result of an algorithm that helps Google detect when a page doesn’t exist (or has little to no content) even though its HTTP status code doesn’t indicate it.
However, the algorithm is not perfect, and Google might make a mistake while classifying a page.
Whatever their cause, soft 404s can hurt your website and reduce your visibility in organic search.
In this article, you’ll learn how soft 404s affect your site, what can cause them, and how to fix them.
What is a soft 404?
Servers communicate with clients (e.g., browsers or search engine bots) via HTTP status codes.
If the request for a page is successful, the server returns a 200 HTTP status code. If the page is missing, the server responds with a 404 (Not Found) status.
When users request a page that doesn’t exist, they see a message in their browser indicating that something went wrong. However, the message the browser displays doesn’t always correspond with the HTTP status code.
That’s where the soft 404 comes into play.
A soft 404 is a label Google gives to a page that appears not to exist but still returns a successful 200 HTTP status code.
If Google decides a page is a soft 404, it crawls that URL less often.
If we see it [a page] as a soft 404, it would be like a 404, and we would slow down crawling of that particular URL because there’s nothing here – why do we have to crawl it every day?
Source: John Mueller
Detecting soft 404s is essential from the search engine’s perspective for two reasons:
- Google has limited resources. The web is practically infinite, and it’s impossible to crawl every page. That’s why Google needs to prioritize the pages worth crawling. Skipping soft 404s lets it focus on more valuable pages and crawl more efficiently.
- Google wants to show quality pages to its users. If a page appears not to exist, nobody searching would want to land on it, so it shouldn’t appear in search results.
How soft 404s affect your website
The consequences for your website may vary depending on what type of pages Google classified as soft 404s.
If Google is correct and the page really doesn’t exist, the main consequence is wasted crawl budget.
Your crawl budget is the number of pages Google can and wants to crawl on your website. If you have, e.g., 100,000 pages and your crawl budget allows Google to crawl 50,000 of them, you need to make sure that budget is spent on valuable pages. If Google wastes it on soft 404s, there might not be enough left for the pages that matter most to you and bring you traffic.
The other side of the coin is when Google makes a mistake while assigning the status and thinks a valuable page is a soft 404. In this case, the page won’t be indexed and won’t bring organic traffic.
How can you detect soft 404s?
You can check which pages Google reports as soft 404s in Google Search Console in the Index Coverage report.
Access the report by clicking the “Coverage” option on the sidebar.
If Google thinks a page is a soft 404, it can assign it one of these two statuses:
- Soft 404 (Excluded category), or
- Submitted URL seems to be a Soft 404 (Error category).
The only difference between these statuses is the way Google discovered the URL.
If the status is “Submitted URL seems to be a Soft 404,” Google found the URL in your sitemap (a file, usually XML, that should list only the pages you want indexed). If the status is just “Soft 404,” Google discovered the URL on its own, e.g., by following links.
Clicking either status shows the individual URLs flagged as soft 404s. You can export this list, but exports are limited to 1,000 URLs. If you need more and your site has more than one sitemap, you can filter the report by sitemap and download the URLs for each one separately.
The Index Coverage report is not the only place where you can see the status of a URL.
The URL Inspection tool in Google Search Console lets you double-check individual URLs. To inspect many URLs, you can use the URL Inspection API, which allows up to 2,000 inspections per property per day.
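If you’re comfortable with a bit of scripting, the API lets you check a list of URLs without clicking through the tool. Below is a minimal Python sketch using only the standard library. It assumes you already have an OAuth 2.0 access token authorized for Search Console and that `site_url` exactly matches a property you’ve verified (e.g., `https://www.example.com/` or `sc-domain:example.com`). The endpoint and the `inspectionUrl`, `siteUrl`, and `pageFetchState` fields are part of the public API; the function names are hypothetical.

```python
import json
import urllib.request

# Public endpoint of the Search Console URL Inspection API.
INSPECT_ENDPOINT = "https://searchconsole.googleapis.com/v1/urlInspection/index:inspect"


def build_inspection_request(page_url, site_url, access_token):
    """Prepare (but do not send) a POST request that inspects one URL."""
    body = json.dumps({"inspectionUrl": page_url, "siteUrl": site_url}).encode("utf-8")
    return urllib.request.Request(
        INSPECT_ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


def is_soft_404(inspection_response):
    """True if a parsed API response reports a SOFT_404 page fetch state."""
    status = inspection_response.get("inspectionResult", {}).get("indexStatusResult", {})
    return status.get("pageFetchState") == "SOFT_404"


def find_soft_404s(page_urls, site_url, access_token):
    """Inspect each URL and return the ones Google reports as soft 404s.

    The API is quota-limited (about 2,000 calls per property per day),
    so pass in only the URLs you actually need to check.
    """
    flagged = []
    for page_url in page_urls:
        request = build_inspection_request(page_url, site_url, access_token)
        with urllib.request.urlopen(request) as response:
            if is_soft_404(json.load(response)):
                flagged.append(page_url)
    return flagged
```

For example, `find_soft_404s(urls, "https://www.example.com/", token)` returns the subset of `urls` that Google currently reports with a `SOFT_404` fetch state. Checking the `pageFetchState` enum rather than the human-readable `coverageState` text avoids depending on the report’s wording.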
If you see a difference between statuses in the Index Coverage report and the URL Inspection tool, it might be just a delay in the Index Coverage report. In this case, trust the URL Inspection tool as it shows more recent data.
This is because the Index Coverage report data is refreshed at a different (and slower) rate than the URL Inspection. The results shown in URL Inspection are more recent, and should be taken as authoritative when they conflict with the Index Coverage report. (2/4)
— Google Search Central (@googlesearchc) October 11, 2021
Soft 404 detection on mobile vs. desktop
In 2021, Google gave an update on how it detects soft 404s on mobile phones and desktop devices.
It turned out that the status might be assigned differently to the mobile and desktop versions. However, because Google Search Console reports statuses based on the mobile version, it won’t show you if only your desktop version is labeled as soft 404.
Essentially, what happens is that sometimes we see pages that on desktops look like a 404 page, so we say this is a soft 404 on desktop, we don’t need to index it. And on mobile, it looks like a normal page, so we will actually index it there.
[…] in Search Console, we do show soft 404s, but we show it for the mobile version. So if on the mobile version everything is okay from your side, then in Search Console, it will look like it’s indexed normally […], whereas for desktop, if we see it as a soft 404 there, you won’t be able to see that directly in Search Console.
Source: John Mueller
What can cause a soft 404 and how to fix it
There are a few different reasons why Google might classify a page as a soft 404, including:
- 404 page responding with a 200 HTTP status code,
- Irrelevant redirects,
- Pages with little or no content,
- Pages containing 404-like words,
- Rendering issues.
404 page responding with a 200 HTTP status code
If a page is, in fact, a 404 page, but it returns a 200 HTTP status code, Google will classify it as a soft 404.
This is something to be especially mindful of if you have a custom 404 page.
A custom 404 page can be helpful to your users and allow them to explore the website even though the page they were trying to reach doesn’t exist. However, it’s not uncommon that these pages return a 200 HTTP status code.
You should avoid this situation because Google continues to crawl these pages, which wastes your crawl budget.
The solution to this problem is to configure your server to return the correct status code for pages that don’t exist (404 Not Found).
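Here’s a minimal sketch of the idea using Python’s built-in `http.server`, not a production setup: missing URLs get a friendly custom page, but with a real 404 status. The `PAGES` dictionary is a hypothetical stand-in for however your site decides which URLs exist, and `probe_missing_page` is a hypothetical helper that requests a random URL that can’t exist. On a correctly configured site, that request should come back as 404, not 200.

```python
import threading
import urllib.error
import urllib.request
import uuid
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical pages that exist on the site; every other path is "missing".
PAGES = {
    "/": "<h1>Home</h1>",
    "/products": "<h1>Products</h1>",
}

# Friendly custom 404 page: helpful for users, but served with a 404 status.
CUSTOM_404 = '<h1>Page not found</h1><p><a href="/">Go to the homepage</a></p>'


class SiteHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path in PAGES:
            status, body = 200, PAGES[self.path]
        else:
            # The key line: custom content, but the correct 404 status code.
            status, body = 404, CUSTOM_404
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def serve_in_background(port=0):
    """Start the server (port=0 picks a free port) in a daemon thread."""
    server = HTTPServer(("127.0.0.1", port), SiteHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_status(url):
    """Return the HTTP status code of url, including 4xx/5xx responses."""
    try:
        with urllib.request.urlopen(url) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code


def probe_missing_page(base_url):
    """Request a random path that cannot exist; a healthy site answers 404."""
    return fetch_status(f"{base_url}/soft-404-check-{uuid.uuid4().hex}")


if __name__ == "__main__":
    server = serve_in_background(8000)
    print("/ ->", fetch_status("http://127.0.0.1:8000/"))
    print("missing ->", probe_missing_page("http://127.0.0.1:8000"))
    server.shutdown()
```

The same probe works against your live site if you pass in your own base URL: if a made-up URL returns 200, your custom 404 page is a likely soft 404 candidate.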
Irrelevant redirects
Redirecting to an irrelevant page is a bad practice that might confuse users. That’s why if Google detects that a redirect points to an unrelated page, it might not follow it and may treat the page as a soft 404.
Yeah, it's not a great practice (confuses users), and we mostly treat them as 404s anyway (they're soft-404s), so there's no upside. It's not critically broken/bad, but additional complexity for no good reason – make a better 404 page instead.
— 🌽〈link href=//johnmu.com rel=canonical 〉🌽 (@JohnMu) January 8, 2019
To resolve the problem, always redirect to relevant pages.
Look at the content from the users’ perspective. For example, if a user was looking for something specific, would it make sense for them to end up on the page you’re redirecting to? Is it thematically relevant? If not, maybe there’s a better page that could answer their intent, or perhaps you should set up a 404 page instead of a redirect.
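One way to put this into practice is to keep an explicit map of retired URLs and their most relevant replacements, and return a 404 for everything else instead of a catch-all redirect to the homepage. A minimal Python sketch; the paths and the `resolve_removed_url` name are hypothetical, and 301 is the standard permanent redirect status.

```python
# Hypothetical, manually reviewed map: retired URL -> closest relevant page.
REDIRECTS = {
    "/shoes/trail-runner-x1": "/shoes/trail-runner-x2",  # discontinued model -> successor
    "/blog/old-crawl-budget-guide": "/blog/crawl-budget",
}


def resolve_removed_url(path):
    """Return (status, location) for a URL that no longer exists.

    Redirect only when a thematically relevant replacement exists;
    otherwise answer with a plain 404 instead of a catch-all redirect.
    """
    target = REDIRECTS.get(path)
    if target is not None:
        return 301, target
    return 404, None
```

Keeping the map explicit forces someone to ask, for each retired URL, whether the target really matches the user’s intent.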
Pages with little or no content
Little or no content on a page might make Google think the page is empty and classify it as a soft 404.
One example is an eCommerce website where products frequently go in and out of stock, leaving some product categories empty.
The solution to this problem is not as straightforward as in the two previous cases.
One way to deal with that issue is to block the indexing of empty pages. After all, if it’s an empty page, it’s not helpful to your users, and it shouldn’t be indexed. You can do it by adding a noindex meta tag (an HTML tag telling search engines that you don’t want this page to be indexed).
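As a minimal sketch, assuming a hypothetical category template that knows how many products it lists, the noindex tag can be added only while the category is empty:

```python
import html

# Standard robots meta tag asking search engines not to index the page.
NOINDEX_TAG = '<meta name="robots" content="noindex">'


def category_head(title, product_count):
    """Build the <head> of a hypothetical category page template.

    Empty categories get a noindex robots meta tag, so search engines
    can still crawl them but keep them out of the index.
    """
    lines = ["<head>", f"  <title>{html.escape(title)}</title>"]
    if product_count == 0:
        lines.append(f"  {NOINDEX_TAG}")
    lines.append("</head>")
    return "\n".join(lines)
```

Keep such pages crawlable (don’t block them in robots.txt), or Google won’t be able to see the noindex tag. For responses that aren’t HTML, the same directive can be sent as an `X-Robots-Tag: noindex` HTTP header.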
Additionally, it’s worth rethinking the structure of your whole website.
Do you have a lot of product categories that have, for example, only one product? If that’s the case, you should reconsider if these categories are even needed on your website. Pages like this might be considered thin content, and they can negatively affect your website in two ways:
- They can waste your crawl budget, and
- If you have a lot of low-quality, indexable pages, Google might conclude that your whole website lacks quality and crawl it less often.
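To find candidates for this review, a simple sketch is to list the categories that fall below a minimum number of products. It assumes you can export product counts per category URL from your store’s database; both that data source and the `min_products` threshold are hypothetical starting points, not Google rules.

```python
def categories_to_review(product_counts, min_products=2):
    """Return category URLs with fewer than min_products products, emptiest first.

    product_counts maps a category URL to its number of in-stock products.
    """
    thin = [(count, url) for url, count in product_counts.items() if count < min_products]
    return [url for count, url in sorted(thin)]
```

Empty categories come first because they are the most likely to be classified as soft 404s; single-product categories are candidates for merging into a broader category.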
If you want to learn more about which pages should and shouldn’t be indexed, read our article on creating an indexing strategy for your website.
Pages containing 404-like words
Sometimes Google’s algorithms misidentify a page if it contains words that usually appear on a 404 page. It might happen on, e.g., eCommerce websites when a product page uses terms like “out of stock,” “product unavailable,” or “we don’t deliver to your location.”
All Category pages had "Sorry we don't deliver to this location". This was shown to customers entering a PIN code which we don't deliver to, but was part of the page by default. Removed this text from the page and that fixed the soft 404! #seo @JohnMu @methode @rustybrick https://t.co/j3UEsXXb3U
— Nikhil Raj. R (@nikhilrajr) December 30, 2021
The author of the above post fixed the problem by simply deleting the words indicating the delivery is not available.
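If you want to audit your templates for such phrases before Google flags them, a minimal Python sketch can scan a page’s visible text. The phrase list is hypothetical: Google doesn’t publish which words it reacts to, so treat matches as pages worth reviewing, not as confirmed soft 404s.

```python
import re
from html.parser import HTMLParser

# Hypothetical watch list based on the examples in this article.
ERROR_LIKE_PHRASES = [
    "out of stock",
    "product unavailable",
    "we don't deliver to",
    "page not found",
]


class VisibleTextParser(HTMLParser):
    """Collect the text a visitor would see, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._hidden_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._hidden_depth > 0:
            self._hidden_depth -= 1

    def handle_data(self, data):
        if self._hidden_depth == 0:
            self.chunks.append(data)


def find_error_like_phrases(page_html, phrases=ERROR_LIKE_PHRASES):
    """Return the watch-list phrases that appear in the page's visible text."""
    parser = VisibleTextParser()
    parser.feed(page_html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    text = text.replace("\u2019", "'")  # treat curly apostrophes like straight ones
    text = re.sub(r"\s+", " ", text)
    return [phrase for phrase in phrases if phrase in text]
```

Run it over one page per template (category, product, search results) rather than every URL, since the problematic text usually comes from a shared template, as in the example above.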
Rendering issues
Rendering is a necessary step for Google to see your content. If the search engine can’t see it, it might think the page is empty and classify it as a soft 404.
To find out if Google renders your content correctly, use the URL Inspection tool in Google Search Console. You can inspect individual URLs and see how Google sees your pages. If the content is missing, it indicates a rendering issue.
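The URL Inspection tool remains the authoritative check, but a quick first pass is to see whether your key content appears in the HTML the server sends before any JavaScript runs. If it doesn’t, that content is either added by JavaScript (so Google only sees it if rendering succeeds) or missing entirely. A minimal sketch; the helper names, the example phrases, and the `User-Agent` string are hypothetical.

```python
import urllib.request


def fetch_raw_html(url):
    """Download the HTML exactly as the server sends it, without running JavaScript."""
    request = urllib.request.Request(url, headers={"User-Agent": "soft-404-audit/0.1"})
    with urllib.request.urlopen(request) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def missing_from_raw_html(raw_html, key_phrases):
    """Return the key phrases (e.g., product name, price) absent from the initial HTML.

    Anything returned here depends on JavaScript rendering, or isn't on the page at all.
    """
    lowered = raw_html.lower()
    return [phrase for phrase in key_phrases if phrase.lower() not in lowered]
```

For example, `missing_from_raw_html(fetch_raw_html(url), ["Trail Runner X2", "$120"])` lists which of those phrases a client-side-rendered product page leaves out of its initial HTML. Pages where the main content shows up here are the ones to double-check in the URL Inspection tool.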
Monitoring soft 404s is important to ensure they don’t hurt your website by wasting your crawl budget or leaving valuable pages out of the index.
Here are the key takeaways from the article to help you avoid soft 404s:
- If a page doesn’t exist, ensure it returns a 404 HTTP status code,
- When creating a redirect, always ensure you’re redirecting to relevant content,
- If you have empty pages, add the noindex meta tag or remove these pages from your site,
- Be mindful of 404-like phrases. If a page (e.g., one with an out-of-stock product) is marked as a soft 404, try removing those words or using different terms.