Ultimate Guide to Noindex Tag for SEO

The noindex tag is used to prevent search engines from indexing a given page.

You may think that all pages on your website should be indexed, but that’s not the case. In fact, preventing certain pages from appearing in search results is integral to your indexing strategy. 

What is the noindex tag? 

The noindex tag is an HTML tag used to control the way bots treat a given page or file on your site and stop them from indexing that page or file.

You can tell search engines not to index a page by adding a noindex directive in a robots meta tag – simply add the following code to the <head> section of the HTML:

<meta name="robots" content="noindex">

Alternatively, the noindex tag can be added as an x-robots-tag in an HTTP header:

x-robots-tag: noindex

When a search engine bot like Googlebot crawls a page with the noindex tag, it won’t index it. If the page was previously indexed and the tag was added later, Google will drop it from search results, even if other sites link to it.

Generally, search engine crawlers are not required to follow meta directives as they serve as suggestions rather than rules they must respect. Some search engine crawlers may interpret the robots meta values differently.

However, most search engine crawlers – like Googlebot – obey the noindex directive.

Noindex vs nofollow

There are other meta robots directives that Google supports – the most popular ones include nofollow and follow. However, the follow tag is the default setting if no robots meta tags are added, so Google considers it unnecessary.

The nofollow tag tells search engines not to follow the links on a page. As a result, ranking signals of that page will not be passed to the pages it links to.

It’s possible to use the noindex directive on its own, but it can also be combined with other directives. For instance, you can add both a noindex and nofollow tag if you don’t want search engine bots to index a page and follow the links on it. 
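As a rough illustration of combining directives, here is a minimal Python sketch that assembles a robots meta tag from a list of directives. The helper name build_robots_meta is hypothetical and not part of any SEO tool; it only shows the shape of the resulting tag.

```python
from html import escape


def build_robots_meta(directives, bot="robots"):
    """Return a robots meta tag for the given directives (hypothetical helper)."""
    # Normalise each directive and drop blanks and duplicates, keeping order.
    cleaned = []
    for directive in directives:
        directive = directive.strip().lower()
        if directive and directive not in cleaned:
            cleaned.append(directive)
    if not cleaned:
        raise ValueError("at least one directive is required")
    # Multiple directives share one content attribute, separated by commas.
    content = ", ".join(cleaned)
    return (
        f'<meta name="{escape(bot, quote=True)}" '
        f'content="{escape(content, quote=True)}">'
    )


print(build_robots_meta(["noindex", "nofollow"]))
# prints: <meta name="robots" content="noindex, nofollow">
```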

If you have implemented a noindex tag, but your page is still appearing in search results, it’s likely that Google simply hasn’t crawled the page since the tag was added. To request Google to recrawl a page, you can use the URL Inspection tool. 

When should you use the noindex tag?

You should use the noindex tag to prevent pages from being indexed by Google. 

Making less important pages non-indexable is crucial because Google doesn’t have sufficient resources to crawl and index every page it finds on the web. At the same time, you need to identify your valuable pages that should be indexed and prioritize their optimization.

Let’s see what types of pages you should implement the noindex tag on to make them non-indexable.

Place the noindex tag on:

  • Pages for products that are out-of-stock and won’t be available again.
  • Pages with duplicate content, often dominant on eCommerce websites. It’s also recommended to use canonical tags to point search engines to the primary versions of your pages and prevent duplicate content issues.
  • Pages that shouldn’t be accessible in search results, e.g., staging environments or password-protected pages.
  • Pages valuable to search engines but not to users – like pages containing links that help the bots discover other pages.  

Making pages non-indexable should be done as part of a well-established indexing strategy. 

You should never include noindex on valuable pages, like:

  • Most popular product pages, 
  • Blog articles (unless outdated), 
  • About me and Contact pages, 
  • Pages describing the services you offer. 

Generally, never place noindex on pages that you expect to generate significant organic traffic.

How to implement the noindex tag

The noindex tag can be placed in a site’s HTML code or HTTP response headers. 

Some CMS plugins like Yoast let you automatically noindex the pages you publish. 

Let’s go through the two primary implementation methods step by step and analyze their pros and cons.  

Insert the noindex tag into a page’s HTML code

The noindex tag can be implemented as a robots meta tag in the <head> of a page’s HTML. 

Robots meta tags are snippets of code used to control how a website is crawled and indexed. Users cannot see them, but bots find them while crawling a page. 

Here is how to implement the code:

<!DOCTYPE html>
<html>
<head>
<meta name="robots" content="noindex">
</head>
<body>
</body>
</html>

Let’s clarify how a robots meta tag is structured.

Inside a meta tag, there are pairs of attributes and values:

<meta attribute="value">

A robots meta tag has two attributes:

  • name – specifies the name of the search engine bots,
  • content – contains directives for bots.

Both attributes take different values depending on what you want the bots to do. The values of both the name and content attributes are case-insensitive. 

The name attribute will typically take the value of “robots,” indicating that a directive targets all bots. 

It’s also possible to use a specific bot’s name instead, such as “googlebot,” though you will encounter this much less often. If you want to address different bots, you will need to create separate meta tags for each of them.

Keep in mind that search engines have different crawlers for different purposes – check out Google’s list of crawlers. 

Meanwhile, the content attribute contains the directive for the bots to follow. In our case, it is “noindex.” You can put more than one value there and separate the values with commas. 
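To see how these pieces fit together, here is a minimal Python sketch that reads robots meta tags from a page’s HTML using the standard library’s html.parser. It is only an approximation of what a crawler might do, not how any search engine actually parses pages; the class and function names are hypothetical, and the list of bot names to look for is an assumption.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from robots meta tags (hypothetical helper)."""

    def __init__(self, bots):
        super().__init__()
        self.bots = {b.lower() for b in bots}
        self.directives = {}  # bot name -> set of directives

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        # Attribute values are compared case-insensitively.
        name = attr.get("name", "").strip().lower()
        if name not in self.bots:
            return
        # The content attribute may hold several comma-separated values.
        values = {v.strip().lower() for v in attr.get("content", "").split(",")}
        values.discard("")
        self.directives.setdefault(name, set()).update(values)


def robots_directives(html, bots=("robots", "googlebot")):
    parser = RobotsMetaParser(bots)
    parser.feed(html)
    return parser.directives


page = """<!DOCTYPE html>
<html><head>
<meta name="ROBOTS" content="NoIndex, nofollow">
<meta name="googlebot" content="noindex">
</head><body></body></html>"""

found = robots_directives(page)
print(sorted(found["robots"]))     # directives aimed at all bots
print(sorted(found["googlebot"]))  # directives aimed at Googlebot only
```

Note that each bot gets its own entry, mirroring the rule that separate meta tags are needed to address different bots.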

Pros and cons of robots meta tags

The HTML method is easier to implement and modify than the HTTP header method. It also does not require you to have access to your server. 

However, implementing the noindex tag in your HTML can be time-consuming – you will need to add it manually to every page you want to noindex.

Add the noindex tag to HTTP headers

Another solution is to specify the noindex directive in an x-robots-tag. 

This is a header sent as part of an HTTP response. HTTP headers are used for communication between a server and a client (a browser or search engine bot).

You can configure it on your HTTP web server. The code will look slightly different depending on what server you’re using – like Apache, Nginx, or others. 

Here is an example of what an HTTP response with an x-robots-tag can look like:

HTTP/1.1 200 OK
(…)
x-robots-tag: noindex
(…)

Apache server

If you have an Apache-based server and want to noindex all the files that end with “.pdf,” you should add the directive to the .htaccess file. Note that the Header directive requires the mod_headers module to be enabled.

Here is the sample code:

<Files ~ "\.pdf$">
  Header set x-robots-tag "noindex"
</Files>

Nginx server

If you have an Nginx-based server, implement the directive in the .conf file:

location ~* \.pdf$ {
  add_header x-robots-tag "noindex";
}

Pros and cons of using HTTP headers

One significant advantage of using noindex in HTTP headers is you can use it on web documents that are not HTML pages, such as PDF files, videos, or images. Moreover, this method lets you target a particular part of your site, such as a directory or a file type, with a single rule.

Additionally, the server rules that set the x-robots-tag can use regular expressions (RegEx). In other words, you can target the pages that should be noindexed by specifying what they have in common. For example, you can target pages with URLs that contain specific parameters or symbols.

On the other hand, you need access to your server to implement an x-robots-tag.
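To make the pattern-matching idea concrete, here is a minimal Python sketch that decides which URLs would receive the header. It does not reproduce how Apache or Nginx evaluate their rules; the rule list, the sessionid parameter, and the function name are all hypothetical.

```python
import re

# Hypothetical rules: each pattern is paired with the header value to send.
RULES = [
    (re.compile(r"\.pdf$", re.IGNORECASE), "noindex"),  # any PDF file
    (re.compile(r"[?&]sessionid="), "noindex"),         # URLs with a session parameter
]


def x_robots_tag_for(url):
    """Return the x-robots-tag value for a URL, or None if no rule matches."""
    for pattern, value in RULES:
        if pattern.search(url):
            return value
    return None


for url in ["/docs/guide.PDF", "/shop?sessionid=abc123", "/blog/post"]:
    print(url, "->", x_robots_tag_for(url))
```

One rule covers every URL that shares the pattern, which is the main practical difference from adding a meta tag page by page.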

Adding the tag also requires technical skills and is more complicated than adding the robots meta tags to a website’s HTML. 

How can you check your implementation of the noindex tag?

If you want to check whether noindex or other robots meta directives are implemented, you can do it based on how they were added to a page. 

So, if the noindex tag was added to a page’s HTML, you can check its source code, while for HTTP headers, you can use the Inspect option in Chrome. These tools will show you which directives were recognized on a given page. 

Other options include inputting a URL into Google Search Console’s URL Inspection tool or using the Link Redirect Trace extension.
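If you prefer to script the header check, the sketch below shows one possible approach in Python using only the standard library. The function names are hypothetical, the handling of bot-specific prefixes such as “googlebot: noindex” is simplified, and some servers answer HEAD requests differently from GET requests, so treat the result as a first hint rather than a verdict.

```python
from urllib.request import Request, urlopen

# Directives that carry their own "name: value" syntax, so the text before
# the colon is not a bot name.
VALUED_DIRECTIVES = {
    "unavailable_after", "max-snippet", "max-image-preview", "max-video-preview",
}


def header_blocks_indexing(values, bot="googlebot"):
    """Return True if any x-robots-tag value tells `bot` not to index."""
    for raw in values:
        value = raw.strip().lower()
        prefix, sep, rest = value.partition(":")
        prefix = prefix.strip()
        if sep and "," not in prefix and prefix not in VALUED_DIRECTIVES:
            # The header is addressed to one bot, e.g. "googlebot: noindex".
            if prefix != bot:
                continue
            value = rest
        directives = {d.strip() for d in value.split(",")}
        # "none" is shorthand for "noindex, nofollow".
        if "noindex" in directives or "none" in directives:
            return True
    return False


def fetch_x_robots_tag(url):
    """Fetch all x-robots-tag header values for a URL (needs network access)."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=10) as response:
        return response.headers.get_all("X-Robots-Tag") or []


print(header_blocks_indexing(["noindex"]))            # True
print(header_blocks_indexing(["otherbot: noindex"]))  # False
# Live check: header_blocks_indexing(fetch_x_robots_tag("https://example.com/file.pdf"))
```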

More information on using the noindex tag

Here are some additional guidelines on using the noindex tag and details about its characteristics:

  1. Whenever you don’t include noindex in your code, the default option is that bots can index your page.
  2. Watch out for mistakes in the code, such as missing commas between directives – bots won’t understand your commands if the syntax is wrong.
  3. Add the tags in your HTML code or HTTP response headers, but not both. Using both can backfire if the directives in the two places contradict each other. In this case, Googlebot will choose the directive that limits indexing.
  4. You can use a noimageindex directive, which works similarly to noindex but only prevents the images on a given page from being indexed.
  5. After a while, bots start viewing noindex as also nofollow. Many people disable the indexing of pages using noindex but combine it with the follow directive to ensure robots still crawl the links on a page. But Google has explained that a noindex, follow directive will eventually be treated as noindex, nofollow because at some point, they stop crawling the links on noindexed pages. As a result, the link destination pages may not be indexed and can get diminished ranking signals which may negatively affect their ranking.
  6. Don’t use noindex in robots.txt files. Though this and some other rules were never officially supported, search engine bots used to follow noindex directives in robots.txt files. However, in July 2019, Google announced that it would retire the code that handled unsupported and unpublished rules in robots.txt files – such as noindex – effective September 1, 2019.
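Points 1 and 3 above can be sketched in a few lines of Python. This is only a simplified model of the behavior described, not Google’s actual logic, and the function name is hypothetical: a page counts as indexable by default, and a restrictive directive from either source wins.

```python
def is_indexable(meta_directives, header_directives):
    """Return False if either source carries a directive that blocks indexing."""
    combined = {
        d.strip().lower()
        for d in list(meta_directives) + list(header_directives)
    }
    # "none" is shorthand for "noindex, nofollow".
    return not (combined & {"noindex", "none"})


print(is_indexable([], []))                   # True: no directives, default applies
print(is_indexable(["index"], ["noindex"]))   # False: the restrictive header wins
print(is_indexable(["noindex"], ["index"]))   # False: the restrictive meta tag wins
```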

Comparing noindex tags, robots.txt files, and canonical tags

Noindex tags, robots.txt files, and canonical tags are related – they can be used to control the crawling and/or indexing of pages.

However, they have some distinguishing characteristics that make them suitable in different situations.

We have established that noindex tags control whether specific pages on a website should be indexed, and they operate on a page level.

Let’s look at how this compares to robots.txt files and canonical tags.

Robots.txt files

Robots.txt files can be used to control how search engine bots crawl parts of your website on a directory level. 

Specifically, robots.txt files include directives for search engine bots, focusing on either “disallowing” or “allowing” their behavior. If bots follow the directive, they won’t crawl the disallowed pages, and the content of those pages typically won’t be indexed. 

Robots.txt directives are widely used to save a website’s crawl budget. 

Be careful when implementing noindex tags and setting up the rules in robots.txt files. For a noindex directive to be effective, the given page needs to be available for crawling, meaning that it can’t be blocked by the robots.txt file. 

If the crawler can’t access the page, it will not see the noindex tag and won’t respect it. The page can then still be indexed and appear in search results – for instance, if other pages are linking to it. 

To noindex a page, allow it to be crawled in robots.txt and use a noindex meta tag to block its indexing – Googlebot will then follow the noindex directive.
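One way to catch this conflict is to check whether any page you have noindexed is also disallowed in robots.txt. The sketch below uses Python’s built-in urllib.robotparser; the function name, the sample rules, and the example.com URLs are hypothetical, and Python’s matching may differ in details from Googlebot’s, so treat the output as a hint.

```python
from urllib.robotparser import RobotFileParser


def blocked_noindex_pages(robots_txt, noindexed_urls, bot="Googlebot"):
    """Return the noindexed URLs that the bot is not allowed to crawl."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # A noindex tag on these pages can never be seen by the crawler.
    return [url for url in noindexed_urls if not parser.can_fetch(bot, url)]


robots_txt = """User-agent: *
Disallow: /private/
"""

noindexed_urls = [
    "https://example.com/private/old-offer",
    "https://example.com/blog/outdated-post",
]

for url in blocked_noindex_pages(robots_txt, noindexed_urls):
    print("noindex cannot be seen on:", url)
# prints: noindex cannot be seen on: https://example.com/private/old-offer
```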

Canonical tags

Canonical tags are HTML elements that inform search engines which page out of several similar ones is the primary version and should be indexed. They are placed on secondary pages and specify the canonical URL – as a result, these secondary pages shouldn’t be included in the index. 

Canonical tags may limit the indexing of pages that aren’t canonical, but Google won’t always respect these tags. For example, if Google finds more links to another page, it may treat it as more important than the specified canonical URL and consider it the primary version. 

Also, canonical tags can be discovered by bots only during crawling. Unlike robots.txt files, they cannot be used to stop a page from being crawled. 

A key difference between canonical tags and noindex tags is that canonicalized pages consolidate ranking signals under one URL. Meanwhile, noindexed pages eventually stop passing ranking signals to the URLs they link to, which matters for internal linking.

Wrapping up

Making low-quality pages non-indexable is one of the SEO best practices for optimizing your indexing strategy – and using the noindex meta tag is one of the most effective ways to keep a page out of Google’s index.

Using the tag, you can block the indexing of unimportant pages and subsequently help search engine crawlers focus on your most valuable content. 

This makes the noindex tag one of the essential tools in SEO, and it’s why we audit all your noindex tags as a part of our technical SEO services.

Your website’s efficient crawling and indexing are key to making the most out of the organic traffic that valuable pages can drive to your site. To learn more about the process of indexing, make sure you read our guide to indexing SEO next!