A sitemap is not necessary for your site to function, but adding one can positively impact the crawling and indexing of your website by search engines.
On the other hand, a poorly optimized sitemap can negatively affect your crawl budget and put you at risk of search engines overlooking your valuable content.
This guide will help you understand what sitemaps are, what to include in them, and why you need one.
What is a sitemap
An XML sitemap is a text file that lists URLs on your website. It serves as a digital map for search engine bots and helps them find the valuable pages you want search engines to index.
Sitemaps have their own URLs, and they can be placed anywhere on your site’s server. However, a sitemap only affects URLs at or below the directory it is placed in. So to cover all of your pages, add the sitemap to your root directory, for example https://www.example.com/sitemap.xml.
The link to your sitemap should be included in your robots.txt file. To do it, use the following directive at the beginning or the end of your file:
Sitemap: https://www.example.com/sitemap.xml
You don’t necessarily have to reference your sitemap in the robots.txt file, but doing so will help most bots find it, including search engines other than Google and Bing. For example, both Seznam and Yandex can read sitemap directives from robots.txt.
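To see how a bot might pick up that directive, here is a minimal Python sketch that extracts Sitemap lines from a robots.txt body. The function name and the sample robots.txt content are assumptions made for illustration; real crawlers may parse the file differently.

```python
def find_sitemap_urls(robots_txt: str) -> list[str]:
    """Return the URLs listed in Sitemap directives of a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        # Split on the first colon only, so the URL's own colon is preserved.
        name, sep, value = line.partition(":")
        # The directive name is matched case-insensitively.
        if sep and name.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


# Hypothetical robots.txt content for illustration.
example_robots = """\
User-agent: *
Disallow: /search/

Sitemap: https://www.example.com/sitemap.xml
"""

print(find_sitemap_urls(example_robots))
```

Because the directive is independent of any User-agent group, the sketch scans every line rather than a specific block.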
Why sitemaps are useful
Having a sitemap comes with many benefits for your website. First and foremost, it helps search engines find content to index.
In the ideal world, well-designed site architecture should let users and search engines reach all your pages without a problem.
Unfortunately, a website structure can be complicated and doesn’t always make it easy for search engine bots to find all your pages.
A sitemap presents your URLs in a straightforward format, so crawlers don’t have to follow links to find them. This makes it easier for search engines to discover all the important pages on your site.
- Including a page in a sitemap doesn’t guarantee that it will get indexed, but it can speed up the indexing process and make it more reliable on your end.
- A sitemap helps optimize the use of your crawl budget. Without it, search engine bots need to crawl your entire website to find fresh, indexable content. As a result, they might waste the crawl budget visiting low-quality pages and overlook some more valuable ones.
- When you add a sitemap to Google Search Console, you can get feedback about the URLs in your sitemap. So if there’s a problem with a page and Google can’t crawl it, you will know about it by looking at the Coverage report in Google Search Console, and you’ll have the opportunity to take action.
Who needs a sitemap
An XML sitemap can help any website, and every website should have one just to be safe. Still, it may be more beneficial for some than for others.
A sitemap is an absolute must if:
- Your website has a lot of dynamic content. If you update your pages frequently, there’s a risk that search engine bots might miss some of your new or updated content.
- You have a large website (over 500 pages). The bigger your website is, the bigger the risk that search engine bots might overlook some pages.
- You have a new website. Unfortunately, new sites usually have little or no external links coming to them. As a result, crawlers may have a hard time finding them.
- You have isolated or poorly internally-linked pages. If search engine bots can’t discover your pages by following links, they might not find all of them.
- You have a lot of rich media content (images, videos). Sitemaps allow you to provide additional information about your visual content for search engines (e.g., video running time, image subject matter).
What to include in a sitemap
Not all of your pages should make it into your sitemap. If you put all of them in, you risk wasting your crawl budget on low-quality pages. This can leave high-quality pages on your site unindexed because search engines didn’t have the resources to crawl them.
That’s why it’s so important to ensure you only include indexable pages with your most valuable content.
Make sure that the pages you include in a sitemap:
- Respond with a 200 status code,
- Are not blocked by robots.txt,
- Don’t include a noindex meta robots tag,
- Are the canonical version of a page.
Additionally, here is a list of pages that should not end up in your sitemap:
- Pages that have thin or duplicate content,
- Paginated pages,
- Parameter or session ID-based URLs,
- Site search result pages,
- Archived pages.
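The four indexability checks above can be sketched as a simple filter. This is only an illustration: the Page structure and its field names are hypothetical, and in practice the values would come from your own crawl data. It does not judge content quality, so thin or archived pages still need to be excluded separately.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Crawl facts about one URL (hypothetical structure for this sketch)."""
    url: str
    status_code: int
    blocked_by_robots: bool
    has_noindex: bool
    canonical_url: str


def belongs_in_sitemap(page: Page) -> bool:
    """Apply the four indexability checks: 200, crawlable, indexable, canonical."""
    return (
        page.status_code == 200
        and not page.blocked_by_robots
        and not page.has_noindex
        and page.canonical_url == page.url
    )


pages = [
    Page("https://www.example.com/page1", 200, False, False,
         "https://www.example.com/page1"),
    # Session ID URL whose canonical points elsewhere: excluded.
    Page("https://www.example.com/page1?sessionid=123", 200, False, False,
         "https://www.example.com/page1"),
    # Redirecting URL: excluded.
    Page("https://www.example.com/old-page", 301, False, False,
         "https://www.example.com/old-page"),
]

sitemap_urls = [p.url for p in pages if belongs_in_sitemap(p)]
print(sitemap_urls)
```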
Here’s an example of a sitemap with two URLs:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/page1</loc>
    <lastmod>2021-11-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.6</priority>
  </url>
  <url>
    <loc>https://www.example.com/page2</loc>
    <lastmod>2021-11-03</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1</priority>
  </url>
</urlset>
Now let’s look at each element.
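The <?xml> declaration and the <urlset> tag open every sitemap. The declaration specifies the XML version and the character encoding, while <urlset> wraps all the URLs and references the sitemap protocol standard through its xmlns attribute.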
Every <url> tag describes an individual URL. Inside, you can find the following tags:
- <loc> (required),
- <lastmod> (optional),
- <changefreq> (optional),
- <priority> (optional).
The <loc> tag stands for “location,” and it contains the URL of the page.
Remember to include the site protocol (http:// or https://) in the URL.
If you have an international website and use hreflang tags, they also go inside the <url> element, next to <loc>. I will cover the use of the hreflang tag below.
The <lastmod> tag stands for “last modified,” and it contains the date of the page’s last modification.
For content sites, this tag helps Google establish that you are the original publisher – if someone scrapes your content and publishes it on their page, <lastmod> may help you remain the author of that content in Google’s eyes.
Note: You should only update this tag if you have made meaningful changes to a page. If you try to “trick” Google into thinking you update content regularly when you don’t, Google might potentially start ignoring this tag.
Make a judgment call whether the changes make a difference to a potential user. Ask yourself: would it make sense for someone to return to this page after the modifications were made? If all you did was change commas around, it’s probably not worth the risk.
The <changefreq> tag stands for “change frequency.” It informs search engines how often the page is likely to change.
It can take the following values:
- always (specifies that the page is changing every time it’s accessed),
- hourly,
- daily,
- weekly,
- monthly,
- yearly,
- never (should be used for archived pages).
Note: The <changefreq> tag is only a hint for search engines. Additionally, some of them, including Google, don’t take it into account at all.
The <priority> tag tells search engines how important a page is relative to other URLs on your site. Assign priority on a scale from 0.0 to 1.0; the default is 0.5.
It’s worth noting that Google does not take this tag into account:
“No, the priority & change frequency are not used at all by Google.” (John Mueller, @JohnMu, September 13, 2019)
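If you prefer to generate the file rather than type it, the elements described above can be assembled with Python’s standard library. The sketch below is one possible approach, not a definitive implementation. It assumes each entry is a dict with a required loc and an optional lastmod, and it leaves out changefreq and priority because Google ignores them.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries: list[dict]) -> str:
    """Serialize entries with a 'loc' key and optional 'lastmod' into sitemap XML."""
    # Register the sitemap namespace as the default so tags are written unprefixed.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for entry in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = entry["loc"]
        if "lastmod" in entry:
            ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = entry["lastmod"]
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


sitemap_xml = build_sitemap([
    {"loc": "https://www.example.com/page1", "lastmod": "2021-11-01"},
    {"loc": "https://www.example.com/page2", "lastmod": "2021-11-03"},
])
print(sitemap_xml)
```

Using an XML library instead of string concatenation means special characters in URLs, such as &, are escaped for you.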
You can specify the language version of your pages with an hreflang tag.
To do so, add one tag inside each <url> element, below <loc>, for every language version of the page, including the page itself:
<xhtml:link rel="alternate" hreflang="language-code" href="url_of_the_language_version"/>
Here’s an example of a page that has English and German language versions. For the xhtml prefix to be valid, the <urlset> tag must also declare xmlns:xhtml="http://www.w3.org/1999/xhtml".
<url>
  <loc>https://www.example.com/page1/en</loc>
  <xhtml:link rel="alternate" hreflang="de" href="https://www.example.com/page1/de"/>
  <xhtml:link rel="alternate" hreflang="en" href="https://www.example.com/page1/en"/>
</url>
Adding the hreflang tag to your sitemap can help search engines present the most appropriate language version to users. However, the recommended practice is to add the tag either to both your HTML code and your sitemap, or to the HTML code only.
While putting hreflangs in a sitemap works, it also makes them a pain to verify. First, many SEO tools are optimized for hreflang tags in HTML. Second, you can forget about any browser add-ons that will automatically check hreflangs for you while visiting the page. This only works with hreflangs in HTML. If you put the markup in the sitemap, all this convenience is lost. You will have to crawl your sitemaps every time you wish to see any change made to your hreflang tags.
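If you do decide to keep hreflang annotations in the sitemap, generating them programmatically helps keep every language version in sync. Here is a rough sketch using Python’s standard library; the function name and the mapping of language codes to URLs are assumptions for illustration. It writes one url entry per language version, each listing all alternates including itself.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_hreflang_sitemap(versions: dict[str, str]) -> str:
    """Build a sitemap where every language version lists all alternates.

    `versions` maps an hreflang code to the URL of that language version.
    """
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("xhtml", XHTML_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url in versions.values():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        # Each entry references every version, including itself.
        for lang, href in versions.items():
            ET.SubElement(
                url,
                f"{{{XHTML_NS}}}link",
                {"rel": "alternate", "hreflang": lang, "href": href},
            )
    return ET.tostring(urlset, encoding="unicode")


hreflang_xml = build_hreflang_sitemap({
    "en": "https://www.example.com/page1/en",
    "de": "https://www.example.com/page1/de",
})
print(hreflang_xml)
```

The serializer declares the xhtml namespace on the urlset tag automatically, which is easy to forget when writing the file by hand.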
You can add additional syntax to your sitemap to specify information about rich media content, including images, videos, and news articles.
XML Image Sitemap
You can add your images to your existing sitemap or create a separate XML Image Sitemap.
An Image Sitemap helps create an organized index of images on your website, allowing search engine bots to crawl it more efficiently. It’s beneficial if:
- Your website relies on images to drive traffic (e.g., stock photos website),
You can add image metadata and specify additional information like an image caption, location, or license. You can find more about available image tags in Google’s documentation.
The images you include in an image sitemap don’t have to be on the same domain as your website. A CDN is fine if it is verified in Google Search Console.
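As a rough illustration, the sketch below adds images to regular url entries using Google’s image sitemap namespace. To the best of my knowledge, image:image and image:loc are the relevant tag names, but check Google’s documentation before relying on them. The function name, page URL, and CDN URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"


def build_image_sitemap(images_by_page: dict[str, list[str]]) -> str:
    """Build a sitemap that lists the images shown on each page."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("image", IMAGE_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in images_by_page.items():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        for image_url in image_urls:
            # One <image:image> block per image, each with its own location.
            image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    return ET.tostring(urlset, encoding="unicode")


image_sitemap_xml = build_image_sitemap({
    "https://www.example.com/gallery": [
        # Images may live on another host, such as a CDN.
        "https://cdn.example.com/photos/beach.jpg",
        "https://cdn.example.com/photos/forest.jpg",
    ],
})
print(image_sitemap_xml)
```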
XML Video Sitemap
Just like with images, you can add your videos to your existing sitemap or create a separate XML Video Sitemap.
You can provide additional information for search engine bots about your videos to help the bots find and understand your video content better, especially if the content would be difficult to discover otherwise.
For example, you can add the duration of the video and specify if it’s family-friendly. You can find more about available video tags in Google’s documentation.
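The following sketch shows what a video entry with a duration and a family-friendly flag could look like when generated in Python. The tag names follow Google’s video sitemap extension as I understand it, so verify them against the documentation. The helper name, dict keys, and all URLs are made up for the example.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
VIDEO_NS = "http://www.google.com/schemas/sitemap-video/1.1"


def build_video_entry(page_url: str, video: dict) -> ET.Element:
    """Return a <url> element carrying one <video:video> block."""
    url = ET.Element(f"{{{SITEMAP_NS}}}url")
    ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
    video_el = ET.SubElement(url, f"{{{VIDEO_NS}}}video")
    for tag in ("thumbnail_loc", "title", "description", "content_loc"):
        ET.SubElement(video_el, f"{{{VIDEO_NS}}}{tag}").text = video[tag]
    # Duration is expressed in seconds.
    ET.SubElement(video_el, f"{{{VIDEO_NS}}}duration").text = str(video["duration_seconds"])
    ET.SubElement(video_el, f"{{{VIDEO_NS}}}family_friendly").text = (
        "yes" if video["family_friendly"] else "no"
    )
    return url


ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("video", VIDEO_NS)
urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
urlset.append(build_video_entry("https://www.example.com/videos/intro", {
    "thumbnail_loc": "https://www.example.com/thumbs/intro.jpg",
    "title": "Introduction to sitemaps",
    "description": "A short overview of what sitemaps are for.",
    "content_loc": "https://www.example.com/media/intro.mp4",
    "duration_seconds": 120,
    "family_friendly": True,
}))
video_sitemap_xml = ET.tostring(urlset, encoding="unicode")
print(video_sitemap_xml)
```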
Google News Sitemap
A Google News Sitemap contains a list of articles published on your site and helps Google discover new articles faster.
You can list up to 1,000 URLs in a Google News Sitemap, and you should add new articles to it as soon as they are published.
You can find the available news-specific tags in Google’s documentation.
Sitemap Index File
A single sitemap can hold up to 50,000 URLs and must not exceed 50 MB uncompressed. Therefore, if you want to include more URLs, you need to create more than one sitemap.
If you have more than one sitemap, you can create a Sitemap Index File to submit all of your sitemaps at once. Here’s an example of a Sitemap Index File with two sitemaps:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://www.example.com/sitemap1.xml</loc>
  </sitemap>
  <sitemap>
    <loc>http://www.example.com/sitemap2.xml</loc>
  </sitemap>
</sitemapindex>
A Sitemap Index File uses the following tags:
- XML header tag specifying the version and encoding standard,
- sitemapindex – parent tag surrounding the file (equivalent to <urlset> tag),
- sitemap – parent tag that includes each sitemap file (equivalent to <url> tag),
- loc – location tag specifying the URL of a sitemap.
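Splitting a long URL list and building the matching index file is easy to automate. The sketch below is a minimal example under a few assumptions: the sitemap1.xml, sitemap2.xml naming scheme and the generated URLs are placeholders, and writing the individual sitemap files is left out.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000


def chunk_urls(urls: list[str], size: int = MAX_URLS_PER_SITEMAP) -> list[list[str]]:
    """Split a URL list into consecutive chunks of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls: list[str]) -> str:
    """Serialize a Sitemap Index File that points at each sitemap URL."""
    ET.register_namespace("", SITEMAP_NS)
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for sitemap_url in sitemap_urls:
        sitemap = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(sitemap, f"{{{SITEMAP_NS}}}loc").text = sitemap_url
    body = ET.tostring(index, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# 120,000 hypothetical URLs need three sitemap files.
all_urls = [f"https://www.example.com/page{i}" for i in range(120_000)]
chunks = chunk_urls(all_urls)
index_xml = build_sitemap_index(
    [f"https://www.example.com/sitemap{n}.xml" for n in range(1, len(chunks) + 1)]
)
print(len(chunks))
print(index_xml)
```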
How to create a sitemap?
You can create your sitemap manually or automatically.
Creating a sitemap manually
You may choose to create a sitemap manually in editors like Windows Notepad, TextEdit, or Visual Studio Code.
That way, you can customize it to your liking, but I recommend it only for small websites with few pages. For larger websites with hundreds of pages, this process would be very time-consuming and prone to error.
Creating a sitemap automatically
For a larger website with hundreds of pages, it’s better to create a sitemap automatically. It can be generated by using:
- Native features of CMS or eCommerce platforms,
- Added plugins,
- Third-party tools.
Sitemaps generated by CMS or eCommerce platforms
You can find your generated sitemap in the root directory of your website.
Sitemaps generated by plugins
If you are using a CMS like WordPress, you may need a plugin to generate a sitemap. I recommend using Yoast SEO, as this extension makes the process easy and comes with many more SEO features.
Static vs. Dynamic sitemap
A sitemap can be generated statically or dynamically.
A static sitemap is a snapshot of your website’s indexable content taken when the sitemap was generated. You can use a crawler, for example, Screaming Frog, to easily create a static sitemap.
The downside is that static sitemaps have to be updated every time a change occurs to your website. Therefore, if you add or remove pages regularly, a static XML sitemap will soon become obsolete and not serve its purpose.
A dynamic sitemap is created each time it’s requested. It means that it stays up to date and reflects the current state of your website.
Dynamic sitemaps are beneficial if your content is frequently changing. An example can be an eCommerce website where the products are going in and out of stock frequently.
To create a dynamic sitemap, you might need the help of developers or use plugins that offer this option.
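To give developers a starting point, here is a minimal sketch of a dynamic sitemap as a Python WSGI application. It rebuilds the XML on every request from whatever get_indexable_urls() returns. That function is a hypothetical stand-in for a query against your own database or CMS, and the product URLs are placeholders.

```python
from xml.sax.saxutils import escape


def get_indexable_urls() -> list[str]:
    """Hypothetical stand-in for a database query returning current URLs."""
    return [
        "https://www.example.com/products/blue-shirt",
        "https://www.example.com/products/red-shirt",
    ]


def sitemap_app(environ, start_response):
    """WSGI app that rebuilds the sitemap on every request."""
    entries = "".join(
        # Escape characters such as & so the XML stays well-formed.
        f"<url><loc>{escape(url)}</loc></url>" for url in get_indexable_urls()
    )
    body = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        f"{entries}</urlset>"
    ).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/xml; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally, you could serve it with the standard library, for example wsgiref.simple_server.make_server("", 8000, sitemap_app).serve_forever(). On a busy site, you would likely cache the response for a short time rather than querying the database on each request.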
Submitting the sitemap to search engines
You can ping search engines and let them know you have a new sitemap or made some changes to the previous one.
“Google doesn’t check a sitemap every time a site is crawled; a sitemap is checked only the first time that we notice it, and thereafter only when you ping us to let us know that it’s changed. Alert Google about a sitemap only when it’s new or updated; don’t submit or ping unchanged sitemaps multiple times.” (Source: Google)
Submitting the sitemap to Google Search Console
Log into your Google Search Console account. Then, go to Index > Sitemaps in the sidebar.
Enter your sitemap’s URL in the field and click ‘Submit.’
Google Search Console will let you know if there are any errors within your sitemap.
Submitting the sitemap to Bing Webmaster Tools
If you’re already verified in Google Search Console, this step is a cakewalk. Go to Bing Webmaster Tools and import your data.
If you haven’t verified in Google Search Console yet, navigate to “Sitemaps” in the sidebar and click the “Submit Sitemap” button at the top of the page.
Since Yahoo’s search results are powered by Bing, adding a sitemap to Bing Webmaster Tools also ensures it’s submitted to Yahoo.
Submitting the sitemap to Yandex.Webmaster
To submit a sitemap to Yandex, you should:
- Go to Yandex Passport and log in.
- Go to the “Sitemap files” section.
- Type in the address of the XML Sitemap.
- Click the “Add” button.
To sum up, keep these best practices in mind:
- Make sure your sitemap doesn’t have more than 50,000 URLs. If you have more than that, break it into smaller sitemaps,
- Include only indexable pages,
- Reference the sitemap in your robots.txt file,
- Use consistent, complete URLs: check that you’re not missing www or the HTTP/HTTPS protocol from any URL,
- If you have additional media content (images, videos, news), use sitemap extensions,
- If you have different language versions, you can specify them in your sitemap, but also use the hreflang tag in your HTML,
- Don’t focus too much on changefreq and priority tags, as search engines don’t always consider them.
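A couple of these points lend themselves to an automated check. The following sketch covers only the URL limit and the consistency of protocols and hosts; it does not verify indexability. The function name and the sample sitemap are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000


def check_sitemap(xml_text: str) -> list[str]:
    """Return a list of human-readable problems found in a sitemap."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    locs = [el.text.strip() for el in root.iter(f"{{{SITEMAP_NS}}}loc") if el.text]
    problems = []
    if len(locs) > MAX_URLS:
        problems.append(f"{len(locs)} URLs exceed the {MAX_URLS} limit")
    origins = set()
    for loc in locs:
        parts = urlsplit(loc)
        # A complete URL needs both a protocol and a host.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append(f"not a complete URL: {loc}")
        else:
            origins.add((parts.scheme, parts.netloc))
    # More than one protocol/host pair means the URLs are inconsistent.
    if len(origins) > 1:
        listed = ", ".join(sorted(f"{scheme}://{host}" for scheme, host in origins))
        problems.append(f"mixed protocols or hosts: {listed}")
    return problems


# Hypothetical sitemap with a missing www, a different protocol, and a relative URL.
sample = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/page1</loc></url>
  <url><loc>http://example.com/page2</loc></url>
  <url><loc>/page3</loc></url>
</urlset>"""

for problem in check_sitemap(sample):
    print(problem)
```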