XML Sitemaps

What is a sitemap?

A sitemap is an XML file that lists all the pages, images, and videos that you want search engines to index. Sitemaps help search engines like Google and Bing crawl your site faster and more efficiently, which makes a sitemap a valuable SEO tool.

Apart from the XML sitemap, you can also create an HTML sitemap for your visitors. For SEO, the XML sitemap is more relevant, so that's what this article focuses on.

How does a sitemap help my SEO?

Let's start with a quick explanation of how search engines work. Google, Bing, and other search engines want to crawl and index as many pages of your site as possible, and do this as efficiently as possible. Normally they do this by following all the links on your site, as well as following links from other pages to yours. As you'll understand, this can be a complex and time-consuming process. It also requires every page to be linked from somewhere.

A sitemap helps to speed up this process. You give search engines a list of all the pages you want to have indexed, which makes crawling a lot more efficient. A website with a sitemap is generally crawled faster and more efficiently than a site without one. So always make sure your website has a sitemap.

What does a sitemap look like?

A sitemap is an XML file with all relevant URLs. It can look like this:

<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
  <url>
    <loc>https://www.siteguru.co/</loc>
    <lastmod>2022-05-15</lastmod>
  </url>
  <url>
    <loc>https://www.siteguru.co/pricing</loc>
    <lastmod>2022-03-18</lastmod>
  </url>
...
  <url>
    <loc>https://www.siteguru.co/blog</loc>
    <lastmod>2022-02-17</lastmod>
  </url>
</urlset>

The sitemap contains a <url> element for every relevant page. Within it are the following tags:

  • loc: the URL of the page. This tag is required. Use the absolute canonical URL of the page, including https:// and your domain. The URL should not redirect.
  • lastmod: the date the page was last modified. This helps Google decide which pages to crawl again. It's an optional tag, and Google has said it only uses this date when it is consistently accurate.

Besides these two, you could also add changefreq (how often the page is updated) and priority (how important the page is relative to your other pages). Google ignores both, so you can safely leave them out.
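To make the structure concrete, here is a minimal Python sketch that reads the loc and lastmod values back out of a sitemap with the standard library's xml.etree.ElementTree. The function name read_sitemap and the sample data are illustrative assumptions. Every element lives in the sitemaps.org namespace, which is why the lookups need a namespace prefix.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# All sitemap elements live in this namespace (declared by xmlns on <urlset>).
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.siteguru.co/</loc>
    <lastmod>2022-05-15</lastmod>
  </url>
  <url>
    <loc>https://www.siteguru.co/pricing</loc>
  </url>
</urlset>"""


def read_sitemap(xml_text: str) -> list[tuple[str, str | None]]:
    """Return (loc, lastmod) pairs; lastmod is None when the optional tag is missing."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        if not loc:
            continue  # <loc> is required, so skip malformed entries
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        entries.append((loc, lastmod.strip() if lastmod else None))
    return entries


for loc, lastmod in read_sitemap(SAMPLE_SITEMAP):
    print(loc, lastmod)
```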

Is a sitemap required?

No, you could technically get your website ranked without a sitemap. Especially if your internal linking is perfect, Google will likely find all your pages.

That being said, we do strongly recommend adding a sitemap. That way you can be sure Google can find all the pages on your site. Especially for bigger websites, a sitemap is an absolute must. Also, newer websites can benefit greatly from having a sitemap because they generally don't have that many backlinks pointing to them. 

As you'll see, sitemaps also help you check if every page is accessible and indexed by a search engine. More on that later. 

Having a URL in a sitemap does not guarantee that search engines will index it. The search engine can still decide that the page isn't relevant enough to index.

Where do I place my sitemap?

Place the sitemap in the root of your website, for example at www.domain.com/sitemap.xml. Other filenames also work; Yoast, for instance, uses sitemap_index.xml.

The next step is to tell search engine crawlers where to find your sitemap. You do this in the robots.txt file, which also lives in the root of your site, at www.domain.com/robots.txt. Alongside your other crawl instructions, you can specify the location of your sitemap, like this:

Sitemap: https://www.domain.com/sitemap.xml

Now the search engine knows where to find your sitemap.
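If you want to double-check that the Sitemap line is picked up, Python's standard urllib.robotparser can read it (site_maps() needs Python 3.8 or later). A minimal sketch; the robots.txt content, including the Disallow rule, is an example assumption:

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt; the Disallow rule is a hypothetical extra crawl instruction.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

Sitemap: https://www.domain.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns the Sitemap URLs as a list, or None if there are none.
print(parser.site_maps())  # ['https://www.domain.com/sitemap.xml']
```

The Sitemap directive is independent of the User-agent groups, so it can go anywhere in the file.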

Submitting your sitemap to search engines

As a website owner, you can submit your sitemap to Google Search Console and Bing Webmaster Tools. This can speed up indexing, and it gives you insight into which pages are indexed and which aren't, including the reason why.

Submit a sitemap to Google Search Console

If you haven't added your site to Google Search Console yet, first go to https://search.google.com/search-console. There you can add your site and verify that you own it. You can do this via Google Tag Manager, a meta tag, a file on your site, or a DNS record.

Once you've been verified as the owner, you can access the property. Search Console has a lot of features, but we're focusing on sitemaps here. Go to the Sitemaps menu item and add the URL of your sitemap.


Enter the URL and click Submit. Next, you'll see the sitemap in the Submitted sitemaps list.

How to check if a sitemap is crawled?

Google normally needs a few hours to process your sitemap. Once that's done, and everything went OK, you'll see the status change to Successful. If Google encountered an issue while crawling your sitemap, you'll see it here.

Click the little graph icon to view the Coverage Report. This report shows which pages have been indexed and which haven't. For pages that aren't indexed, it also shows why: for example, because the page has a noindex tag, because it redirects, or for some other reason.

If all goes well, the report will look something like this:

Search Console Coverage Report

For any pages that are excluded, you can click on the details to see what the issue is. Some potential reasons are:

  • Crawled, currently not indexed: Google has seen the page but hasn't indexed it yet. You'll often see this for new content that is still waiting to be indexed.
  • Duplicate, submitted URL not selected as canonical: the page has a different canonical URL, which is indexed instead.
  • Page redirected: the URL in the sitemap redirects to another URL. Use the redirect's destination URL in your sitemap instead.

Sitemaps and Bing Webmaster Tools

Let's not forget about Bing. Microsoft's search engine has a similar way of adding sitemaps. After verifying your site in Bing Webmaster Tools, you can add the sitemap there. Just like Google Search Console, it shows you how many pages are indexed:

Bing Sitemap report

If there are any issues with the sitemap, you'll also see it here.

How to create a sitemap?

If you're using a CMS like WordPress, Drupal, Wix, or Shopify, setting up a sitemap is very easy.

Sitemaps in WordPress

Yoast is a super useful SEO plugin for WordPress, and it also builds a sitemap for you. Yoast creates a sitemap_index.xml file with references to sub-sitemaps, such as one for posts (post-sitemap.xml), one for pages (page-sitemap.xml), and more.

Sitemaps in Drupal

Are you using Drupal? The Simple XML Sitemap module helps you build sitemaps for all your content types and pages.

Sitemaps in Wix

Just like Yoast, Wix creates a sitemap_index.xml, with different sitemaps per content type in it, such as pages, categories, or events.

Sitemaps in Shopify

Are you using Shopify for your webshop? Shopify automatically creates a sitemap.xml with all products, pages, images, and other content.

Does your website run on a custom-built CMS? Then we recommend generating the sitemap automatically, so it always contains all relevant content. That way you can be sure Google can quickly access all your content.
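For a custom CMS, generating the sitemap can be as simple as serializing a list of URLs. Below is a minimal Python sketch using xml.etree.ElementTree (xml_declaration needs Python 3.8+). The build_sitemap helper and its (URL, date) input are hypothetical; in practice the list would come from your CMS's database. ElementTree also escapes characters such as & in URLs, which the sitemap format requires.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages: list[tuple[str, date | None]]) -> bytes:
    """Build a <urlset> document from (absolute URL, last-modified date) pairs."""
    # Setting xmlns as a plain attribute puts every child in the sitemap namespace.
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod is not None:
            # A plain YYYY-MM-DD date is a valid W3C Datetime value.
            ET.SubElement(url, "lastmod").text = lastmod.isoformat()
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


# Hypothetical pages, as a CMS might return them.
pages = [
    ("https://www.example.com/", date(2022, 5, 15)),
    ("https://www.example.com/pricing", None),
]
xml_bytes = build_sitemap(pages)
print(xml_bytes.decode("utf-8"))
```

Regenerate the file whenever content is published or changed, so the sitemap never drifts out of date.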

Which URLs should go in my sitemap?

All pages that you want search engines to index should be in your sitemap. 

Always use the canonical URL of every page. Imagine you have a category page with different options for sorting. That can result in the following URLs:

  • www.example.com/products
  • www.example.com/products?sort=price
  • www.example.com/products?sort=name

Only include the canonical URL: www.example.com/products.

Don't include any pages that are set to noindex, or pages that can't be crawled because of instructions in the robots.txt file. Pages that you don't want to be indexed should not be in the sitemap.
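These rules can be expressed as a small filter. This Python sketch assumes you already know each URL's canonical (for example from its rel="canonical" tag) and which pages carry noindex; the function name and its inputs are hypothetical. It maps each URL to its canonical, drops noindexed pages, and uses the standard urllib.robotparser to drop URLs that robots.txt blocks.

```python
from __future__ import annotations

from urllib.robotparser import RobotFileParser


def select_sitemap_urls(
    candidates: list[str],
    canonical_of: dict[str, str],
    noindexed: set[str],
    robots_txt: str,
) -> list[str]:
    """Return the unique canonical URLs that are indexable and crawlable, in order."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())

    selected: list[str] = []
    seen: set[str] = set()
    for url in candidates:
        # A URL without an entry in canonical_of is treated as its own canonical.
        canonical = canonical_of.get(url, url)
        if canonical in seen or canonical in noindexed:
            continue
        # Skip URLs that robots.txt disallows for all crawlers ("*").
        if not robots.can_fetch("*", canonical):
            continue
        seen.add(canonical)
        selected.append(canonical)
    return selected


urls = select_sitemap_urls(
    candidates=[
        "https://www.example.com/products",
        "https://www.example.com/products?sort=price",
        "https://www.example.com/products?sort=name",
        "https://www.example.com/cart/",
        "https://www.example.com/thank-you",
    ],
    canonical_of={
        "https://www.example.com/products?sort=price": "https://www.example.com/products",
        "https://www.example.com/products?sort=name": "https://www.example.com/products",
    },
    noindexed={"https://www.example.com/thank-you"},
    robots_txt="User-agent: *\nDisallow: /cart/\n",
)
print(urls)  # ['https://www.example.com/products']
```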

How large can a sitemap be?

Google states that a sitemap can contain up to 50,000 URLs and should be no larger than 50 MB (uncompressed). If that is not enough for all your content, you can split the sitemap into separate, smaller sitemaps. The main file that references all the sub-sitemaps is called the sitemap index.
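Splitting a long URL list by these limits can be sketched in Python as follows. The split_urls function is hypothetical, and its byte count is only an estimate: it measures each <url><loc>URL</loc></url> entry and adds a fixed allowance (an assumption) for the XML declaration and the <urlset> wrapper, so keep some headroom in practice.

```python
from xml.sax.saxutils import escape

SITEMAP_MAX_URLS = 50_000
SITEMAP_MAX_BYTES = 50 * 1024 * 1024  # 50 MB, measured uncompressed
WRAPPER_BYTES = 200  # rough allowance for the declaration and <urlset> tags (assumption)


def split_urls(urls, max_urls=SITEMAP_MAX_URLS, max_bytes=SITEMAP_MAX_BYTES):
    """Group URLs into chunks that each fit in one sitemap file."""
    chunks, current, size = [], [], WRAPPER_BYTES
    for url in urls:
        # Estimated size of this entry once written as <url><loc>...</loc></url>.
        entry_bytes = len(f"<url><loc>{escape(url)}</loc></url>".encode("utf-8"))
        # Start a new chunk when adding this URL would exceed either limit.
        if current and (len(current) >= max_urls or size + entry_bytes > max_bytes):
            chunks.append(current)
            current, size = [], WRAPPER_BYTES
        current.append(url)
        size += entry_bytes
    if current:
        chunks.append(current)
    return chunks


urls = [f"https://www.example.com/product/{i}" for i in range(120_000)]
print([len(chunk) for chunk in split_urls(urls)])  # [50000, 50000, 20000]
```

Each chunk would then be written to its own file and listed in the sitemap index.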

It can be useful to split up the sitemap by content type. Imagine you have a webshop with a lot of products, pages and a blog. You could create a separate product sitemap, pages sitemap and blog sitemap, and include all three in the sitemap index:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.example.com/products-sitemap.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/posts-sitemap.xml</loc>
    <lastmod>2005-01-01</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/pages-sitemap.xml</loc>
    <lastmod>2005-01-01</lastmod>
  </sitemap>
</sitemapindex>

This is not only useful for large sites. Even if you don't exceed the maximum size, having different sitemaps per type allows you to easily see the coverage per content type in Google Search Console. Many Content Management Systems like Wix and WordPress do this for you.
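A sitemap index like the one above can be generated the same way as a regular sitemap. A minimal Python sketch; build_sitemap_index and its input format are hypothetical, and lastmod is only written when you pass a date for that sub-sitemap:

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemaps: list[tuple[str, str | None]]) -> bytes:
    """Build a <sitemapindex> from (absolute sitemap URL, optional lastmod) pairs."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc, lastmod in sitemaps:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = loc
        if lastmod:
            ET.SubElement(entry, "lastmod").text = lastmod
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)


# One sub-sitemap per content type, as in the example above.
index_xml = build_sitemap_index([
    ("https://www.example.com/products-sitemap.xml", None),
    ("https://www.example.com/posts-sitemap.xml", "2005-01-01"),
    ("https://www.example.com/pages-sitemap.xml", "2005-01-01"),
])
print(index_xml.decode("utf-8"))
```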

Sitemaps and international websites

Sitemaps can also be very useful if you're running an international website. The hreflang attribute is used to specify the different language variants of a page. If we had Dutch and Spanish versions of our English website, we would add the following hreflang tags to the head of our page:

<link rel="alternate" hreflang="es" href="https://www.siteguru.co/es">
<link rel="alternate" hreflang="nl" href="https://www.siteguru.co/nl">
<link rel="alternate" hreflang="en" href="https://www.siteguru.co/">

If you don't have access to the source code, and your developers can't set up the hreflang tags quickly, sitemaps are a great alternative. You can specify hreflang annotations in the sitemap, like this:

<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
  <url>
    <loc>https://www.siteguru.co/</loc>
    <xhtml:link rel="alternate" hreflang="es" href="https://www.siteguru.co/es"></xhtml:link>
    <xhtml:link rel="alternate" hreflang="nl" href="https://www.siteguru.co/nl"></xhtml:link>
    <xhtml:link rel="alternate" hreflang="en" href="https://www.siteguru.co/"></xhtml:link>
  </url>
</urlset>

Here we specify which language variants exist for a page. Google uses this information to show the right page for the right language. The variants don't have to be on the same domain: they can also be on a subdomain, or even on a different domain.

Imagine you do SEO for a webshop. The Spanish site is on a subdomain, and the Dutch site is on a separate domain. Your hreflangs could look like this:

<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
  <url>
    <loc>https://www.webshop.com/</loc>
    <xhtml:link rel="alternate" hreflang="es" href="https://es.webshop.com/es"></xhtml:link>
    <xhtml:link rel="alternate" hreflang="nl" href="https://www.webshop.nl/"></xhtml:link>
    <xhtml:link rel="alternate" hreflang="en" href="https://www.webshop.com/"></xhtml:link>
  </url>
</urlset>
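Google expects each language version to get its own url entry that lists every variant, including itself. Writing those entries by hand gets repetitive, so here is a minimal Python sketch that generates them from a single mapping of language codes to URLs. The build_hreflang_sitemap function is hypothetical; note that ET.register_namespace changes a process-wide setting, which is fine for a small script.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Serialize the sitemap namespace as the default and XHTML with the "xhtml:" prefix.
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("xhtml", XHTML_NS)


def build_hreflang_sitemap(variants: dict[str, str]) -> bytes:
    """One <url> per language version, each listing all versions (itself included)."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url in variants.values():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        for lang, alt_url in variants.items():
            ET.SubElement(url, f"{{{XHTML_NS}}}link",
                          rel="alternate", hreflang=lang, href=alt_url)
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


variants = {
    "en": "https://www.webshop.com/",
    "es": "https://es.webshop.com/es",
    "nl": "https://www.webshop.nl/",
}
xml_bytes = build_hreflang_sitemap(variants)
print(xml_bytes.decode("utf-8"))
```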


Sitemaps for images and video

So far we've mostly talked about pages. But images and videos can also go in a sitemap. Getting Google to index all your images quickly can help you rank better in Google Image Search. There are two ways to add images to your sitemap:

One is to specify all the images on a page within the <url> tag of that page, like this:

<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
  <url>
    <loc>https://www.siteguru.co/</loc>
    <lastmod>2022-05-15</lastmod>
    <image:image>
      <image:loc>https://www.siteguru.co/logo.png</image:loc>
      <image:title>SiteGuru Logo</image:title>
    </image:image>
    <image:image>
      <image:loc>https://www.siteguru.co/team.jpeg</image:loc>
      <image:title>Our team</image:title>
    </image:image>
  </url>
</urlset>

Or you can create a separate image sitemap that has all your images in one file. Each image still has to sit inside the <url> element of the page it appears on, so every <url> entry names a page and lists the images on it:

<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
  <url>
    <loc>https://www.siteguru.co/</loc>
    <image:image>
      <image:loc>https://www.siteguru.co/logo.png</image:loc>
      <image:title>SiteGuru Logo</image:title>
    </image:image>
    <image:image>
      <image:loc>https://www.siteguru.co/team.jpeg</image:loc>
      <image:title>Our team</image:title>
    </image:image>
  </url>
</urlset>
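A separate image sitemap can be generated from a mapping of pages to the images on them. A minimal Python sketch; build_image_sitemap is hypothetical, and it only writes image:loc, since Google has deprecated optional image tags such as image:title.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"

# Serialize the sitemap namespace as the default and images with the "image:" prefix.
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("image", IMAGE_NS)


def build_image_sitemap(images_by_page: dict[str, list[str]]) -> bytes:
    """One <url> per page, with one <image:image> per image on that page."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in images_by_page.items():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        for image_url in image_urls:
            image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


xml_bytes = build_image_sitemap({
    "https://www.siteguru.co/": [
        "https://www.siteguru.co/logo.png",
        "https://www.siteguru.co/team.jpeg",
    ],
})
print(xml_bytes.decode("utf-8"))
```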

SiteGuru's Sitemap Report

To be sure your sitemap is complete and correct, you can use SiteGuru's Sitemap Report. It checks your sitemap for:

  • Missing pages: pages on your site that aren't in the sitemap
  • Pages in your sitemap that don't work
  • Pages in your sitemap that are redirected
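To illustrate what such checks involve, here is a hypothetical Python sketch (not SiteGuru's implementation). It assumes you have already collected the indexable URLs from crawling your site, plus an HTTP status code for every sitemap URL (for example via HEAD requests); the fetching step is left out so the sketch runs offline.

```python
from __future__ import annotations


def audit_sitemap(
    sitemap_urls: list[str],
    site_urls: set[str],
    status_of: dict[str, int],
) -> dict[str, list[str]]:
    """Compare a sitemap with a crawl; status_of must hold a code for every sitemap URL."""
    listed = set(sitemap_urls)
    return {
        # Indexable pages found on the site but absent from the sitemap.
        "missing": sorted(site_urls - listed),
        # Sitemap URLs that return a client or server error (4xx/5xx).
        "broken": [u for u in sitemap_urls if status_of[u] >= 400],
        # Sitemap URLs that answer with a redirect (3xx).
        "redirected": [u for u in sitemap_urls if 300 <= status_of[u] < 400],
    }


report = audit_sitemap(
    sitemap_urls=[
        "https://www.example.com/",
        "https://www.example.com/old-page",
        "https://www.example.com/deleted",
    ],
    site_urls={"https://www.example.com/", "https://www.example.com/new-page"},
    status_of={
        "https://www.example.com/": 200,
        "https://www.example.com/old-page": 301,
        "https://www.example.com/deleted": 404,
    },
)
print(report)
```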

SiteGuru's SEO report

You can even download the full sitemap for your website.

Conclusion

Sitemaps help search engines crawl and index your pages more efficiently. With a sitemap, your pages are usually discovered more quickly, and search engines are less likely to miss any of them.

The sitemap coverage report helps you find issues and keep track of how your pages are being indexed.