All about XML Sitemaps

What is an XML sitemap?

An XML sitemap is an XML file that lists the pages of your website. In its most basic form, an XML sitemap can look like this:

<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
  <url>
    <loc>https://www.siteguru.co/</loc>
  </url>
  <url>
    <loc>https://www.siteguru.co/pricing</loc>
  </url>
  <url>
    <loc>https://www.siteguru.co/blog</loc>
  </url>
</urlset>

As you can see, it is nothing but a list of URLs, structured in a standardized XML format.
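Because the format is that simple, reading a sitemap back takes only a few lines. The following is a minimal sketch in Python using the standard library; the function name `list_sitemap_urls` and the `EXAMPLE` string are made up for illustration. It assumes the sitemap is already available as a string and uses the standard sitemap namespace.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemap protocol; every sitemap element lives in it.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def list_sitemap_urls(xml_text: str) -> list[str]:
    """Return the text of every <loc> found inside a <url> entry."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Hypothetical input, mirroring the basic sitemap shown above.
EXAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.siteguru.co/</loc></url>
  <url><loc>https://www.siteguru.co/pricing</loc></url>
  <url><loc>https://www.siteguru.co/blog</loc></url>
</urlset>"""

if __name__ == "__main__":
    for url in list_sitemap_urls(EXAMPLE):
        print(url)
```

A list like this is a handy starting point if you want to check each URL in your sitemap yourself.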

The sitemap can normally be found at www.example.com/sitemap.xml, although other locations are also fine.

How can a sitemap help my SEO?

Sitemaps are an easy and efficient way of telling the search engines which pages they should index. By submitting your sitemap, the search engine knows which pages to crawl. This is especially valuable if your site has lots of new content. The sitemap is a great way for search engines to discover these pages.

But don't expect too much from this: a sitemap won’t magically help your site rank much higher. A web page that is only in your sitemap without any internal or external links pointing to it will still not rank.

The real value of sitemaps is that they help you monitor your presence in search engines. Tools like Google Search Console and Bing Webmaster Tools can highlight pages that are in your sitemap but are not indexed. And they’ll tell you why.

Google Search Console

Google Search Console helps you as a website owner to monitor your presence in the search engine. This is also where you submit your sitemap.

Go to https://search.google.com/search-console and add your website. You’ll need to verify that you own the website. There are various ways to do that, like via Tag Manager or by adding a meta tag to your page.

Once your site is verified, the first thing you should do is add your sitemap.xml.

[Screenshot: Add sitemap to Google Search Console]

Enter the URL of your sitemap and click Submit. Next, you’ll see your sitemap appear in the list of Submitted sitemaps:

[Screenshot: Submitted sitemaps in Google Search Console]

Google will normally need a few hours to process this data. Once that’s done, you can click the little graph icon to see the coverage of the sitemap.

Coverage tells you whether there are any issues with the pages in your sitemap. These issues may prevent a page from being indexed properly, for example because the page doesn’t work or because it carries a noindex instruction.
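If you want to spot one of these issues yourself before Google reports it, you can look for a robots meta tag in a page's HTML. The sketch below uses only the Python standard library; the names `RobotsMetaParser` and `has_noindex` are hypothetical, and the code assumes you have already downloaded the HTML as a string. It does not cover noindex instructions sent through HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Records whether a robots meta tag asks search engines not to index."""

    def __init__(self) -> None:
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (for example <meta name>), so default to "".
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("name", "").lower() not in {"robots", "googlebot"}:
            return
        directives = {d.strip().lower() for d in attr.get("content", "").split(",")}
        # "none" is shorthand for "noindex, nofollow".
        if "noindex" in directives or "none" in directives:
            self.noindex = True


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.noindex


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
    print(has_noindex(page))
```

A page that returns True here should normally not be in your sitemap at all.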

If all your pages work just fine, the report may look something like this:

[Screenshot: Search Console coverage report]

Here's an example of a site that was just migrated, removing a lot of content during the migration:

[Screenshot: Search Console coverage report after a migration]

For every excluded page, you can see why it was excluded and whether you should fix that.

Bing Webmaster Tools

Bing has a similar way of adding sitemaps. After adding your website to Bing Webmaster Tools, you can see how many pages were submitted.

Although it is less informative than Google Search Console, Bing tells you which issues it encountered when crawling under Reports & Data > Crawl information.

As you can see, monitoring is the real benefit of sitemaps. A sitemap may make crawling your website slightly quicker, but the best part is that you can see whether your pages were indexed or not.

Extra tags in your sitemap.xml

The example sitemap.xml above is very simple: it contains just the URL of each page. Every URL can carry three optional tags:

  • lastmod: the date this page was last changed
  • priority: how important this page is relative to the other pages on your site, indicated by a number from 0.0 to 1.0
  • changefreq: how often this page changes. Valid values are:
    • always
    • hourly
    • daily
    • weekly
    • monthly
    • yearly
    • never

A sitemap.xml using these tags could look like this:

<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
  <url>
    <loc>https://www.siteguru.co/</loc>
    <priority>1.0</priority>
    <changefreq>monthly</changefreq>
    <lastmod>2019-10-10</lastmod>
  </url>
  <url>
    <loc>https://www.siteguru.co/pricing</loc>
    <priority>0.5</priority>
    <changefreq>monthly</changefreq>
    <lastmod>2019-10-20</lastmod>
  </url>
  <url>
    <loc>https://www.siteguru.co/blog</loc>
    <priority>0.8</priority>
    <changefreq>weekly</changefreq>
    <lastmod>2019-10-30</lastmod>
  </url>
</urlset>

Don’t worry about the priority and changefreq tags: Google’s documentation states that it ignores both. You can safely leave these out.

Google has said it mostly ignores lastmod too, although its official position is that it uses this date, as long as it is consistently accurate, to see whether a page has changed since the last crawl and to decide whether to crawl it again.

How to create a sitemap.xml?

Most content management systems have the option to build the sitemap.xml for you or have plugins available to build the sitemap.

If your website is running on WordPress, Yoast SEO is a great tool to generate an XML sitemap. On their website, Yoast tells you how to do that. For Drupal websites, the Simple Sitemap module lets you easily create a sitemap for all your content types.

If you’re running a Shopify webshop or a Squarespace website, your sitemap is automatically created on www.yourdomain.com/sitemap.xml.

Is your website running on a custom-built content management system? It’s probably a good idea to automatically generate the sitemap on a regular basis, to make sure all the latest content is included.
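For a custom-built system, a scheduled job could regenerate the file from your list of pages. Here is a minimal sketch in Python using only the standard library. The function name `build_sitemap` and the shape of its input (pairs of URL and last-modified date) are assumptions for illustration; your CMS will have its own way of listing pages.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages: list[tuple[str, str]]) -> bytes:
    """Build sitemap XML from (url, lastmod) pairs.

    lastmod is a date string such as "2019-10-10"; pass "" to leave it out.
    """
    # Writing xmlns as a plain attribute puts all child elements in the
    # sitemap namespace once the document is serialized.
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for url, lastmod in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # special characters like & are escaped for us
        if lastmod:
            ET.SubElement(entry, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    # Hypothetical pages; in practice these would come from your database.
    xml_bytes = build_sitemap([
        ("https://www.example.com/", "2019-10-10"),
        ("https://www.example.com/blog", ""),
    ])
    print(xml_bytes.decode("utf-8"))
```

The result can be written to sitemap.xml in your web root by a nightly job, or served directly from a route in your application.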

Which URLs should I include?

It’s important to include the URL of every page that you want to be indexed by the search engine. If a page is not meant to be indexed (like a login page), there is no need to include it, although it won’t hurt either.

If you have multiple URLs for a single page, make sure you include the canonical URL. For example, if you use query parameters for sorting on a product category page, these URLs all show the same page:

  • www.example.com/products
  • www.example.com/products?sort=price
  • www.example.com/products?sort=name

In your sitemap, you only include the canonical URL, www.example.com/products. More about canonical URLs.
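When you generate the sitemap yourself, you can apply this rule in code. The sketch below is in Python and assumes, as in the sorting example, that query parameters never change which page is shown. That assumption does not hold for every site (pagination parameters are a common exception), so treat the helper names `strip_query` and `canonical_urls` as illustrative only.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url: str) -> str:
    """Drop the query string and fragment, keeping scheme, host and path."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def canonical_urls(urls: list[str]) -> list[str]:
    """Reduce URLs to their query-less form and drop duplicates, keeping order."""
    return list(dict.fromkeys(strip_query(url) for url in urls))


if __name__ == "__main__":
    print(canonical_urls([
        "https://www.example.com/products",
        "https://www.example.com/products?sort=price",
        "https://www.example.com/products?sort=name",
    ]))
```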

How big can your sitemap be?

Google states that a single sitemap can include up to 50,000 URLs and must not exceed 50 MB uncompressed. If that’s not enough, you can create multiple sitemaps, and create a sitemap for your sitemaps. That is called a sitemap index.

Imagine you have a sitemap for all your products, one for your blog posts and one for any other pages. The sitemap index may look like this:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.example.com/products-sitemap.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/posts-sitemap.xml</loc>
    <lastmod>2005-01-01</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/pages-sitemap.xml</loc>
    <lastmod>2005-01-01</lastmod>
  </sitemap>
</sitemapindex>

Even if your site isn’t that big, it may still be useful to split up your sitemap into separate sitemaps per content type: one for product pages, one for category pages, and so on. This helps you analyze your website’s performance in Google Search Console per content type.
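If you generate your sitemaps with a script, splitting a long URL list and building the index can be automated. The following Python sketch uses the 50,000-URL limit mentioned above; the function names `split_urls` and `build_index` and the file naming scheme are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # maximum number of URLs in a single sitemap


def split_urls(urls: list[str], size: int = MAX_URLS) -> list[list[str]]:
    """Cut the URL list into chunks that each fit in one sitemap."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_index(sitemap_urls: list[str]) -> bytes:
    """Build a sitemap index that points at the given sitemap files."""
    index = ET.Element("sitemapindex", {"xmlns": SITEMAP_NS})
    for url in sitemap_urls:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = url
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    # Hypothetical site with 120,000 product pages.
    all_urls = [f"https://www.example.com/products/{n}" for n in range(120_000)]
    chunks = split_urls(all_urls)
    names = [
        f"https://www.example.com/products-sitemap-{n}.xml"
        for n in range(1, len(chunks) + 1)
    ]
    print(len(chunks), "sitemaps")
    print(build_index(names).decode("utf-8"))
```

Each chunk would then be written to its own sitemap file, and only the index needs to be submitted to the search engine.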

How to use sitemaps for international websites?

Sitemaps can also be very helpful when you are doing SEO for international websites. For every URL on your website, you can specify the hreflang of a page. This tells Google what the alternatives for that page are in other languages. Here’s an example:

<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
  <url>
    <loc>https://www.siteguru.co/</loc>
    <xhtml:link rel="alternate" hreflang="es" href="https://www.siteguru.co/es" />
    <xhtml:link rel="alternate" hreflang="nl" href="https://www.siteguru.co/nl" />
    <xhtml:link rel="alternate" hreflang="en" href="https://www.siteguru.co/" />
  </url>
</urlset>

Here, we specify the Spanish (es) and Dutch (nl) versions of a page, while also including the original English (en) version. Including the original is required. In a complete sitemap, each language version also gets its own url entry with the same set of alternates.
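Writing these entries by hand gets tedious quickly, since every language version should list all versions including itself. The Python sketch below generates one url entry per language version from a simple mapping of language code to URL. The function name `build_hreflang_sitemap` and the input format are assumptions for illustration, not part of any library.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_hreflang_sitemap(versions: dict[str, str]) -> bytes:
    """Build a sitemap for one page that exists in several languages.

    versions maps an hreflang code (such as "en") to that version's URL.
    """
    # Both namespaces are declared once on the root element.
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS, "xmlns:xhtml": XHTML_NS})
    for page_url in versions.values():
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = page_url
        # Every entry lists all language versions, including itself.
        for lang, alt_url in versions.items():
            ET.SubElement(
                entry,
                "xhtml:link",
                {"rel": "alternate", "hreflang": lang, "href": alt_url},
            )
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    print(build_hreflang_sitemap({
        "en": "https://www.siteguru.co/",
        "es": "https://www.siteguru.co/es",
        "nl": "https://www.siteguru.co/nl",
    }).decode("utf-8"))
```

For a real site you would call this logic once per page and collect all entries in a single urlset.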

The international URLs can even be on a different domain or subdomain. Here’s an example of an international webshop, where the Spanish variant is on a subdomain and the Dutch variant is on a separate domain altogether. Keep in mind this is just an example; we wouldn’t recommend such a setup.

<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
  <url>
    <loc>https://www.webshop.com</loc>
    <xhtml:link rel="alternate" hreflang="es" href="https://es.webshop.com/es" />
    <xhtml:link rel="alternate" hreflang="nl" href="https://www.webshop.nl/" />
    <xhtml:link rel="alternate" hreflang="en" href="https://www.webshop.com" />
  </url>
</urlset>

More about SEO for international websites.