XML Sitemaps

(Last updated: April 2020)

What is an XML sitemap?

An XML sitemap is an XML file that lists all the pages of your website. If generated properly, it acts as a roadmap that leads search engines like Google and Bing to all your important pages, marked up with tags that identify types of data.

What does a sitemap look like?

At its most basic, an XML sitemap can look like this:

<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
<url>
<loc>https://www.siteguru.co/</loc>
</url>
<url>
<loc>https://www.siteguru.co/pricing</loc>
</url>
<url>
<loc>https://www.siteguru.co/blog</loc>
</url>
</urlset>
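If you want to read a sitemap like this programmatically, a minimal Python sketch using the standard library's xml.etree.ElementTree could look like the following. The sample string and the function name list_sitemap_urls are illustrative; the detail that matters is that every tag lives in the http://www.sitemaps.org/schemas/sitemap/0.9 namespace, so lookups have to include it.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Namespace used by every element in a standard sitemap.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Illustrative sample, mirroring the example above.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.siteguru.co/</loc></url>
  <url><loc>https://www.siteguru.co/pricing</loc></url>
  <url><loc>https://www.siteguru.co/blog</loc></url>
</urlset>"""


def list_sitemap_urls(xml_text: str) -> list[str]:
    """Return the <loc> value of every <url> entry, in document order."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


if __name__ == "__main__":
    for url in list_sitemap_urls(SAMPLE):
        print(url)
```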

Do you need a sitemap.xml for your website?

Technically, no. As with many SEO tasks, it's not a requirement, but Google has said XML sitemaps are still a very useful discovery method for picking up recently updated content on your site.

“All formats limit a single sitemap to 50MB (uncompressed) and 50,000 URLs. If you have a larger file or more URLs, you will have to break your list into multiple sitemaps.” Google Webmaster Guidelines 2020

Google can still crawl your site without a sitemap. But SEO is all about assisting search engine bots: paving the way for them and helping them understand your site.

What is an XML sitemap used for?

According to Sitemaps.org, a sitemap allows webmasters to include additional information about each URL:

  • when it was last updated,

  • how often it changes,

  • how important it is relative to other URLs on the site.

So a sitemap is more than just a list of URLs, and not something to neglect. Fortunately, most modern CMSs generate XML sitemaps automatically, so you don't need to maintain it by hand or update it yourself every time your content changes.

However, Google does ask you to submit your sitemap in Google Search Console, and you should resubmit it every time you create a new post or page, or update your site's content.

It is still recommended to review your sitemap.xml manually to make sure it includes all your important pages, judged by their internal links and depth of content.

How can a sitemap help my SEO?

Sitemaps are an easy and efficient way of telling the search engines which pages they should index. By submitting your sitemap, the search engine knows which pages to crawl. This is especially valuable if your site has lots of new content. The sitemap is a great way for search engines to discover these pages.

But don't expect too much from this: a sitemap won’t magically help your site rank much higher. A web page that is only in your sitemap without any internal or external links pointing to it will still not rank.

The real value of sitemaps is that they help you monitor your presence in search engines. Tools like Google Search Console and Bing Webmaster can highlight pages that are in your sitemap but are not indexed. And they’ll tell you why.

How to find your sitemap

The sitemap can normally be found at www.example.com/sitemap.xml, although other locations are also fine.

If you are using Shopify, Wix or Squarespace, your XML sitemap is generated automatically. On other CMSs and custom-built websites, the sitemap might be harder to find. There are several places to look:

  • Type in your domain name (e.g. https://www.yourwebsite.com) followed by one of these endings:

/sitemap

/sitemap.xml

/sitemap_index.xml

  • If you're using a CMS like WordPress, you might be using an SEO plugin like Yoast or Rank Math. They will most likely provide you with the link in their sitemap section.

  • A quick and simple option is to run an SEO audit: enter your URL in our tool and it will check whether it can find your sitemap:

[Screenshot: audit tool showing that a sitemap was detected]
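The first of these checks can be scripted. Below is a minimal Python sketch using only the standard library: it tries the three endings listed above and returns the first one that answers with HTTP 200. Treat it as a rough shortcut: a 200 response doesn't prove the file is an XML sitemap (some sites return 200 for every path, and /sitemap is often an HTML page), and some servers block requests that don't send a browser-like user agent. The function names are hypothetical.

```python
from __future__ import annotations

import urllib.request

# The endings listed above; a sitemap can also live elsewhere.
COMMON_SITEMAP_PATHS = ["/sitemap", "/sitemap.xml", "/sitemap_index.xml"]


def candidate_sitemap_urls(domain: str) -> list[str]:
    """Build the URLs to try for a domain such as 'https://www.yourwebsite.com'."""
    base = domain.rstrip("/")
    return [base + path for path in COMMON_SITEMAP_PATHS]


def find_sitemap(domain: str, timeout: float = 10.0) -> str | None:
    """Return the first candidate URL that answers with HTTP 200, or None."""
    for url in candidate_sitemap_urls(domain):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                if response.status == 200:
                    return url
        except OSError:
            # HTTPError (e.g. 404), URLError and timeouts are all OSError subclasses:
            # treat them as "not here" and try the next path.
            continue
    return None
```

Calling find_sitemap("https://www.yourwebsite.com") makes up to three requests and returns either the first URL that responded or None, in which case the plugin settings or an audit tool are the next place to look.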

Google Search Console

Google Search Console helps you as a website owner to monitor your presence in the search engine. This is also where you submit your sitemap.

How to add a sitemap to your Google Search Console:

Go to https://search.google.com/search-console and add your website. You’ll need to verify that you own the website. There are various ways to do that, like via Tag Manager or by adding a meta tag to your page.

Once verified, the first thing you should do is add your sitemap.xml.

[Screenshot: adding a sitemap in Google Search Console]

Enter the URL of your sitemap and click Submit. Next, you’ll see your sitemap appear in the list of Submitted sitemaps:

[Screenshot: Submitted sitemaps in Google Search Console]

How do you verify that your sitemap is active?

Google normally needs a few hours to process the sitemap. Once that’s done, you can see whether the sitemap was detected and whether any errors were found.

You can click the little graph icon to see the coverage of the sitemap.

Coverage tells you whether there are any issues with the pages in your sitemap that may prevent them from being indexed properly. For example, a page may not work, or it may contain a noindex instruction.

If all your pages work just fine, the report may look something like this:

[Screenshot: Search Console coverage report]

Here's an example of a site that was just migrated, removing a lot of content during the migration:

[Screenshot: Search Console coverage report after a migration]

For every excluded page, you can see why it was excluded and whether you should fix that.
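If you want a rough local check before waiting for the coverage report, the Python sketch below looks for two of the issues mentioned above: a page that doesn't return HTTP 200, and a noindex instruction in either the X-Robots-Tag header or a robots meta tag. It only inspects a page you have already fetched, and the names RobotsMetaFinder and indexing_problems are hypothetical. It is no substitute for Search Console, which reports many more exclusion reasons.

```python
from __future__ import annotations

from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the content of <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives = []  # lowercase content values, e.g. "noindex, follow"

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases tag and attribute names; values may be None.
        attributes = {name: value or "" for name, value in attrs}
        if attributes.get("name", "").lower() in ("robots", "googlebot"):
            self.directives.append(attributes.get("content", "").lower())


def indexing_problems(status: int, headers: dict[str, str], body: str) -> list[str]:
    """List reasons why a fetched sitemap URL may not get indexed (rough check only)."""
    problems = []
    if status != 200:
        problems.append(f"HTTP status {status}")
    # Header names are case-insensitive, so normalize them before the lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    if "noindex" in lowered.get("x-robots-tag", "").lower():
        problems.append("noindex in X-Robots-Tag header")
    finder = RobotsMetaFinder()
    finder.feed(body)
    finder.close()
    if any("noindex" in directive for directive in finder.directives):
        problems.append("noindex in robots meta tag")
    return problems
```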

Bing Webmaster

Bing has a similar way of adding sitemaps. After adding your website to Bing Webmaster, you can see how many pages were submitted.

Although less informative compared to Google Search Console, Bing tells you which issues it encountered when crawling under Reports & Data > Crawl information.

As you can see, monitoring is the real benefit of sitemaps. It may make crawling your website slightly quicker, but the best part is that you can see whether the pages were indexed or not.

Extra attributes in your sitemap.xml

The example sitemap.xml above is very simple: it contains just the URL of each page. There are three optional attributes you can add to each URL (technically, these are tags inside the <url> element):

  • lastmod: when this page was last changed
  • priority: how important this page is relative to the other pages on your site, expressed as a number from 0.0 to 1.0
  • changefreq: how often this page changes. Valid values are:
    • always
    • hourly
    • daily
    • weekly
    • monthly
    • yearly
    • never
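Because XML is case-sensitive, these values must be written in lowercase in the file itself. A small validation helper in Python could look like this; the function name check_optional_tags is hypothetical, and it only checks changefreq against the list above and priority against the 0.0 to 1.0 range.

```python
from __future__ import annotations

# Allowed <changefreq> values in the sitemaps.org protocol; XML is case-sensitive.
VALID_CHANGEFREQ = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}


def check_optional_tags(changefreq: str | None = None, priority: float | None = None) -> list[str]:
    """Return a list of problems with the optional values; an empty list means they look valid."""
    problems = []
    if changefreq is not None and changefreq not in VALID_CHANGEFREQ:
        problems.append(f"invalid changefreq: {changefreq!r}")
    if priority is not None and not 0.0 <= priority <= 1.0:
        problems.append(f"priority out of range: {priority}")
    return problems


if __name__ == "__main__":
    print(check_optional_tags("weekly", 0.8))  # []
    print(check_optional_tags("Weekly", 1.5))  # flags both values
```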

A sitemap.xml using these attributes could look like this:

<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
<url>
<loc>https://www.siteguru.co/</loc>
<lastmod>2019-10-10</lastmod>
<changefreq>monthly</changefreq>
<priority>1.0</priority>
</url>
<url>
<loc>https://www.siteguru.co/pricing</loc>
<lastmod>2019-10-20</lastmod>
<changefreq>monthly</changefreq>
<priority>0.5</priority>
</url>
<url>
<loc>https://www.siteguru.co/blog</loc>
<lastmod>2019-10-30</lastmod>
<changefreq>weekly</changefreq>
<priority>0.8</priority>
</url>
</urlset>

Don't worry about the priority and changefreq attributes: search engines ignore those. You can safely leave these out.

Google has said it mostly ignores lastmod too, although it has also officially stated that it uses this date to see whether a page has changed since the last crawl, and therefore whether it should crawl that page again.

How to create a sitemap.xml?

Most content management systems have the option to build the sitemap.xml for you or have plugins available to build the sitemap.

If your website is running on WordPress, Yoast SEO is a great tool to generate an XML sitemap. On their website, Yoast explains how to do that. For Drupal websites, the Simple Sitemap module lets you easily create a sitemap for all your content types.

If you’re running a Shopify webshop or a Squarespace website, your sitemap is automatically created on www.yourdomain.com/sitemap.xml.

Is your website running on a custom-built content management system? It’s probably a good idea to automatically generate the sitemap on a regular basis, to make sure all the latest content is included.
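For a custom CMS, the generator can be quite small. The Python sketch below uses only the standard library's xml.etree.ElementTree; it assumes you can export a list of (URL, last-modified date) pairs from your system, and it leaves out priority and changefreq since search engines ignore them. The function name build_sitemap and the sample pages are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages: list[tuple[str, date]]) -> str:
    """Build sitemap.xml content from (url, last_modified) pairs."""
    # The namespace is set as a plain xmlns attribute, which ElementTree writes out as-is.
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, last_modified in pages:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as & in the URL for us.
        ET.SubElement(entry, "loc").text = url
        # date.isoformat() gives the YYYY-MM-DD form accepted by <lastmod>.
        ET.SubElement(entry, "lastmod").text = last_modified.isoformat()
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical pages; in practice these would come from your CMS database.
    pages = [
        ("https://www.example.com/", date(2019, 10, 10)),
        ("https://www.example.com/blog", date(2019, 10, 30)),
    ]
    print(build_sitemap(pages))
```

Running a script like this from a scheduled job (for example, nightly) and saving the output as sitemap.xml in your web root would keep the sitemap current without manual work.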

Which URLs should I include?

It’s important to include the URL of every page that you want to be indexed by the search engine. If a page is not meant to be indexed (like a login page), there is no need to include it, although it won’t hurt either.

If you have multiple URLs for a single page, make sure you include only the canonical URL. For example, if you use query parameters for sorting on a product category page, these URLs all show the same page:

  • www.example.com/products
  • www.example.com/products?sort=price
  • www.example.com/products?sort=name

In your sitemap, you only include the canonical URL, www.example.com/products. More about canonical URLs.
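If your CMS can't output canonical URLs directly, a simple normalization step can remove duplicates before writing the sitemap. This Python sketch assumes, as in the example above, that sort is the only query parameter that doesn't change the page; NON_CANONICAL_PARAMS is a hypothetical setting you would adjust for your site, and the rel=canonical tag on your pages remains the real source of truth.

```python
from __future__ import annotations

from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical setting: query parameters that only re-sort the same page.
NON_CANONICAL_PARAMS = {"sort"}


def canonical_url(url: str) -> str:
    """Drop query parameters that don't change which page is shown."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in NON_CANONICAL_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment))


def canonical_sitemap_urls(urls: list[str]) -> list[str]:
    """Canonicalize every URL and drop duplicates, keeping first-seen order."""
    return list(dict.fromkeys(canonical_url(url) for url in urls))
```

Deriving canonical URLs from your own data model, rather than by stripping parameters afterwards, is usually more reliable; this sketch is only a fallback.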

How big can your sitemap be? 

Google states that a single sitemap can include up to 50,000 URLs (and be at most 50MB uncompressed). If that’s not enough, you can create multiple sitemaps, plus a sitemap that lists your sitemaps. That is called a sitemap index.

Imagine you have a sitemap for all your products, one for your blog posts and one for any other pages. The sitemap index may look like this:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>https://www.example.com/products-sitemap.xml</loc>
</sitemap>
<sitemap>
<loc>https://www.example.com/posts-sitemap.xml</loc>
<lastmod>2005-01-01</lastmod>
</sitemap>
<sitemap>
<loc>https://www.example.com/pages-sitemap.xml</loc>
<lastmod>2005-01-01</lastmod>
</sitemap>
</sitemapindex>

Even if your site isn’t that big, it may still be useful to split your sitemap into separate sitemaps per content type: one for product pages, one for category pages, and so on. This helps you analyze your website’s performance in Google Search Console by topic.
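For a large site, splitting by URL count can be automated. The Python sketch below is one possible approach, not a standard tool: it cuts a URL list into chunks of at most 50,000 and builds a sitemap index pointing at the resulting files. The file names (products-sitemap-1.xml and so on) are hypothetical, and the sketch does not check the 50MB uncompressed size limit, which you would still need to verify if your URLs are very long.

```python
from __future__ import annotations

from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file URL limit from the sitemaps protocol


def chunk(urls: list[str], size: int = MAX_URLS_PER_SITEMAP) -> list[list[str]]:
    """Split a URL list into consecutive chunks of at most `size` URLs."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(child_sitemaps: list[str]) -> str:
    """Return sitemap index XML that points to each child sitemap URL."""
    entries = "\n".join(
        f"  <sitemap>\n    <loc>{escape(url)}</loc>\n  </sitemap>" for url in child_sitemaps
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<sitemapindex xmlns="{SITEMAP_NS}">\n{entries}\n</sitemapindex>\n'
    )


if __name__ == "__main__":
    # Hypothetical catalogue of 120,000 product URLs.
    products = [f"https://www.example.com/product/{n}" for n in range(120_000)]
    parts = chunk(products)
    print([len(p) for p in parts])  # [50000, 50000, 20000]
    # Hypothetical file names: one child sitemap per chunk.
    names = [f"https://www.example.com/products-sitemap-{i}.xml" for i in range(1, len(parts) + 1)]
    print(build_sitemap_index(names))
```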

How to use sitemaps for international websites?

Sitemaps can also be very helpful when you are doing SEO for international websites. For every URL on your website, you can specify hreflang alternates. These tell Google which versions of that page exist in other languages. Here’s an example:

<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
<url>
<loc>https://www.siteguru.co/</loc>
<xhtml:link rel="alternate" hreflang="es" href="https://www.siteguru.co/es"></xhtml:link>
<xhtml:link rel="alternate" hreflang="nl" href="https://www.siteguru.co/nl"></xhtml:link>
<xhtml:link rel="alternate" hreflang="en" href="https://www.siteguru.co/"></xhtml:link>
</url>
</urlset>

Here, we specify the Spanish (ES) and Dutch (NL) versions of a website, while also including the original language English (EN). It’s required to also include the original.

The international URLs can even be on a different domain or subdomain. Here’s an example of an international webshop, where the Spanish variant is on a subdomain and the Dutch variant is on a separate domain altogether. Keep in mind this is just an example; we wouldn’t recommend such a setup.

<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
<url>
<loc>https://www.webshop.com/</loc>
<xhtml:link rel="alternate" hreflang="es" href="https://es.webshop.com/es"></xhtml:link>
<xhtml:link rel="alternate" hreflang="nl" href="https://www.webshop.nl/"></xhtml:link>
<xhtml:link rel="alternate" hreflang="en" href="https://www.webshop.com/"></xhtml:link>
</url>
</urlset>
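Maintaining these links by hand gets error-prone quickly. The Python sketch below is a minimal, hypothetical generator: it takes a mapping from hreflang code to URL and writes one <url> entry per language version, each listing every version including itself, which is the pattern Google's hreflang documentation shows. It also declares the xhtml namespace (http://www.w3.org/1999/xhtml) that the xhtml:link tags require.

```python
from __future__ import annotations

from xml.sax.saxutils import escape, quoteattr

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"  # required for the xhtml:link tags


def build_hreflang_sitemap(versions: dict[str, str]) -> str:
    """Build a sitemap in which every language version lists all versions, itself included.

    `versions` maps an hreflang code (e.g. "en") to the URL of that language version.
    """
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        f'<urlset xmlns="{SITEMAP_NS}" xmlns:xhtml="{XHTML_NS}">',
    ]
    for page_url in versions.values():
        lines.append("  <url>")
        lines.append(f"    <loc>{escape(page_url)}</loc>")
        for lang, alt_url in versions.items():
            lines.append(
                f'    <xhtml:link rel="alternate" hreflang={quoteattr(lang)} href={quoteattr(alt_url)}/>'
            )
        lines.append("  </url>")
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_hreflang_sitemap({
        "en": "https://www.siteguru.co/",
        "es": "https://www.siteguru.co/es",
        "nl": "https://www.siteguru.co/nl",
    }))
```

Generating every entry from one mapping guarantees that all versions point at each other consistently, which is the part that most often goes wrong when these links are edited by hand.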

More about SEO for international websites.