Solving crawl issues

Crawling your website

While doing a website audit, our crawler makes multiple requests to your website. We'll open your sitemap, crawl pages for more links, and run an SEO check on every page.

Although we try to keep the number of requests as low as possible, we inevitably need to open a lot of pages on your site. For large websites, this can result in thousands of requests. To make sure this doesn't affect your website, we leave enough time between requests. Whenever your server seems too busy, we'll reduce our request rate.
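To illustrate the idea of slowing down when a server looks busy, here is a minimal sketch in Python. It is not SiteGuru's actual implementation: the function name next_delay, the 2-second "slow response" threshold, and the 1 to 60 second limits are all assumptions chosen for the example.

```python
def next_delay(current_delay, status_code, response_seconds,
               min_delay=1.0, max_delay=60.0):
    """Return the pause (in seconds) to wait before the next request.

    Hypothetical helper: the thresholds and limits are illustrative only.
    """
    # 429 (Too Many Requests) and 5xx responses suggest the server is busy.
    server_busy = status_code == 429 or 500 <= status_code < 600
    # A slow response is treated as a softer sign of load (assumed threshold).
    slow = response_seconds > 2.0
    if server_busy or slow:
        # Back off: double the pause, up to the ceiling.
        return min(current_delay * 2, max_delay)
    # Healthy response: ease back toward the minimum pause.
    return max(current_delay / 2, min_delay)


delay = 1.0
for status, seconds in [(200, 0.3), (503, 0.4), (503, 0.5), (200, 0.2)]:
    delay = next_delay(delay, status, seconds)
    print(status, delay)
# 200 1.0
# 503 2.0
# 503 4.0
# 200 2.0
```

Doubling the pause after an error and halving it after a healthy response is one common way to back off quickly and recover gradually; other schemes would work just as well.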

Sometimes, though, a server, CDN (Content Delivery Network), or firewall may block our requests. This can cause issues like:

  • Not seeing any pages in the report
  • Seeing a lot of error pages
  • Seeing a lot of broken links

I'm not seeing any pages

If you're not seeing any pages in your site report, it most likely means our crawler could not access your website at all. Without access, we can't open sitemaps or follow links, so no pages appear in the report. Please ask your technical team or hosting provider whether they're blocking crawlers from accessing the site. If so, ask them to whitelist our user agent, SiteGuruCrawler.

I'm seeing a lot of error pages

If some pages work but others don't, it could be that we're crawling your site too quickly. This is rare, because our crawler normally slows down its requests as soon as your website starts returning errors. Normally, a new crawl will fix this. If not, please contact us.

I'm seeing a lot of broken links

Our link checker checks all internal and external links on your site. If you're seeing a lot of internal broken links, please run a new crawl. Just like our crawler, the link checker will slow down its requests. Normally, this leads to better results.

For external broken links, there isn't much you can do. Sometimes external sites block our requests. We're usually familiar with these limitations, and in that case the link will not be flagged as broken. If an external link is flagged as broken but works when you open it, you can click the Ignore button. The link will then no longer show up in the report.

Ignoring a broken link

Solution: whitelisting SiteGuru's User-Agent

If our requests are being blocked, ask your technical team or hosting provider to whitelist any requests from the user agent SiteGuruCrawler. All our requests are made using that user agent, so adding it to the whitelist should give us access to your site.
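A quick way to see whether the user agent is the reason for a block is to request the same page twice, once with a browser-like user agent and once as SiteGuruCrawler, and compare the status codes. The sketch below uses only Python's standard library. The function names are hypothetical, and it assumes the header value is exactly "SiteGuruCrawler"; the full string our crawler sends may contain more than that. A firewall can also block on other signals, so a passing check is a hint rather than proof.

```python
import urllib.error
import urllib.request


def fetch_status(url, user_agent, timeout=10):
    """Return the HTTP status code the server gives to this user agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx and 5xx responses are raised as exceptions; keep the code.
        return error.code


def looks_blocked(browser_status, crawler_status):
    """True when a browser-like request succeeds but the crawler's does not."""
    return browser_status < 400 and crawler_status >= 400


def check_site(url):
    """Compare the responses for a browser-like agent and SiteGuruCrawler."""
    browser = fetch_status(url, "Mozilla/5.0")
    # Assumption: the user agent header is exactly "SiteGuruCrawler".
    crawler = fetch_status(url, "SiteGuruCrawler")
    return browser, crawler, looks_blocked(browser, crawler)
```

Calling check_site with your own homepage URL returns both status codes and a flag. A result such as (200, 403, True) suggests that requests are being filtered by user agent.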

Blocked by robots.txt settings

SiteGuru follows restrictions in your robots.txt file. If we're not allowed to crawl the site because of robots.txt instructions, we'll show this as a warning message in the site report. 

If this is the case, please add the following to your robots.txt file:

User-agent: SiteGuruCrawler
Allow: /
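To check how a robots.txt file treats our user agent before starting a new crawl, you can parse it with Python's standard urllib.robotparser module. This is a minimal sketch: the sample file contents, the can_crawl helper, and the example.com URL are made up for illustration, and Python's parser may differ in small details from how our crawler reads the file.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt: blocks every crawler except SiteGuruCrawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: SiteGuruCrawler
Allow: /
"""


def can_crawl(robots_txt, user_agent, url):
    """Return True if robots_txt allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(can_crawl(ROBOTS_TXT, "SiteGuruCrawler", "https://www.example.com/page"))
# True
print(can_crawl(ROBOTS_TXT, "SomeOtherBot", "https://www.example.com/page"))
# False
```

To test your live file instead of a sample string, paste its contents into ROBOTS_TXT.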

Can I get your IP address so it can be whitelisted?

Unfortunately, that is not possible. We use many different servers, each with its own IP address. These servers are started and stopped as needed, so we don't know the IP addresses in advance ourselves.

If you still have issues getting your website crawled, please contact us.