Not even Google has an infinite budget. In this article, we’ll walk you through the significance of your website’s crawl budget and show you how to ensure your high-priority pages get indexed.
Let’s dive in!
Google doesn’t just crawl your page the first time you publish it. It continuously re-crawls it for updates and information changes. That frequency, the number of pages Google’s bots will crawl on your site within a given period, is called the crawl budget.
Most websites don’t have to worry about their crawl budget. If you only have a handful of pages, your website won’t strain Google’s resources.
However, if you run a big website (10k+ pages) or an eCommerce store with thousands of products, you’ll need to choose which pages you want Google to focus on.
Otherwise, the sheer volume of pages that Google needs to crawl will affect other pages’ ability to be indexed and, consequently, ranked. If Google wastes time on low-priority pages, it won’t be able to crawl and index your new pages, resulting in poor SEO results.
The Crawl Stats report in Google Search Console is a hidden gem. Tucked away in the Settings section, it contains a lot of valuable information about how Google crawls your site.
- Log in to Google Search Console
- Navigate to “Settings”
- Open the “Crawl stats” report
Scroll down, and you’ll see the purpose of each crawl: Google might crawl a page to refresh known data or to discover new pages. For new websites, you’ll see a lot of Discovery crawls, while for established sites that are already indexed, the focus shifts to refreshing existing data.
The report also breaks down crawl requests by response type, such as:
- 200 - OK
- 404 - Not found
- 301 - Moved permanently
- 302 - Moved temporarily
- 500 - Server error
This helps you spot broken content and pages that redirect elsewhere.
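If you export your server's access logs, you can run the same kind of check yourself. The sketch below is a minimal illustration, not a SiteGuru or Search Console feature: it assumes you've already parsed Googlebot requests into `(url, status_code)` pairs, and it groups them by status class so 4xx and 5xx problems stand out.

```python
from collections import Counter

def summarize_status_codes(log_entries):
    """Count crawl responses per status class (2xx, 3xx, 4xx, 5xx).

    log_entries: iterable of (url, status_code) tuples, e.g. parsed
    from your server's access log for Googlebot requests.
    """
    summary = Counter()
    for url, status in log_entries:
        # 301 // 100 == 3, so 301 and 302 both land in the "3xx" bucket
        summary[f"{status // 100}xx"] += 1
    return dict(summary)

# Hypothetical sample of Googlebot hits from an access log:
entries = [
    ("/", 200),
    ("/old-page", 301),
    ("/missing", 404),
    ("/products", 200),
    ("/broken", 500),
]
print(summarize_status_codes(entries))
# {'2xx': 2, '3xx': 1, '4xx': 1, '5xx': 1}
```

If the 4xx and 5xx buckets keep growing from week to week, that's a signal Googlebot is wasting crawl budget on dead ends.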
Another interesting report is the File Type report, which helps you see what type of files Google is crawling (HTML, CSS, Images, etc.) and which Googlebot is being used (Smartphone, Desktop).
There are two key factors Google uses to define the crawl budget for your website:
- Crawl limit: How much crawling can your website server handle?
- Crawl demand: How crawl-worthy are the URLs? (How often are they updated, and how popular are they?)
For example, if Google repeatedly gets 5xx errors when crawling specific URLs, or your website runs on limited shared hosting, it will lower the crawl limit accordingly to avoid crashing your website.
Now that you understand which factors Google considers when assigning your crawl budget, it’s time to make the most of it!
Firstly, ensure you don’t waste crawl budget because of your technical setup.
If you plan to have over 10k pages, invest in fast hosting and regular maintenance, so bots don’t get server errors when trying to crawl your website.
Don’t forget about your website speed. The faster your website loads, the faster Google’s bots will be able to crawl it. If they can crawl dozens of pages in a few seconds, they’ll be able to crawl more.
Nothing wastes crawl budget faster than unnecessary redirects and broken links.
Over time, Google’s bots will notice how many errors they’re getting and reduce your crawl budget.
Audit your pages regularly. With SiteGuru, you get automated weekly audits, so you only need to jump in when there are problems or SEO opportunities.
Google uses links to navigate and understand the internet. While backlinks show your pages’ external relevance, internal links show your pages’ internal relevance and allow the bots to navigate from one page to the other.
For example, suppose your main page A has three internal links, to pages B, C, and D.
When Googlebot lands on page A, it’ll scan and crawl pages B, C, and D.
Avoid orphan pages - pages with no internal links pointing to them. It’s hard for Google to find them. Whenever you create a new page on your website, add internal links to it from other pages.
Make sure every page on your website has at least one link pointing to it. And show Google which pages you prioritize as the most important ones by pointing more internal links to them.
Ideally, all your pages should be one click away from the homepage.
However, this is hard to implement, so the SEO best practice to maximize your crawl budget is to use a flat website structure. A flat site structure means every page can be reached from the homepage in just a few clicks.
With a flat website structure, your link authority flows evenly - from your most important page to category pages and branching out.
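You can measure how flat your structure really is with a breadth-first search over your internal links. This is a simplified sketch with a made-up link graph: in practice you'd build the graph from a crawl of your own site. Any page that never shows up in the result is an orphan that Googlebot can't reach through internal links.

```python
from collections import deque

def click_depth(links, start="home"):
    """Breadth-first search over internal links: returns how many
    clicks each page is from the start page. Pages missing from the
    result are orphans unreachable via internal links."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth

# Hypothetical internal-link graph: each key links to the listed pages.
links = {
    "home": ["A"],
    "A": ["B", "C", "D"],
    "D": ["E"],
}
print(click_depth(links))
# {'home': 0, 'A': 1, 'B': 2, 'C': 2, 'D': 2, 'E': 3}
```

If important pages sit at depth 4 or more, adding internal links from the homepage or category pages pulls them closer and signals their priority to Google.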
If your sitemap is full of unavailable or redirected pages, update it. Since Google uses your XML sitemap as a reference point for pages it needs to crawl, make sure every URL in the sitemap is worth indexing.
Check your sitemap in the Google Search Console:
- Navigate to “Sitemaps”
- Select the submitted sitemap
- Select “See Index Coverage”
GSC will show you if some pages haven’t been indexed.
You can also manually check and remove:
- Unavailable pages
- Pages with 404 errors
- Pages that are set to noindex
- Incorrect URLs
- Redirected URLs
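A manual sitemap check can also be scripted. The sketch below uses only Python's standard library and entirely hypothetical data: a small inline sitemap plus a status map you would normally gather from a crawler or your logs. It extracts every URL from the sitemap and flags the ones that don't return 200 OK.

```python
import xml.etree.ElementTree as ET

# XML sitemaps use this namespace, per the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_urls(xml_text):
    """Extract every <loc> URL from an XML sitemap."""
    root = ET.fromstring(xml_text)
    return [loc.text for loc in root.iter(f"{SITEMAP_NS}loc")]

# Hypothetical sitemap snippet:
sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/old</loc></url>
  <url><loc>https://example.com/gone</loc></url>
</urlset>"""

# Hypothetical crawl results, e.g. gathered with a crawler or from logs:
statuses = {
    "https://example.com/": 200,
    "https://example.com/old": 301,
    "https://example.com/gone": 404,
}

# Only URLs that return 200 OK belong in the sitemap:
to_remove = [u for u in sitemap_urls(sitemap) if statuses.get(u) != 200]
print(to_remove)
# ['https://example.com/old', 'https://example.com/gone']
```

Removing those URLs keeps the sitemap an accurate reference point, so Googlebot spends its visits on pages worth indexing.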
If you’re strapped for time, get automated SEO audits through SiteGuru. You’ll be notified whenever there’s an SEO problem or opportunity.
If your website is new or you’ve previously had problems with orphan pages, redirect chains, and other errors, it may take time for Google to increase your crawl budget.
But once you’ve applied all the advice in this article, your next step will be to increase your page authority. The more backlinks pointing to your website, the more importance Google will give it.
- Improve your website speed and consider dedicated hosting
- Remove unnecessary redirects
- Remove broken links
- Update your sitemap to exclude broken URLs
- Set up your internal links considering the flat website architecture principles
- Keep an eye on your crawl budget in Google Search Console -> Settings -> Crawl stats, or get notifications using SiteGuru