Indexed, Though Blocked by robots.txt: Error Guide

Learn how to fix all the problems that could cause the “Indexed, though blocked by robots.txt” status in Google Search Console in our detailed guide. Plus, make sure Google doesn’t index your private pages!

If you received a Google Search Console notification or noticed that some of your pages are “Indexed, though blocked by robots.txt,” I’m here to show you how to solve this common indexing issue, plus what to do when pages that shouldn’t be indexed get indexed anyway.

Let’s take a look!

What Is the “Indexed, Though Blocked by robots.txt” Indexing Status?

When Google bots are done crawling your website, they’ll index it next. Usually, that’s the goal: you want your pages to rank for the right keywords on Google SERPs.

However, there are some pages you don’t want Google to index, for example:

  • Your website backend
  • Staging environments
  • Private pages
  • Thin or duplicate content pages
  • And more.

If you’ve received an email from Google Search Console (GSC) that says, “Indexed, though blocked by robots.txt,” here’s what is happening and how to fix it.

Indexed, though blocked by robots.txt message in Google Search Console (GSC)

To keep search engines away from these pages, you use the robots.txt file. It contains crawling instructions for search engines, including which pages and directories they should skip.
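
For example, a very simple robots.txt file might look like this (the paths below are placeholders; use the ones that match your own site):

User-agent: *
Disallow: /wp-admin/
Disallow: /staging/
Disallow: /private/

The User-agent line says which crawlers the rules apply to (* means all of them), and each Disallow line names a path they shouldn’t crawl.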

Why Would Google Index URLs Blocked by robots.txt?

Google might index URLs even if they're blocked by robots.txt for a few reasons:

External Links: If other websites link to a blocked page, search engines might think it's important and choose to index it, even though they can’t view its content.

User Interest: If people are really interested in a blocked page, search engines might decide to index it based on that popularity.

Sitemap Inclusion: If the URL is listed in your website's sitemap, it signals that the page is meant to be discovered, which could lead to its indexing.

Parent Page Context: Sometimes, if a parent page links to the blocked URL, search engines might notice this context and decide to index it.

Where Can I Check Indexation Issues?

If you haven’t received a notification but would still like to check, you can use Google Search Console -> Indexing -> Pages. Here’s a bit more detail on how to get there:

  • Start by logging into your Google Search Console account.
  • In the left-hand sidebar, go to Indexing and click Pages (this report was called “Coverage” in older versions of GSC). It lists your site’s indexing issues.
  • Look for the status labeled “Indexed, though blocked by robots.txt.” Click it to see the list of URLs that are indexed even though your robots.txt file blocks them.
  • From the list, select a URL that you want to check more closely. Then, click on “Inspect URL” to open the URL Inspection tool.
  • In the URL Inspection tool, check out the 'Crawl' section. You’ll see details like "Crawl allowed: No: blocked by robots.txt" and "Page fetch: Failed: Blocked by robots.txt."
  • Review this information to understand why the URL might be indexed despite the robots.txt restrictions.

If you use SiteGuru for weekly SEO audits and to-do lists, you can use the indexation report. You’ll see when Google bots last crawled your pages and if you should fix any indexing issues.

A Few Tips to Simplify Your Work

  • Use the filters in the Coverage report to narrow down the URLs and focus on the specific issue.
  • Cross-check the robots.txt file to ensure that it’s correctly configured for the web pages you want to be indexed.

This way, you can quickly identify which URLs are impacted by the robots.txt issue and make the necessary adjustments to your robots.txt file or other settings to resolve the problem.

How to Fix “Indexed, Though Blocked by robots.txt”

Step 1: Did You Intentionally Block the Page?

It’s not a problem if you or a developer deliberately added directives to the robots.txt file to block certain pages, but check to make sure that:

  • You’re not blocking pages that should be ranking for a keyword.
  • You haven’t accidentally set up a general rule that affects pages that should be indexed.

If you intentionally blocked the page, you’re good! Feel free to skip the rest of this article and brew yourself a cup of coffee.

If you didn’t intentionally block the page, it’s time to troubleshoot.

Step 2: Troubleshoot Unintentional robots.txt Blocking

Robots.txt Rules

There may be a directive in your robots.txt file blocking pages that should be crawled and indexed. For example, you may have meant to block only certain pages in your help center, but accidentally set up a rule that blocks all of them, including the ones that could rank for a long-tail keyword. Check the disallow directives and make sure that (there’s a short example after this list):

  • There are no duplicate ‘user-agent’ blocks for the same crawler (conflicting groups make the rules hard to predict).
  • Each ‘disallow’ line sits directly under its ‘user-agent’ line, with no stray blank lines separating them.
  • Invisible Unicode characters (such as a byte order mark) are removed. (Running your robots.txt file through a plain-text editor will expose or strip these encodings.)
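
For instance, two separate groups for the same user agent (placeholder paths below) are easy to misconfigure; one merged group is easier to reason about:

# Harder to follow: two groups for the same agent
User-agent: *
Disallow: /private/

User-agent: *
Disallow: /staging/

# Clearer: one group with both rules
User-agent: *
Disallow: /private/
Disallow: /staging/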

You can also use our free no-index checker to verify.

If you want search bots to crawl and index all the pages on your website, your robots.txt file should disallow nothing:

User-agent: *
Disallow:

(An empty Disallow line means nothing is blocked. Be careful: “Disallow: /” with a slash does the opposite and blocks the entire site.)

Identify Specific Blocking Lines: If you're unsure which part of your robots.txt is causing specific URLs to be blocked, follow these steps:

  • Select a URL you want to troubleshoot.
  • Next, use the TEST ROBOTS.TXT BLOCKING tool, which you’ll usually find in the right-hand pane of your SEO tool or search console.
  • Analyze the results in the new window that opens. It will show you the exact line in your robots.txt file blocking Google from accessing the URL.
  • Export the list of URLs from Google Search Console and sort them alphabetically.
  • Then, review the URLs to see if any you want indexed are included. If so, you'll need to update your robots.txt file to allow Google to access these URLs.
  • Update your robots.txt file by removing or modifying any ‘disallow’ lines blocking important pages (see the example below). Make sure these changes align with your overall SEO strategy and site architecture.
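
As a sketch of what such an update might look like (the /help/ paths are placeholders), a rule that blocked a whole directory can be narrowed so the page you want ranking stays crawlable:

User-agent: *
Disallow: /help/
Allow: /help/how-to-reset-your-password/

For Google, the longest (most specific) matching rule wins, so the Allow line takes precedence over the broader Disallow; other crawlers may resolve conflicts differently, so test the final file before publishing it.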

Duplicate robots.txt Files

A CMS like WordPress may create your robots.txt file automatically, and SEO plugins often do the same. Make sure you don’t end up with duplicate or conflicting robots.txt directives from several sources, which can complicate Google’s understanding of your site’s crawling rules. To edit the robots.txt file with the most common SEO plugins, follow these steps:

WordPress + Yoast SEO:

  • Log into your wp-admin section. In the sidebar, go to Yoast SEO plugin > Tools.
  • Click on File Editor. Edit the robots.txt file as needed and then click Save Changes.

WordPress + Rank Math:

  • Log into your wp-admin section. In the sidebar, go to Rank Math > General Settings.
  • Click on Edit robots.txt. Make your edits and then click Save Changes.

WordPress + All-in-One SEO:

  • Log into your wp-admin section. In the sidebar, go to All in One SEO > Robots.txt.
  • Edit the file as required and then click Save Changes.

Redirect Chains

Bots follow links to crawl and understand your website, but long redirect chains can slow them down and waste crawl budget. I always recommend creating a redirect map and keeping redirects to a minimum so crawlers reach the final URL in as few hops as possible.
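
To illustrate with placeholder URLs, a chain forces crawlers through extra hops before they reach the final page, while a clean redirect map points the old URL straight at its destination:

/old-page → 301 → /interim-page → 301 → /new-page (chain: two hops)
/old-page → 301 → /new-page (direct: one hop)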

Canonical Tags

If you have duplicate content, use canonical tags to indicate which version Google should index and rank, and ensure these tags are set up correctly so they don’t point away from essential pages. For example, say the same product page is reachable at both /shoes/ and /shoes/?sort=price. Add a canonical tag to the parameterized version that references the original URL. (For translated pages on an international site, use hreflang annotations rather than canonicals: a translation is not a duplicate.)
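
As a minimal sketch (example.com and the paths are placeholders), the canonical tag goes in the <head> of the duplicate page and points at the version you want indexed:

<link rel="canonical" href="https://www.example.com/shoes/" />

Every duplicate or parameterized variant of the page should carry the same tag, and the canonical page itself can safely self-reference.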

URL Formats 

Sometimes, Google picks up a URL variation you didn’t plan for, such as a campaign URL with UTM parameters. Verify whether it’s a legitimate page. If it is, make sure Google sees the clean URL, for example by pointing the variant’s canonical tag at it. Next, it’s time to double-check everything and validate the fix in Google Search Console.
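
For example (placeholder URLs), these two addresses serve the same page, and the clean one is the version you want in the index:

https://www.example.com/pricing/?utm_source=newsletter&utm_medium=email
https://www.example.com/pricing/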

Step 3: Validate the Fix

After fixing the URLs, first do a quick check to make sure everything’s in order:

  • Use Google Search Console or a similar tool to test your robots.txt file.
  • Double-check for any syntax errors that might have slipped in.
  • Confirm that the intended pages are now accessible to search engines.

Once everything looks good, navigate to the Page indexing report in Google Search Console, open the “Indexed, though blocked by robots.txt” issue, and click “Validate fix.”

Troubleshooting Blocked Pages That Still Get Indexed

There are also cases where the pages you don’t want Google to pick up are indexed. In addition to checking the robots.txt rules for mistakes, check for the following culprits:

Are Other Sites Linking to Your Pages?

Pages that other sites link to can be indexed even if they’re disallowed in robots.txt. When this happens, only the URL and anchor text appear in search engine results.

You can fix this issue by:

  • Password-protecting the file(s) on your server.
  • Adding the following meta tag to the pages to block them: <meta name="robots" content="noindex">. Keep in mind that Google has to be able to crawl a page to see this tag, so remove the robots.txt block for that URL when you rely on noindex (see the header example below for non-HTML files).
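
For non-HTML files on your server (a PDF, for example) that can’t carry a meta tag, the same noindex signal can be sent as an HTTP response header. How you add it depends on your server setup, but the response for the file should include a header like this:

HTTP/1.1 200 OK
Content-Type: application/pdf
X-Robots-Tag: noindex

The same caveat applies: Google must be able to fetch the file to see the header.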

Website Migration

If you recently migrated your website and no-indexed the old URLs, it will take a while for Google to catch on.

You can fix this issue by:

  • Implementing 301 redirects from old to new URLs (preferable for conserving link equity).
  • Giving Google time to drop the old URLs from its index. (Typically, Google drops URLs that keep returning 404 errors.) Avoid plugins that automatically redirect all your 404s.

How to Identify Pages You Should No-Index

Step 1: Make a List of Your URLs

Make a list of all your website URLs. You can do this manually or use SiteGuru's crawler for a more thorough approach.

Step 2: Identify URLs You Don’t Want on the SERP

Once you’ve identified the URLs that you don’t want Google to index, add them to your robots.txt file:

User-agent: *
Disallow: /page-you-want-to-disallow/
Disallow: /more-page-you-want-to-disallow/
Disallow: /another-page-you-want-to-disallow/

Step 3: Remove Links to the No-Indexed Pages

Check which pages link to the disallowed pages and remove those links. Google Search Console doesn’t make this easy to see, but you can use SiteGuru to see the linking URLs.

Step 4: Double-Check

Finally, run a new website audit with SiteGuru to ensure the right pages are blocked and the rest can still be indexed. You should see a “no-index” tag next to the blocked pages.

How to Fix Robots.txt Blocking Content on an Unlaunched WordPress Site

If your WordPress site hasn't gone live yet and you find your robots.txt file is blocking all content, it usually looks like this:

User-agent: *
Disallow: /

This setup tells web crawlers not to crawl any pages on your site. To fix this, just follow these steps:

  • Go to your WordPress dashboard.
  • In the left-hand menu, select Settings, then click Reading.
  • Find the section called Search Engine Visibility.
  • Ensure the box labeled “Discourage search engines from indexing this site” is unchecked. If it’s checked, WordPress blocks all search engines from crawling your site using a virtual robots.txt file.

Keep an Eye on Your Coverage

Coverage issues detected notification in Google Search Console

It’s normal to see different status codes in your Google Search Console, but know when to act.

For the “Indexed, though blocked by robots.txt” status, keep your robots.txt file updated with the proper rules and exceptions.

Then, monitor the changes manually or through SiteGuru’s automated weekly audits. This is the easiest way to focus on actionable SEO and delve into the technicalities only when something requires your attention.

FAQ

Can I disallow crawling for my entire website?

Yes, you can. However, URLs may still be indexed in some situations, even if they haven’t been crawled.

Note that a blanket “User-agent: *” rule doesn’t apply to Google’s various AdsBot crawlers, which must be named explicitly, so you can block your website for regular search crawlers and still have your ads’ landing pages checked.
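
For example, a site-wide block plus an explicit AdsBot block would look like this (only add the second group if you also want to stop Google from checking your ad landing pages):

# Blocks all crawlers that follow the wildcard rule
User-agent: *
Disallow: /

# AdsBot ignores the wildcard group and must be named explicitly
User-agent: AdsBot-Google
Disallow: /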

How do I disallow directory crawling?

Disallow the crawling of a directory and its contents by following the directory name with a forward slash:

User-agent: * 
Disallow: /tags/

The example above disallows any URL under the /tags/ path. For example, if /tags/ holds your tag or category pages, this directive blocks every page beneath it.

Please remember that using proper authentication to block access to private content is better than using robots.txt. Anyone can view the robots.txt file, so URLs might still be indexed without being crawled.

How do I edit my Shopify and eCommerce robots.txt files?

Even though you previously couldn’t edit your Shopify robots.txt file, you now can.

Go to Online Store -> Themes -> Actions -> Edit code -> Add a new template -> Select “robots” -> Select “Create template.” There, you’ll be able to make your exceptions and rules.