Search engines and language detection

We've discussed how to structure your site to target different countries, and how to use hreflang tags to make sure searchers see the right language variant of a page. But how does Google know which language your page is written in? And how does it decide which audience to show it to? We'll discuss that in this article.

In what language do searchers get results?

Search results are only helpful if you understand the language they're written in. That's why every search engine favors content written in the user's language.

Google

Google lets users specify their preferred languages and tailors the results to that setting. To change it, go to Search Preferences > Languages. There you can change the language of the Google user interface, as well as the languages you want included in the search results. Here, I selected English and Dutch:

Google search language settings

If you don't explicitly set the language, Google will serve results based on your region or your account settings.

Setting your preferred language in Google doesn't mean you'll never see results in other languages. If Google thinks it has found a relevant page in a different language, it may still include that page in the search results.

Bing

Bing similarly lets you select one or more languages to include in the search results. You can choose them in your settings.

Bing account language settings

How do search engines know the language of a page?

Search engines like Google aim to give their users the most relevant search results, and a result is only relevant if the user can understand it. The content needs to be written in a language the user understands; otherwise, Google is unlikely to show it.

When a user has set their preferred language, how does Google decide whether to include a certain site in the results? Google uses a combination of factors to determine a site's location, language, and locale. By applying the right settings, you can make sure your site is included for the relevant language and country.

Google Search Console's International Targeting report

If you've verified ownership of your site in Google Search Console, you can specify which country your website targets. This is a legacy report: when you open it, you'll see an overview of your hreflang tags and any potential issues. Click the Country tab to select a country from the dropdown list.

Google Search Console country target settings

If you're using a country-specific domain extension, like .co.uk for the United Kingdom, .de for Germany, or .nl for The Netherlands, you don't need to specify the country you're targeting. And if you're targeting users from around the world, there's no need to specify a target country either.

Keep in mind that this setting is only about the location of the user, not the language. If you're targeting all English-speaking users, whether they're in Australia, Ireland, or the US, leave the target users field empty.

Domain extensions

Country-specific top-level domain names help Google determine which country a site targets. Some examples include:

  • .nl for The Netherlands
  • .com.au for Australia
  • .ie for Ireland
  • .gov.uk for UK governmental organizations

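The top-level domain isn't always a reliable signal, and we're guilty of that ourselves: although our TLD is .co, we're not based in Colombia. Other popular country-code extensions include .io (British Indian Ocean Territory) and .ly (Libya). Because extensions like these are so widely used by sites that have nothing to do with the country, Google treats several of them, including .co and .io, as generic.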

Other domain extensions, such as .com, .eu, and .info, are also considered generic top-level domains that don't belong to any country. With so many new TLDs available (like .dev, .travel, .agency, and .shop), the domain often says little about a site's location.

Hreflang tags

Hreflang tags let site owners specify the alternate language variants of a page, as well as the language of the page itself. You can add them to the head of the page, send them in an HTTP header, or list them in your XML sitemap. Google uses them as an important signal of a page's language and locale.
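As an illustration, here is a minimal sketch of the head-of-page method for a site with British English, American English, and Dutch variants. The example.com URLs and the /uk/, /us/, and /nl/ paths are hypothetical. Each variant lists every variant, including itself, and the same set of tags should appear on all of them, because Google may ignore hreflang annotations that aren't confirmed by a return link from the other page. The x-default value marks the page to show users whose language matches none of the listed variants.

```html
<!-- Placed in the <head> of every variant listed below (hypothetical URLs) -->
<link rel="alternate" hreflang="en-gb" href="https://example.com/uk/" />
<link rel="alternate" hreflang="en-us" href="https://example.com/us/" />
<link rel="alternate" hreflang="nl" href="https://example.com/nl/" />
<!-- Fallback for users whose language matches none of the variants above -->
<link rel="alternate" hreflang="x-default" href="https://example.com/" />
```

The values combine an ISO 639-1 language code with an optional ISO 3166-1 Alpha 2 region code, so en-gb targets English speakers in the United Kingdom, while nl targets Dutch speakers anywhere.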

Language recognition

Google's crawler is pretty smart at identifying which language a page is written in. It takes its cues not from code-level information such as tags or headers, but from the visible content of the page. In most cases, that's enough for Google to determine the page's language.

Other signals

Additionally, Google may use the IP address of the server to determine the location, as well as specific details on the page (such as local addresses, phone numbers, and currencies) or in your Google Business Profile to work out the language and location.

What about the lang attribute?

The HTML5 specification includes a lang attribute, a convenient way to specify the language of a page. It works like this:

<html lang="en">

However, Google has said that it ignores the lang attribute. It was so often set incorrectly that it didn't really help, and Google's algorithm can determine the actual language of the page on its own.

Google's international crawling

No matter what language or region your site targets, Google's crawler mostly crawls it from IP addresses in the United States, and it doesn't specify a preferred language in its requests. That's why Google recommends that you don't serve different content based on the user's location or language settings. Instead, give each variant its own URL and let the URL decide what content to show the user.
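A minimal sketch of what that can look like in practice, assuming hypothetical English and Dutch versions at https://example.com/en/ and https://example.com/nl/: instead of redirecting visitors based on their IP address or browser language, each page offers plain links to the other variants, so both users and Googlebot can reach every version at its own URL. The hreflang attribute on each link is an optional hint about the language of the linked page, and lang marks the language of the link text itself for browsers and screen readers.

```html
<!-- Hypothetical language switcher, included on every language variant -->
<nav aria-label="Language">
  <!-- Each variant has its own URL; no location- or browser-based redirect -->
  <a href="https://example.com/en/" hreflang="en" lang="en">English</a>
  <a href="https://example.com/nl/" hreflang="nl" lang="nl">Nederlands</a>
</nav>
```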

Conclusion

A search engine aims to provide search results in a language the user understands. Google uses various signals to figure out which language a site is written in and which location it's for, and includes or excludes the site in the search results based on that.