← All posts

Published 2013-08-13 · 4 min read

The robots.txt File

#Basic SEO ·#ROBOTS.TXT ·#SEO course

The robots.txt file is a plain-text document placed in the root folder of your website that tells search engine crawlers (Googlebot, Bingbot, etc.) which paths to skip and where to find your sitemap. It’s used to avoid duplicate content, reduce server load, and keep private pages out of the index. It is not a security tool: well-behaved bots respect it, but malicious bots ignore it entirely.

The robots.txt file is a text document that establishes crawling guidelines for bots to follow when exploring your website. Bots (also called spiders or crawlers) are used by search engines to access your website and index the content (text, images, files…) of its pages.

What is robots.txt for?

With the robots.txt file we can discourage bots from accessing certain folders of our website. We can also prevent a specific bot from crawling our website at all, or limit its crawling frequency.

Some of the reasons we might want to do this are:

  • Avoid duplicate content. This is the most important reason: duplicate content dilutes your rankings, so keeping duplicates out of the index helps search engines surface the right pages and can increase your traffic.
  • Reduce server load caused by an excess of crawler requests that could saturate it.
  • Avoid indexing of certain pages that you want to be accessible to users but kept out of Google's results for privacy reasons.

We can also reference our sitemap (the sitemap.xml file) to point bots to the URLs of all the pages of our site.
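To make this concrete, here is a minimal robots.txt sketch combining these ideas; the folder names, bot name, and domain are hypothetical:

```
# Rules for all bots
User-agent: *
Disallow: /private/
Disallow: /tmp/

# Ban one specific (hypothetical) bot entirely
User-agent: BadBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
```

To limit crawling frequency, some crawlers (Bingbot, for example, but not Googlebot) also honor a non-standard Crawl-delay directive that sets a minimum pause between requests.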

What robots.txt is NOT for

As we have said before, the robots.txt file only establishes crawling guidelines, and bots may not honor them. This is especially true of the so-called “bad bots”, whose only purpose is to crawl your website in search of e-mail addresses, private data, or vulnerabilities.

If you have sensitive information on your website that you don't want bots to crawl, you should protect it with real security measures, such as authentication. Likewise, the robots.txt file cannot protect your website from attackers running brute-force attacks.

How to create a robots.txt file

You can use an online generator tool to create the robots.txt file, although we highly recommend following Google's instructions and writing it manually. You can also read the Wikipedia article on the Robots Exclusion Standard.

The file has to be located in the root folder of your website, just like the favicon and the sitemap.
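As a quick way to test rules before deploying them, Python's standard library includes a robots.txt parser. This sketch parses a hypothetical rule set locally (no network request) and checks two example URLs:

```python
# Sketch: check robots.txt rules with Python's standard library.
# The rules and URLs below are hypothetical examples.
from urllib.robotparser import RobotFileParser

rules = """
User-agent: *
Disallow: /private/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)  # parse the rules directly instead of fetching them

# A well-behaved crawler asks before fetching each URL:
print(rp.can_fetch("*", "https://example.com/private/page.html"))  # False
print(rp.can_fetch("*", "https://example.com/index.html"))         # True
```

Note that this parser implements the original 1994 exclusion standard, so it does not understand Google's wildcard extensions (`*` and `$` inside paths).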

robots.txt file example

This is MetricSpot’s robots.txt file:

User-agent: *
Disallow: /new/
Disallow: /tos/
Disallow: /items/
Disallow: /no/
Disallow: /condiciones-de-uso/
Disallow: /blog/cat/
Disallow: /blog/tag/
Disallow: /blog/wp-admin/
Disallow: /blog/wp-includes/
Disallow: /blog/wp-content/plugins/
Disallow: /blog/wp-content/themes/
Disallow: /blog/feed/
Disallow: /api/www.metricspot.com
Disallow: /*.js$
Disallow: /*.css$
Sitemap: https://metricspot.com/sitemap.xml

What each line does

| Directive | Purpose |
| --- | --- |
| User-agent: * | Rules apply to ALL bots |
| Disallow: /tos/ | Block “Terms of Service” page (security/privacy) |
| Disallow: /new/, Disallow: /items/ | Block temporary content folders (avoid duplicate content) |
| Disallow: /blog/cat/, /blog/tag/ | Block WordPress category/tag archives (duplicate content) |
| Disallow: /*.js$, /*.css$ | Block JavaScript and CSS crawling (server load) |
| Sitemap: https://metricspot.com/sitemap.xml | Tell bots where to find the sitemap |

The User-agent: * line indicates that the rules that follow apply to ALL bots. Each Disallow: directive blocks a specific page or folder, given by the URI that follows it. For security and privacy reasons, we have blocked the “Terms of Service” page because it includes information we don't want indexed.

In order to avoid duplicate content issues, we have blocked the /new/ and /items/ folders, which our app uses to create temporary content, as well as the /blog/cat/ and /blog/tag/ folders, which our blog uses for category and tag archives. The Disallow: /*.js$ and Disallow: /*.css$ rules use wildcard patterns (* matches any character sequence and $ anchors the end of the URL) to block crawling of all JavaScript and CSS files and reduce server load.

Note (2024+): blocking JS and CSS is no longer recommended because Google needs them to render and judge mobile usability. Modern best practice is to allow JS/CSS.

Finally, the Sitemap: https://metricspot.com/sitemap.xml line tells bots where to find our sitemap.
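For reference, a sitemap is itself a simple XML listing of your pages' URLs; a minimal sketch (with hypothetical URLs and dates) looks like this:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2013-08-13</lastmod>
  </url>
  <url>
    <loc>https://example.com/about/</loc>
  </url>
</urlset>
```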

Key takeaways

  • robots.txt lives in the root folder of your domain (/robots.txt), not in subfolders.
  • It’s a guideline, not a lock: well-behaved bots respect it; bad bots ignore it.
  • Don’t use it for security. Use authentication or noindex headers for private content.
  • Always include your sitemap URL so crawlers can find it without guessing.
  • Don’t block JS/CSS in modern setups. Google needs them to render mobile pages.

FAQ

Where does the robots.txt file go?

In the root folder of the domain, accessible at https://yourdomain.com/robots.txt. It must be lowercase.

Can robots.txt hide pages from Google?

Disallow tells bots not to crawl a page, but the URL can still be indexed if linked from elsewhere. To remove a page from the index, use a noindex meta tag or HTTP header.
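For example, a page can remain crawlable while staying out of the index by adding a robots meta tag to its head section:

```html
<!-- Allows crawling, but tells search engines not to index this page -->
<meta name="robots" content="noindex">
```

For non-HTML resources (PDFs, images), the same effect is achieved by sending an X-Robots-Tag: noindex HTTP response header. Either way, the page must NOT be blocked in robots.txt, because a crawler that never fetches the page never sees the noindex instruction.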

Should I block AI crawlers like GPTBot or ClaudeBot?

That depends on your goals. Blocking them keeps your content out of LLM training data but also out of AI answer engines that drive citation traffic. Most SEO sites now allow them.
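If you do decide to opt out, AI crawlers are blocked with the same User-agent mechanism as any other bot; a sketch using the user-agent tokens published by OpenAI and Anthropic:

```
# Opt out of specific AI crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
```

As with every robots.txt rule, this is a request, not an enforcement mechanism.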