Question 1

What is robots.txt?

Accepted Answer

Robots.txt is a plain-text file placed at the root of a website that tells compliant web crawlers which paths they should or should not fetch. The format is defined by the Robots Exclusion Protocol, standardized as IETF RFC 9309 in 2022. Each group of rules starts with one or more User-agent lines followed by Allow and Disallow directives, and optional Sitemap lines can appear anywhere in the file.
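To make the group structure concrete, here is a minimal sketch that parses a small robots.txt with Python's standard urllib.robotparser module. The file content, the crawler name ExampleBot and the example.com URLs are made up for illustration. Note that this module applies rules in file order, which can differ from how individual search engines resolve conflicts.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths and sitemap URL are made up.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

Sitemap: https://example.com/sitemap.xml
"""

def load_rules(text):
    """Parse robots.txt text without making any network request."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

rules = load_rules(ROBOTS_TXT)

# Ask whether a (hypothetical) crawler may fetch two different URLs.
print(rules.can_fetch("ExampleBot", "https://example.com/private/report.html"))
print(rules.can_fetch("ExampleBot", "https://example.com/blog/post.html"))

# Sitemap lines are collected separately from the user-agent groups.
print(rules.site_maps())
```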

Question 2

Where do I place robots.txt?

Accepted Answer

Place it in the document root of each host so it is reachable at https://yourdomain.com/robots.txt. A file in a subdirectory, such as /site/robots.txt, is ignored. Each subdomain is treated as its own host, so blog.example.com needs its own robots.txt separate from www.example.com. Protocol and port also matter: a robots.txt file applies only to the scheme, host and port it is served from, so the HTTP and HTTPS versions of a site are covered by separate files.
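The location rule can be sketched as a small helper that derives the robots.txt URL governing any page URL. The function name robots_txt_url is hypothetical; it keeps the scheme, host and port and discards everything else.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_txt_url(page_url):
    """Return the robots.txt URL for the origin that serves page_url.

    Scheme, host and port are preserved; path, query and fragment are dropped.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

# Different subdomains, schemes and ports each map to their own file.
print(robots_txt_url("https://blog.example.com/site/post?id=1"))
print(robots_txt_url("http://www.example.com:8080/shop/cart"))
```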

Question 3

Does robots.txt block pages from appearing in Google?

Accepted Answer

No. Robots.txt prevents crawling, not indexing. Google can still index a URL it is not allowed to crawl if it discovers the URL through external links, and will show a bare listing with no snippet. To keep a page out of search results entirely, allow crawling and add a <meta name="robots" content="noindex"> tag or an X-Robots-Tag: noindex HTTP header so Google can read the directive.
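As a rough way to verify that a page really carries a noindex signal, the sketch below checks both places a crawler would look: the X-Robots-Tag response header and a robots meta tag in the HTML. The function name has_noindex is hypothetical, the headers argument is assumed to be a plain dict keyed exactly as "X-Robots-Tag", and crawler-specific forms such as a googlebot meta tag are not handled.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())

def has_noindex(headers, html):
    """Return True if the header or the HTML asks crawlers not to index."""
    header_value = headers.get("X-Robots-Tag", "")
    header_tokens = {t.strip().lower() for t in header_value.split(",")}
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in header_tokens or "noindex" in parser.directives

page = '<html><head><meta name="robots" content="noindex"></head></html>'
print(has_noindex({}, page))
print(has_noindex({"X-Robots-Tag": "noindex, nofollow"}, ""))
```

Either signal is only visible to a crawler that is allowed to fetch the page, which is why the URL must not be disallowed in robots.txt.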

Question 4

What is the difference between crawling and indexing?

Accepted Answer

Crawling is the act of fetching a URL and its resources. Indexing is the act of storing the crawled content in a searchable database so it can be ranked for queries. Robots.txt only controls crawling. A page can be indexed without being crawled (when other sites link to a blocked URL) or crawled without being indexed (when the page carries noindex). Confusing the two is one of the most common robots.txt mistakes.

Question 5

Should I block CSS and JavaScript in robots.txt?

Accepted Answer

No. Google has stated since 2015 that it renders pages like a modern browser and needs to fetch CSS, JS and image assets to understand layout, mobile-friendliness and Core Web Vitals. Blocking /wp-content/, /static/ or CDN paths can cause Google to see a broken page and demote it in rankings. Leave your asset directories crawlable and use noindex on the individual HTML pages you want to hide.
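One way to audit this is to test a list of asset URLs against your robots.txt and report which ones a crawler would be refused. The sketch below uses Python's standard urllib.robotparser; the function name blocked_urls, the sample rules and the example.com URLs are hypothetical. The module evaluates rules in file order, so results may differ from Google's own matching when Allow and Disallow rules overlap.

```python
from urllib.robotparser import RobotFileParser

def blocked_urls(robots_txt, urls, agent="Googlebot"):
    """Return the URLs from `urls` that `agent` may not fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]

# Hypothetical rules that block a WordPress asset directory.
robots_txt = """\
User-agent: *
Disallow: /wp-content/
"""

assets = [
    "https://example.com/wp-content/themes/site/style.css",
    "https://example.com/static/app.js",
]

for url in blocked_urls(robots_txt, assets):
    print("blocked:", url)
```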

Question 6

Can I have multiple user-agent rules?

Accepted Answer

Yes. A robots.txt file can contain any number of user-agent groups. A crawler reads the whole file, picks the single group whose User-agent line most specifically matches its name, and ignores the others — including the * wildcard group if a more specific match exists. That means once you add User-agent: Googlebot, Google will only obey that block and will not inherit rules from the * block, so repeat any shared rules inside the specific group.
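The selection step can be sketched as follows. This is a simplified illustration, not any crawler's actual implementation: the function names are hypothetical, "most specific" is approximated as the longest User-agent token that the crawler's name starts with, and rules from groups that share a token are merged.

```python
def parse_groups(text):
    """Split robots.txt text into (user_agent_tokens, rules) groups."""
    groups, agents, rules = [], [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if rules:  # a User-agent line after rules starts a new group
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value.lower())
        elif field in ("allow", "disallow") and agents:
            rules.append((field, value))
    if agents:
        groups.append((agents, rules))
    return groups

def select_rules(text, crawler):
    """Return the rules of the single best-matching group for `crawler`."""
    crawler = crawler.lower()
    by_token = {}
    for agents, rules in parse_groups(text):
        for token in agents:
            by_token.setdefault(token, []).extend(rules)
    matches = [t for t in by_token if t and t != "*" and crawler.startswith(t)]
    if matches:
        return by_token[max(matches, key=len)]  # longest token wins
    return by_token.get("*", [])  # fall back to the wildcard group

ROBOTS_TXT = """
User-agent: *
Disallow: /tmp/
Disallow: /admin/

User-agent: Googlebot
Disallow: /drafts/
"""

print(select_rules(ROBOTS_TXT, "Googlebot"))
print(select_rules(ROBOTS_TXT, "Bingbot"))
```

In this made-up file, the Googlebot group does not repeat the /tmp/ and /admin/ rules, so a crawler matching that group would not be bound by them.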

Question 7

What does the Allow directive do?

Accepted Answer

Allow is an exception to a broader Disallow. When both an Allow and a Disallow rule match a URL, the rule with the longer (more specific) path wins, and under RFC 9309 Allow wins a tie. This is the standard way to open a single file or subfolder inside an otherwise blocked area. Not all crawlers honor Allow (the original 1994 spec did not include it), but Googlebot, Bingbot and most modern bots do.
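Here is a minimal sketch of that precedence rule, assuming plain prefix matching only (no * or $ wildcards) and the RFC 9309 behavior that the longest matching path wins with Allow preferred on a tie. The function name is_allowed and the sample paths are hypothetical.

```python
def is_allowed(rules, path):
    """Decide whether `path` may be fetched under a list of rules.

    `rules` is a list of (kind, pattern) tuples where kind is "allow" or
    "disallow". The longest matching pattern wins; on a tie, "allow" wins.
    A path that matches no rule is allowed.
    """
    best_len, allowed = -1, True
    for kind, pattern in rules:
        if not pattern:  # an empty pattern matches nothing
            continue
        if path.startswith(pattern):
            length = len(pattern)
            if length > best_len or (length == best_len and kind == "allow"):
                best_len, allowed = length, (kind == "allow")
    return allowed

# Block a folder but open one file inside it.
rules = [
    ("disallow", "/private/"),
    ("allow", "/private/press-kit.pdf"),
]

print(is_allowed(rules, "/private/press-kit.pdf"))
print(is_allowed(rules, "/private/notes.html"))
print(is_allowed(rules, "/public/index.html"))
```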

Question 8

Does Crawl-delay work for Googlebot?

Accepted Answer

No. Googlebot has never implemented the Crawl-delay directive. It is honored by Bing, Yahoo and, historically, Yandex as a minimum number of seconds between fetches. Google retired the crawl-rate limiter in Search Console in January 2024, so the way to slow Googlebot during temporary overload is to return HTTP 500, 503 or 429 responses, and Google will back off automatically.
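For crawlers that do honor the directive, the value is read per user-agent group. The sketch below shows how a custom crawler might look up its declared delay with Python's standard urllib.robotparser; the function name declared_delay and the sample file are hypothetical, and the result says nothing about what any real search engine will actually do.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file that sets a delay for one crawler only.
ROBOTS_TXT = """\
User-agent: bingbot
Crawl-delay: 10
Disallow: /search/
"""

def declared_delay(robots_txt, agent):
    """Return the Crawl-delay (in seconds) declared for `agent`, or None."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(agent)

print(declared_delay(ROBOTS_TXT, "bingbot"))
print(declared_delay(ROBOTS_TXT, "Googlebot"))
```

A polite custom crawler would sleep for at least the returned number of seconds between requests to the same host.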

Question 9

What happens if robots.txt returns an error or is missing?

Accepted Answer

A missing file (HTTP 404) is treated as "no restrictions": crawlers assume the whole site is open. A server error (HTTP 5xx) is treated by Google as "fully disallowed" while it keeps retrying the file, so a misconfigured server that returns 503 for the robots.txt request can stall crawling of the entire site. If the error persists for weeks, Google falls back to the last cached copy of the file or, if none exists, to assuming no restrictions. Make sure the file returns a clean 200.
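The status handling described above can be summarized as a small decision function. This is a simplified sketch of the behavior in this answer, not a full model of any crawler: the function name and return labels are hypothetical, treating 429 like a server error is an assumption based on Google's documented behavior, and redirects and long-running outages are not modeled.

```python
def robots_fetch_policy(status):
    """Map the HTTP status of a robots.txt request to a crawl policy."""
    if 200 <= status < 300:
        return "parse_file"      # read and apply the rules in the file
    if status == 429 or 500 <= status < 600:
        return "disallow_all"    # temporary: treat the whole site as blocked
    if 400 <= status < 500:
        return "allow_all"       # no usable file: assume no restrictions
    return "undefined"           # e.g. redirects, not modeled in this sketch

for code in (200, 404, 503):
    print(code, robots_fetch_policy(code))
```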
