The robots.txt file is the first thing a search engine crawler checks when visiting a website. This plain-text file, located at the root of the domain, tells crawlers which URLs they may crawl and which they may not. Note that robots.txt controls crawling, not indexing: a page blocked by robots.txt can still appear in search results if other sites link to it.
Robots.txt Syntax
The file consists of rule blocks. Each block starts with a User-agent directive, followed by Disallow and Allow rules:
- User-agent: * — rules for all crawlers
- User-agent: Googlebot — rules for Google only
- Disallow: /admin/ — block crawlers from the /admin/ section
- Allow: /admin/public/ — allow a subdirectory (exception to Disallow)
- Sitemap: https://example.com/sitemap.xml — path to the sitemap
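Putting the directives above together, a minimal robots.txt (the domain and paths are illustrative) might look like this:

```text
# Rules for all crawlers
User-agent: *
Allow: /admin/public/
Disallow: /admin/

# Stricter rules for Googlebot only
User-agent: Googlebot
Disallow: /search/

Sitemap: https://example.com/sitemap.xml
```

Crawlers use the most specific matching User-agent block, so Googlebot follows its own block here rather than the `*` block.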
Common Mistakes
- Blocking CSS/JS files — Google won't be able to render the page
- Disallow: / — blocks crawling of the entire website
- Missing Sitemap — the crawler may not discover all pages
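Before deploying a robots.txt, it helps to verify that your rules behave as intended. A quick sketch using Python's standard-library `urllib.robotparser` (the file content below is the same illustrative example; note that Python applies the first matching rule, so the Allow exception must precede the broader Disallow):

```python
import urllib.robotparser

# Parse an in-memory robots.txt instead of fetching one over HTTP
rp = urllib.robotparser.RobotFileParser()
rp.parse("""\
User-agent: *
Allow: /admin/public/
Disallow: /admin/
""".splitlines())

# Blocked: /admin/ matches the Disallow rule
print(rp.can_fetch("*", "https://example.com/admin/"))          # False
# Allowed: the Allow exception matches first
print(rp.can_fetch("*", "https://example.com/admin/public/x"))  # True
# Allowed: no rule matches, so crawling is permitted by default
print(rp.can_fetch("*", "https://example.com/blog/"))           # True
```

Keep in mind that Python's parser uses first-match semantics, while Google resolves conflicts by the longest matching path, so ordering the Allow exception first keeps both interpretations consistent.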
Conclusion
Create a proper robots.txt with our generator. Add Schema.org markup using the Schema.org Generator.