txtbot

txtbot is a specialized web crawler designed to access and analyze robots.txt files across the internet.

User Agent

You can identify txtbot by any of the following user agent strings:

txtbot (+https://txtbot.net)

or

Mozilla/5.0 (compatible; txtbot; +https://txtbot.net)
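
For example, a server-side check can recognize txtbot requests by the token "txtbot" in the User-Agent header. The following Python sketch is illustrative only; the function name and the non-txtbot sample string are assumptions, not part of txtbot's documentation:

def is_txtbot(user_agent: str) -> bool:
    # Both published txtbot user agent strings contain the token "txtbot",
    # so a case-insensitive substring check is enough to identify them.
    return "txtbot" in user_agent.lower()

print(is_txtbot("txtbot (+https://txtbot.net)"))                           # True
print(is_txtbot("Mozilla/5.0 (compatible; txtbot; +https://txtbot.net)"))  # True
print(is_txtbot("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))              # False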

Purpose

The primary purpose of txtbot is to crawl and analyze robots.txt files. This helps in understanding how websites manage their crawling and indexing preferences.

Crawling Frequency

txtbot is designed to access individual robots.txt files once per day.

Note: In some edge cases the crawl may happen more or less frequently than this. txtbot is engineered to minimize its impact and should not pose a burden on most sites.

Blocking txtbot

As txtbot respects robots.txt directives, you can block it using the following methods:

Method 1: Using robots.txt (recommended)

Add the following lines to your robots.txt file:

User-agent: txtbot
Disallow: /

Important: This method will not prevent txtbot from fetching the robots.txt file itself, since Section 2.2.2 of RFC 9309 implicitly allows access to the /robots.txt URI.
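
You can verify how these rules are interpreted by running them through a standard robots.txt parser. The sketch below uses Python's urllib.robotparser purely as an illustration; it is not txtbot's own parser, it does not model the implicit allowance of /robots.txt mentioned above, and example.com is a placeholder domain:

from urllib.robotparser import RobotFileParser

# Parse the same two lines shown above and test URLs against them.
parser = RobotFileParser()
parser.parse([
    "User-agent: txtbot",
    "Disallow: /",
])

print(parser.can_fetch("txtbot", "https://example.com/some-page"))        # False: txtbot is blocked
print(parser.can_fetch("someotherbot", "https://example.com/some-page"))  # True: other agents are unaffected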

Method 2: IP Blocking

Alternatively, block the following IP addresses at the server/firewall level:

Important: Blocking via robots.txt is the recommended and most reliable method, because these IP addresses may change over time: new addresses may be added, old ones removed, and temporary addresses used.
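
If you do use IP blocking, it normally belongs in your firewall or web server configuration. Purely as an illustration of the idea, the Python sketch below checks a client address against a blocklist; the 203.0.113.0/24 network is a placeholder from the reserved documentation range, not a real txtbot address, and would need to be replaced with the currently published addresses:

import ipaddress

# Placeholder only: 203.0.113.0/24 is a reserved documentation range,
# NOT an actual txtbot network. Replace with the published txtbot addresses.
TXTBOT_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]

def is_txtbot_ip(remote_addr: str) -> bool:
    # True if the client address falls inside any listed network.
    ip = ipaddress.ip_address(remote_addr)
    return any(ip in net for net in TXTBOT_NETWORKS)

print(is_txtbot_ip("203.0.113.7"))   # True (placeholder address)
print(is_txtbot_ip("198.51.100.1"))  # False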

Contact

If you have any questions or concerns regarding txtbot, please reach out at txtbot@txtbot.net.