txtbot is a specialized web crawler designed to access and analyze plain text files (such as robots.txt, security.txt, humans.txt, llms.txt, etc.) across the internet.
You can identify txtbot by any of the following user agent strings:
txtbot (+https://txtbot.net)
or
Mozilla/5.0 (compatible; txtbot; +https://txtbot.net)
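If you want to flag txtbot requests in your own logs or request handlers, a case-insensitive substring match on the user agent is sufficient, since both published strings contain the token "txtbot". The following Python sketch is purely illustrative and is not part of txtbot itself:

# Illustrative only: detect txtbot from a User-Agent header value.
def is_txtbot(user_agent):
    """Return True if the user agent string identifies txtbot."""
    if not user_agent:
        return False
    return "txtbot" in user_agent.lower()

# Both published user agent strings match:
assert is_txtbot("txtbot (+https://txtbot.net)")
assert is_txtbot("Mozilla/5.0 (compatible; txtbot; +https://txtbot.net)")
assert not is_txtbot("Mozilla/5.0 (Windows NT 10.0; Win64; x64)")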
The primary purpose of txtbot is to crawl and analyze plain text files (such as robots.txt, security.txt, humans.txt, llms.txt, etc.). It is not designed to access HTML pages, although a website may redirect it to one, for example by answering a request for a txt file with a 301 or 302 redirect or with 404 Not Found; in those cases the body of the response is discarded. If the website returns a 200 OK response with a body, txtbot parses the body and extracts what information it can.
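As a rough illustration of that behavior (not txtbot's actual implementation), a fetcher that keeps the body only when the final response is 200 OK could look like the following Python sketch, using only the standard library; the example URL and timeout are placeholders:

# Hypothetical sketch: keep the body only for a final 200 OK response;
# redirect targets and error responses are discarded.
import urllib.request
import urllib.error

def fetch_text_file(url):
    request = urllib.request.Request(
        url, headers={"User-Agent": "txtbot (+https://txtbot.net)"}
    )
    try:
        # urlopen follows 301/302 redirects automatically, so the final
        # response is what gets inspected here.
        with urllib.request.urlopen(request, timeout=10) as response:
            if response.status == 200:
                return response.read().decode("utf-8", errors="replace")
            return None  # non-200 final response: body discarded
    except urllib.error.HTTPError:
        return None  # e.g. 404 Not Found: body discarded
    except (urllib.error.URLError, TimeoutError):
        return None

body = fetch_text_file("https://example.com/robots.txt")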
txtbot is designed to access individual plain text files (such as robots.txt, security.txt, humans.txt, llms.txt, etc.) no more than a few times per day.
Note: In some edge cases crawling may be somewhat more or less frequent than this. txtbot is engineered to minimize its impact and should not pose a burden on most sites.
As txtbot respects robots.txt directives, you can block it using the following methods:
Add the following lines to your robots.txt file:
User-agent: txtbot
Disallow: /
Important: This method will not prevent txtbot from accessing the robots.txt file itself, since the /robots.txt URI is implicitly allowed, as defined in section 2.2.2 of RFC 9309.
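You can sanity-check these rules with Python's standard-library robots.txt parser. Note that urllib.robotparser predates RFC 9309 and does not special-case the /robots.txt URI, so this check only demonstrates that ordinary paths are disallowed:

# Verify that the rules above block txtbot while leaving other crawlers unaffected.
from urllib.robotparser import RobotFileParser

rules = "User-agent: txtbot\nDisallow: /\n"
parser = RobotFileParser()
parser.parse(rules.splitlines())

print(parser.can_fetch("txtbot", "https://example.com/security.txt"))    # False
print(parser.can_fetch("otherbot", "https://example.com/security.txt"))  # True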
Alternatively, block the following IP addresses at the server/firewall level:
Important: Blocking via robots.txt is the recommended and most reliable method, because these IP addresses may change over time: new addresses may be added, old ones removed, and temporary addresses used.
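No address list is reproduced here, but as a rough illustration of server-side IP blocking, the Python sketch below checks a client address against a set of networks. The networks shown are documentation-only placeholders (RFC 5737 and RFC 3849), not txtbot's real addresses; substitute the published list:

import ipaddress

# Placeholder networks for illustration only; replace with the actual list.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),  # placeholder, RFC 5737
    ipaddress.ip_network("2001:db8::/32"),   # placeholder, RFC 3849
]

def is_blocked(client_ip):
    """Return True if client_ip falls inside any blocked network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in BLOCKED_NETWORKS)

print(is_blocked("203.0.113.7"))   # True
print(is_blocked("198.51.100.1"))  # False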
If you have any questions or concerns regarding txtbot, please reach out at txtbot@txtbot.net.