txtbot is a specialized web crawler designed to access and analyze robots.txt files across the internet.
You can identify txtbot by any of the following user agent strings:
txtbot (+https://txtbot.net)
or
Mozilla/5.0 (compatible; txtbot; +https://txtbot.net)
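If you want to check whether txtbot has visited your site, one option is to search your server's access log for these user agent strings. Below is a minimal Python sketch; the log path and log format are assumptions (an nginx/Apache-style log that records the user agent), so adjust them for your setup.

# Count and preview requests whose user agent contains "txtbot".
# LOG_PATH is a hypothetical example; point it at your own access log.
LOG_PATH = "/var/log/nginx/access.log"

with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    # Matching the substring "txtbot" catches both user agent variants above.
    txtbot_hits = [line for line in log if "txtbot" in line]

print(f"{len(txtbot_hits)} requests from txtbot")
for line in txtbot_hits[:10]:
    print(line.rstrip())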
The primary purpose of txtbot is to crawl and analyze robots.txt files in order to understand how websites manage their crawling and indexing preferences.
txtbot is designed to access each robots.txt file about once per day.
Note: In some edge cases, crawls may be more or less frequent than this. txtbot is engineered to minimize its impact and should not pose a burden on most sites.
As txtbot respects robots.txt directives, you can block it using the following methods:
Add the following lines to your robots.txt file:
User-agent: txtbot
Disallow: /
Important: This method will not prevent txtbot from accessing the robots.txt file itself, as defined in section 2.2.2 of RFC 9309.
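To confirm that the rule behaves as expected, you can test it with Python's standard urllib.robotparser. This is a minimal sketch; the robots.txt content and example.com URLs are placeholders for your own site.

from urllib.robotparser import RobotFileParser

# The rule from above, parsed directly from a string so no network access is needed.
robots_txt = [
    "User-agent: txtbot",
    "Disallow: /",
]

parser = RobotFileParser()
parser.parse(robots_txt)

# txtbot is disallowed for ordinary pages...
print(parser.can_fetch("txtbot", "https://example.com/"))           # False
print(parser.can_fetch("txtbot", "https://example.com/some/page"))  # False
# ...but note that per RFC 9309 the robots.txt file itself remains
# implicitly accessible to crawlers, regardless of Disallow rules.

When calling can_fetch, pass the product token "txtbot" rather than the full Mozilla-style user agent string, since the parser matches on the token.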
Alternatively, block the following IP addresses at the server/firewall level:
Important: Blocking via robots.txt is the recommended and most reliable method, because these IP addresses may change over time: new addresses may be added, old ones may be removed, and temporary IP addresses may be used.
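If you do choose IP-based blocking and prefer to handle it in application code rather than at the firewall, the following Python sketch shows the general idea using the standard ipaddress module. The networks listed are documentation-only placeholders (RFC 5737), not txtbot's actual addresses.

from ipaddress import ip_address, ip_network

# Placeholder networks only; substitute the addresses published for txtbot.
BLOCKED_NETWORKS = [
    ip_network("192.0.2.0/24"),
    ip_network("198.51.100.0/24"),
]

def is_blocked(remote_addr: str) -> bool:
    # Return True if the client address falls inside any blocked network.
    addr = ip_address(remote_addr)
    return any(addr in net for net in BLOCKED_NETWORKS)

print(is_blocked("192.0.2.10"))   # True  (inside a placeholder range)
print(is_blocked("203.0.113.5"))  # False (outside the placeholder ranges)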
If you have any questions or concerns regarding txtbot, please reach out at txtbot@txtbot.net.