πŸ—ΊοΈCrawling & Sitemaps

Fetch and Parse robots.txt via API

robots.txt is the polite note websites leave for bots. denkbot.dog reads it. When you use the /crawl endpoint, robots.txt rules are respected by default. And if you just need to read another site's robots.txt, /scrape handles that too. The dog is polite.

What you'd use this for

Understanding crawling policies before scraping, checking which URLs are blocked, SEO analysis of robots.txt rules, and respectful automated crawling.

How it works

Example
# Fetch and read a robots.txt
curl -X POST https://api.denkbot.dog/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "url": "https://example.com/robots.txt", "format": "json" }' \
  | jq '.text'

Questions & Answers

Does denkbot.dog respect robots.txt?

Yes, by default in the /crawl endpoint. You can disable this with respectRobotsTxt: false.
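For instance, a request that turns the robots.txt check off could look like the sketch below. It only builds the request (nothing is sent, so no real API key is needed); apart from respectRobotsTxt, the payload shape mirrors the /scrape example above and is an assumption.

```python
import json
from urllib import request

# Sketch: a /crawl request with robots.txt handling disabled via
# respectRobotsTxt: false. The request is constructed here, not sent.
payload = {"url": "https://example.com", "respectRobotsTxt": False}
req = request.Request(
    "https://api.denkbot.dog/crawl",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    method="POST",
)
print(req.get_method(), req.full_url)
```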

Can I fetch and parse a robots.txt file?

Yes. Fetch it with /scrape β€” the text field will contain the plain text of the file, which you can then parse with any robots.txt parser.

What about Disallow rules for specific user agents?

When respecting robots.txt, denkbot.dog identifies itself with its own user agent, so Disallow rules targeting that agent apply; per standard robots.txt semantics, the * wildcard group covers it when no specific group matches.
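Once you have a robots.txt fetched via /scrape, you can evaluate its per-agent rules locally with Python's standard urllib.robotparser. The sample rules and the agent name "denkbot" below are illustrative only, not the service's actual user agent string.

```python
from urllib import robotparser

# Illustrative robots.txt; "denkbot" is a stand-in agent name.
sample = """\
User-agent: *
Disallow: /private/

User-agent: denkbot
Disallow: /no-dogs/
"""

rp = robotparser.RobotFileParser()
rp.parse(sample.splitlines())  # feed it the text fetched via /scrape

# A specific agent matches its own group only; otherwise * applies.
print(rp.can_fetch("*", "https://example.com/private/page"))        # False
print(rp.can_fetch("denkbot", "https://example.com/no-dogs/x"))     # False
print(rp.can_fetch("denkbot", "https://example.com/private/page"))  # True
```

Note the last call: because "denkbot" has its own group, the * group's Disallow for /private/ does not apply to it.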

Ready to start fetching?

€19/year. Unlimited requests. API key ready in 30 seconds.