πŸ—ΊοΈCrawling & Sitemaps

Fetch and Parse robots.txt via API

robots.txt is the polite note websites leave for bots. denkbot.dog reads it. When you use the /crawl endpoint, robots.txt rules are respected by default. And if you just need to read another site's robots.txt, /scrape handles that too. The dog is polite.

What you'd use this for

Understanding crawling policies before scraping, checking which URLs are blocked, SEO analysis of robots.txt rules, and respectful automated crawling.

How it works

Example
# Fetch and read a robots.txt
curl -X POST https://api.denkbot.dog/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "url": "https://example.com/robots.txt", "format": "json" }' \
  | jq '.text'

Questions & Answers

Does denkbot.dog respect robots.txt?

Yes, by default in the /crawl endpoint. You can disable this with respectRobotsTxt: false.
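For instance, a request that turns the robots.txt check off could look like the sketch below. It only builds the request (nothing is sent, so no real API key is needed); apart from respectRobotsTxt, the payload shape mirrors the /scrape example above and is an assumption.

```python
import json
from urllib import request

# Sketch: a /crawl request with robots.txt handling disabled via
# respectRobotsTxt: false. The request is constructed here, not sent.
payload = {"url": "https://example.com", "respectRobotsTxt": False}
req = request.Request(
    "https://api.denkbot.dog/crawl",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    method="POST",
)
print(req.get_method(), req.full_url)
```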

Can I fetch and parse a robots.txt file?

Yes. Fetch it with /scrape β€” the text field will contain the plain text of the file, which you can then parse with any robots.txt parser.

What about Disallow rules for specific user agents?

When respecting robots.txt, denkbot.dog identifies itself with its own user agent, so Disallow rules targeting that agent apply; per standard robots.txt semantics, the * wildcard group covers it when no specific group matches.
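Once you have a robots.txt fetched via /scrape, you can evaluate its per-agent rules locally with Python's standard urllib.robotparser. The sample rules and the agent name "denkbot" below are illustrative only, not the service's actual user agent string.

```python
from urllib import robotparser

# Illustrative robots.txt; "denkbot" is a stand-in agent name.
sample = """\
User-agent: *
Disallow: /private/

User-agent: denkbot
Disallow: /no-dogs/
"""

rp = robotparser.RobotFileParser()
rp.parse(sample.splitlines())  # feed it the text fetched via /scrape

# A specific agent matches its own group only; otherwise * applies.
print(rp.can_fetch("*", "https://example.com/private/page"))        # False
print(rp.can_fetch("denkbot", "https://example.com/no-dogs/x"))     # False
print(rp.can_fetch("denkbot", "https://example.com/private/page"))  # True
```

Note the last call: because "denkbot" has its own group, the * group's Disallow for /private/ does not apply to it.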

Ready to start fetching?

€19/year. Unlimited requests. API key ready in 30 seconds.