Raw HTML is a crime scene. Tags everywhere, inline styles, `&nbsp;` landmines. denkbot.dog parses the mess and serves you clean, structured JSON. The dog ate the HTML and synthesized the information into something a normal developer can use.
Use it for data extraction pipelines, content aggregation, training ML models on web content, building search indexes, and any other pipeline that needs structured data from unstructured web pages.
const res = await fetch('https://api.denkbot.dog/scrape', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({ url: 'https://blog.example.com/post', format: 'json' }),
})
const { title, text, metadata, links } = await res.json()

Response fields: url, finalUrl, statusCode, title, html, text, metadata (og tags, description, canonical), links, cached, durationMs.
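Once the JSON is parsed, the response fields can be boiled down to a quick health check. The sketch below is one way to do that, not the official client: `summarizeScrape` is a hypothetical helper, and it assumes `statusCode` is the target page's HTTP status, `finalUrl` is the URL after redirects, and `cached` is a boolean. The field names come from the list above; the types are guesses.

```javascript
// Hypothetical helper (not part of the denkbot.dog API): condenses a parsed
// /scrape response into a few values worth logging.
// Assumed types: statusCode is a number, text is a string, cached is a boolean.
function summarizeScrape(result) {
  const { url, finalUrl, statusCode, title, text, cached, durationMs } = result;
  const trimmed = (text ?? '').trim();
  return {
    // Treat any 2xx status as a successful fetch of the target page.
    ok: statusCode >= 200 && statusCode < 300,
    // A differing finalUrl suggests the page redirected somewhere else.
    redirected: finalUrl !== undefined && finalUrl !== url,
    title: title ?? '(untitled)',
    // Rough word count: split the extracted text on runs of whitespace.
    wordCount: trimmed ? trimmed.split(/\s+/).length : 0,
    cached: Boolean(cached),
    durationMs,
  };
}
```

Call it as `summarizeScrape(await res.json())` and log the result before handing the text to the rest of your pipeline.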
Yes, links are extracted. All anchor hrefs are returned as an array of { href, text } objects.
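A common next step with that array of { href, text } objects is keeping only the links that stay on the same site, for example to feed a crawler. Here is a minimal sketch under a few assumptions: `internalLinks` is a hypothetical helper, and since it is not stated whether hrefs come back absolute or relative, each one is resolved against the page URL with the standard `URL` constructor.

```javascript
// Hypothetical helper: keeps same-host http(s) links, resolved to absolute
// URLs and de-duplicated. Assumes `links` is an array of { href, text }.
function internalLinks(links, pageUrl) {
  const base = new URL(pageUrl);
  const seen = new Set();
  const out = [];
  for (const { href, text } of links) {
    let resolved;
    try {
      resolved = new URL(href, base); // handles relative and absolute hrefs
    } catch {
      continue; // skip hrefs that cannot be parsed as URLs
    }
    if (!/^https?:$/.test(resolved.protocol)) continue; // drop mailto:, tel:, ...
    if (resolved.host !== base.host) continue; // drop external links
    resolved.hash = ''; // treat /about and /about#team as the same page
    if (seen.has(resolved.href)) continue;
    seen.add(resolved.href);
    out.push({ href: resolved.href, text: (text ?? '').trim() });
  }
  return out;
}
```

Passing `finalUrl` rather than the requested `url` as `pageUrl` is probably the safer base when the page redirected.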
Image URLs aren't extracted separately yet. They are present in the raw HTML.
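Until image extraction ships, you can dig the URLs out of the raw HTML yourself. The sketch below is a rough workaround, not a denkbot.dog feature: `extractImageUrls` is a hypothetical helper, it assumes you pass it the `html` field from the response, and it uses a regex on `<img src>` only, so it ignores `srcset`, CSS backgrounds, and lazy-loading attributes. A real HTML parser would be more robust.

```javascript
// Hypothetical helper: collects <img src> URLs from raw HTML.
// Regex-based, so it is a best-effort approximation, not a full HTML parser.
function extractImageUrls(html, pageUrl) {
  // Matches src="...", src='...' or unquoted src=... inside an <img> tag.
  // The \s before "src" avoids matching attributes such as data-src.
  const imgSrc = /<img\b[^>]*?\ssrc\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))/gi;
  const urls = new Set();
  for (const match of html.matchAll(imgSrc)) {
    const raw = match[1] ?? match[2] ?? match[3];
    if (!raw) continue;
    try {
      urls.add(new URL(raw, pageUrl).href); // resolve relative paths
    } catch {
      // ignore values that are not valid URLs
    }
  }
  return [...urls];
}
```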

€19/year. Unlimited requests. API key ready in 30 seconds.