📊 Data Extraction

Convert HTML to Structured JSON

Raw HTML is a crime scene. Tags everywhere, inline styles, `&nbsp;` landmines. denkbot.dog parses the mess and serves you clean structured JSON. The dog ate the HTML and synthesized the information into something a normal developer can use.

What you'd use this for

Data extraction pipelines, content aggregation, ML training on web content, search indexing, and anything else that needs structured data from unstructured web pages.

How it works

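// Top-level await: run this as an ES module or inside an async function.
// Request a structured JSON scrape of a single page.
const res = await fetch('https://api.denkbot.dog/scrape', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({ url: 'https://blog.example.com/post', format: 'json' }),
})
// fetch() only rejects on network failure, so check the HTTP status explicitly.
if (!res.ok) throw new Error(`Scrape request failed: ${res.status}`)
// Pick out the fields you need from the parsed response.
const { title, text, metadata, links } = await res.json()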

Questions & Answers

What fields does the JSON response include?

url, finalUrl, statusCode, title, html, text, metadata (Open Graph tags, description, canonical), links, cached, and durationMs.
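If you want your pipeline to fail loudly when a field goes missing, a small guard like the one below can help. This is a minimal sketch: the field names come from the answer above, but `EXPECTED_FIELDS` and `missingFields` are hypothetical helpers, and the value types noted in the comments are assumptions rather than documented guarantees.

```javascript
// Field names are taken from the list above.
// The types in the comments are assumptions, not documented behaviour.
const EXPECTED_FIELDS = [
  'url',        // the URL you asked for (assumed string)
  'finalUrl',   // the URL after redirects (assumed string)
  'statusCode', // HTTP status of the fetched page (assumed number)
  'title',      // page title (assumed string)
  'html',       // raw HTML (assumed string)
  'text',       // extracted text (assumed string)
  'metadata',   // og tags, description, canonical (assumed object)
  'links',      // array of { href, text } objects
  'cached',     // whether the result came from cache (assumed boolean)
  'durationMs', // time taken in milliseconds (assumed number)
]

// Hypothetical helper: returns the names of expected fields that are absent.
function missingFields(response) {
  if (response === null || typeof response !== 'object') {
    return [...EXPECTED_FIELDS]
  }
  return EXPECTED_FIELDS.filter((field) => !(field in response))
}
```

Call `missingFields(data)` on the parsed response and bail out, or log a warning, if the returned array is not empty.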

Does it extract all links?

Yes. All anchor hrefs are extracted and returned as an array of { href, text } objects.
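Since you get every anchor on the page, you will probably want to tidy the list before feeding it to a crawler or index. The sketch below resolves each href against the page URL, keeps only http(s) links, and drops duplicates. `normalizeLinks` is a hypothetical helper, and it assumes hrefs may come back relative, which the answer above does not specify.

```javascript
// Hypothetical helper: clean up the { href, text } array from the response.
// Assumes hrefs may be relative; absolute hrefs pass through unchanged.
function normalizeLinks(links, baseUrl) {
  const seen = new Set()
  const result = []
  for (const { href, text } of links) {
    let resolved
    try {
      // The URL constructor resolves relative hrefs against baseUrl.
      resolved = new URL(href, baseUrl)
    } catch {
      continue // skip hrefs that cannot be parsed as URLs
    }
    // Drop mailto:, javascript:, tel: and other non-web schemes.
    if (resolved.protocol !== 'http:' && resolved.protocol !== 'https:') continue
    // Keep only the first occurrence of each absolute URL.
    if (seen.has(resolved.href)) continue
    seen.add(resolved.href)
    result.push({ href: resolved.href, text: (text ?? '').trim() })
  }
  return result
}
```

A typical call would be `normalizeLinks(links, finalUrl)`, using `finalUrl` as the base so redirects are taken into account.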

What about images?

Not yet. Image URLs aren't extracted into their own field, but they are still present in the raw HTML, so you can pull them out yourself.
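Until images get their own field, one rough workaround is to dig the `src` attributes out of the `html` field yourself. The sketch below uses a regular expression, which is not a real HTML parser: it only sees quoted `src` attributes on `<img>` tags and ignores `srcset`, `<picture>` sources, and CSS backgrounds. `extractImageUrls` is a hypothetical helper, not part of the API.

```javascript
// Hypothetical helper: collect absolute image URLs from raw HTML.
// Regex-based, so treat it as a best-effort sketch rather than a parser.
function extractImageUrls(html, baseUrl) {
  // Match a quoted src attribute on an <img> tag. The leading \s keeps
  // attributes like data-src from matching.
  const pattern = /<img\b[^>]*?\ssrc\s*=\s*(["'])(.*?)\1/gi
  const urls = new Set()
  for (const match of html.matchAll(pattern)) {
    try {
      // Resolve relative paths against the page URL; the Set drops duplicates.
      urls.add(new URL(match[2], baseUrl).href)
    } catch {
      // Ignore src values that are not valid URLs.
    }
  }
  return [...urls]
}
```

For anything beyond a quick script, running the `html` field through a proper HTML parser will be more reliable than a regex.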

Ready to start fetching?

€19/year. Unlimited requests. API key ready in 30 seconds.