🤖 AI & LLM Integrations

Collect Web Data for RAG Systems

RAG needs documents. Most of those documents live at URLs, and URLs resolve to web pages that need scraping. denkbot.dog handles the last mile: convert any URL to clean text, ready to chunk and embed into your vector store. The dog fetches the documents. Your RAG retrieves the knowledge.

What you'd use this for

Building knowledge bases for RAG, populating vector databases with web content, creating searchable document repositories from web sources, and real-time context injection for LLMs.

How it works

Example
// RAG pipeline: scrape → chunk → embed → store
const { text, title, url: finalUrl } = await fetch(
  'https://api.denkbot.dog/scrape',
  {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer YOUR_API_KEY',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ url: 'https://docs.example.com/guide' }),
  }
).then(r => r.json())

// Chunk and embed
const chunks = chunkText(text, 500)
const embeddings = await embedAll(chunks)
await vectorStore.upsert(chunks.map((chunk, i) => ({
  id: `${finalUrl}:${i}`,
  text: chunk,
  embedding: embeddings[i],
  metadata: { title, url: finalUrl },
})))
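The pipeline above leans on a `chunkText` helper that isn't shown. A minimal sketch, assuming fixed-size chunks split on word boundaries (a production pipeline might split on sentences or token counts instead), could look like this:

```javascript
// Naive fixed-size chunker: pack whole words into chunks of at most
// maxChars characters, so no chunk cuts a word in half. A single word
// longer than maxChars becomes its own oversized chunk.
function chunkText(text, maxChars) {
  const words = text.split(/\s+/).filter(Boolean)
  const chunks = []
  let current = []
  let length = 0
  for (const word of words) {
    // +1 accounts for the joining space
    if (length + word.length + 1 > maxChars && current.length > 0) {
      chunks.push(current.join(' '))
      current = []
      length = 0
    }
    current.push(word)
    length += word.length + 1
  }
  if (current.length > 0) chunks.push(current.join(' '))
  return chunks
}
```

The 500 passed in the pipeline above is a character budget per chunk; tune it to your embedding model's context window.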

Questions & Answers

What text format does denkbot.dog return?

Plain text with HTML stripped. Ready to chunk.

How fresh is the data?

15-minute cache. For a RAG system, re-scrape periodically to keep your knowledge base current.
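A periodic re-scrape can be as simple as a timer loop. A minimal sketch, where `reindexOne` is a hypothetical callback running the scrape → chunk → embed → upsert pipeline above and the daily interval is illustrative:

```javascript
const DAY_MS = 24 * 60 * 60 * 1000

// Re-scrape every tracked URL on a fixed interval so the vector store
// stays current (the API itself caches each page for 15 minutes).
// reindexOne(url) is assumed to scrape, chunk, embed, and upsert one page.
function scheduleRefresh(urls, reindexOne, intervalMs = DAY_MS) {
  return setInterval(async () => {
    for (const url of urls) {
      await reindexOne(url)
    }
  }, intervalMs)
}
```

The returned timer id can be passed to `clearInterval` when you stop tracking a set of URLs.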

Can I crawl a docs site and index everything?

Yes. Use /crawl to get all URLs, then /scrape each one.
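Putting the two endpoints together, a crawl-then-scrape loop might look like this. The `{ urls: [...] }` shape of the /crawl response is an assumption here, so check it against the API reference:

```javascript
// Crawl a docs site with /crawl, then /scrape each discovered URL.
// fetchJson is a small wrapper so both endpoints share auth headers.
async function indexSite(rootUrl, apiKey) {
  const fetchJson = (path, body) =>
    fetch(`https://api.denkbot.dog${path}`, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(body),
    }).then((r) => r.json())

  const { urls } = await fetchJson('/crawl', { url: rootUrl })
  const pages = []
  for (const url of urls) {
    const { text, title } = await fetchJson('/scrape', { url })
    pages.push({ url, title, text }) // chunk + embed each page as above
  }
  return pages
}
```

Pages are scraped sequentially here; for large sites you might batch the /scrape calls with `Promise.all` instead.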

Ready to start fetching?

€19/year. Unlimited requests. API key ready in 30 seconds.