🤖 AI & LLM Integrations

Collect Web Data for RAG Systems

RAG needs documents. Most of those documents live at URLs, and URLs resolve to web pages that need scraping. denkbot.dog handles the last mile: convert any URL to clean text, ready to chunk and embed into your vector store. The dog fetches the documents. Your RAG retrieves the knowledge.

What you'd use this for

Building knowledge bases for RAG, populating vector databases with web content, creating searchable document repositories from web sources, and real-time context injection for LLMs.

How it works

Example
// RAG pipeline: scrape → chunk → embed → store
const { text, title, url: finalUrl } = await fetch(
  'https://api.denkbot.dog/scrape',
  {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer YOUR_API_KEY',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ url: 'https://docs.example.com/guide' }),
  }
).then(r => r.json())

// Chunk and embed
const chunks = chunkText(text, 500)
const embeddings = await embedAll(chunks)
await vectorStore.upsert(chunks.map((chunk, i) => ({
  id: `${finalUrl}:${i}`,
  text: chunk,
  embedding: embeddings[i],
  metadata: { title, url: finalUrl },
})))
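The pipeline above leans on a `chunkText` helper that isn't shown. A minimal sketch, assuming fixed-size chunks split on word boundaries (a production pipeline might split on sentences or token counts instead), could look like this:

```javascript
// Naive fixed-size chunker: pack whole words into chunks of at most
// maxChars characters, so no chunk cuts a word in half. A single word
// longer than maxChars becomes its own oversized chunk.
function chunkText(text, maxChars) {
  const words = text.split(/\s+/).filter(Boolean)
  const chunks = []
  let current = []
  let length = 0
  for (const word of words) {
    // +1 accounts for the joining space
    if (length + word.length + 1 > maxChars && current.length > 0) {
      chunks.push(current.join(' '))
      current = []
      length = 0
    }
    current.push(word)
    length += word.length + 1
  }
  if (current.length > 0) chunks.push(current.join(' '))
  return chunks
}
```

The 500 passed in the pipeline above is a character budget per chunk; tune it to your embedding model's context window.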

Questions & Answers

What text format does denkbot.dog return?

Plain text with HTML stripped. Ready to chunk.

How fresh is the data?

15-minute cache. For a RAG system, re-scrape periodically to keep your knowledge base current.
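A periodic re-scrape can be as simple as a timer loop. A minimal sketch, where `reindexOne` is a hypothetical callback running the scrape → chunk → embed → upsert pipeline above and the daily interval is illustrative:

```javascript
const DAY_MS = 24 * 60 * 60 * 1000

// Re-scrape every tracked URL on a fixed interval so the vector store
// stays current (the API itself caches each page for 15 minutes).
// reindexOne(url) is assumed to scrape, chunk, embed, and upsert one page.
function scheduleRefresh(urls, reindexOne, intervalMs = DAY_MS) {
  return setInterval(async () => {
    for (const url of urls) {
      await reindexOne(url)
    }
  }, intervalMs)
}
```

The returned timer id can be passed to `clearInterval` when you stop tracking a set of URLs.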

Can I crawl a docs site and index everything?

Yes. Use /crawl to get all URLs, then /scrape each one.
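Putting the two endpoints together, a crawl-then-scrape loop might look like this. The `{ urls: [...] }` shape of the /crawl response is an assumption here, so check it against the API reference:

```javascript
// Crawl a docs site with /crawl, then /scrape each discovered URL.
// fetchJson is a small wrapper so both endpoints share auth headers.
async function indexSite(rootUrl, apiKey) {
  const fetchJson = (path, body) =>
    fetch(`https://api.denkbot.dog${path}`, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(body),
    }).then((r) => r.json())

  const { urls } = await fetchJson('/crawl', { url: rootUrl })
  const pages = []
  for (const url of urls) {
    const { text, title } = await fetchJson('/scrape', { url })
    pages.push({ url, title, text }) // chunk + embed each page as above
  }
  return pages
}
```

Pages are scraped sequentially here; for large sites you might batch the /scrape calls with `Promise.all` instead.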

Ready to start fetching?

€19/year. Unlimited requests. API key ready in 30 seconds.