RAG needs documents. Documents are URLs. URLs are web pages. Web pages need scraping. denkbot.dog handles the last mile — convert any URL to clean text, ready to chunk and embed into your vector store. The dog fetches documents. Your RAG retrieves knowledge.
Common use cases: building knowledge bases for RAG, populating vector databases with web content, creating searchable document repositories from web sources, and injecting real-time context into LLM prompts.
// RAG pipeline: scrape → chunk → embed → store
const { text, title, url: finalUrl } = await fetch(
  'https://api.denkbot.dog/scrape',
  {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer YOUR_API_KEY',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ url: 'https://docs.example.com/guide' }),
  }
).then(r => r.json())

// Chunk and embed (chunkText, embedAll, and vectorStore are placeholders
// for your own chunker, embedding model, and vector database client)
const chunks = chunkText(text, 500)
const embeddings = await embedAll(chunks)
await vectorStore.upsert(chunks.map((chunk, i) => ({
  id: `${finalUrl}:${i}`,
  text: chunk,
  embedding: embeddings[i],
  metadata: { title, url: finalUrl },
})))

Plain text with HTML stripped. Ready to chunk.
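The `chunkText` helper in the pipeline above is not part of the API; it stands in for whatever chunking strategy you use. A minimal sketch, splitting on word boundaries at a target character size (the function name and size parameter are illustrative):

```javascript
// Minimal chunker: split text into pieces of at most maxChars characters,
// breaking only on word boundaries so no word is split mid-way.
function chunkText(text, maxChars) {
  const words = text.split(/\s+/).filter(Boolean)
  const chunks = []
  let current = []
  let length = 0
  for (const word of words) {
    // +1 accounts for the joining space
    if (length + word.length + 1 > maxChars && current.length > 0) {
      chunks.push(current.join(' '))
      current = []
      length = 0
    }
    current.push(word)
    length += word.length + 1
  }
  if (current.length > 0) chunks.push(current.join(' '))
  return chunks
}
```

Fixed-size chunking is the simplest option; sentence- or heading-aware splitting usually retrieves better, but the upsert loop stays the same.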
Responses are cached for 15 minutes. For a RAG system, re-scrape periodically to keep your knowledge base current.
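Periodic re-scraping can be wrapped in a small helper. A sketch, assuming the same `/scrape` endpoint as above (the `refreshDocument` name, error handling, and interval are illustrative, not part of the API):

```javascript
// Re-scrape a URL so its chunks can be re-embedded and re-upserted.
// The API key and endpoint match the pipeline example; everything else
// here is an assumption about how you schedule refreshes.
async function refreshDocument(url, apiKey) {
  const res = await fetch('https://api.denkbot.dog/scrape', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ url }),
  })
  if (!res.ok) throw new Error(`scrape failed: ${res.status}`)
  return res.json()
}

// e.g. refresh once a day:
// setInterval(() => refreshDocument('https://docs.example.com/guide', KEY),
//             24 * 60 * 60 * 1000)
```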
Yes. To ingest an entire site, use /crawl to get all its URLs, then /scrape each one.
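The crawl-then-scrape pattern can be sketched as follows. Note the assumed response shape of /crawl (an object with a `urls` array) is a guess for illustration; check the API reference for the actual fields:

```javascript
// Crawl a site for its URLs, then scrape each page (sketch).
// The { urls: [...] } shape of the /crawl response is an assumption.
async function scrapeSite(rootUrl, apiKey) {
  const headers = {
    'Authorization': `Bearer ${apiKey}`,
    'Content-Type': 'application/json',
  }

  const { urls } = await fetch('https://api.denkbot.dog/crawl', {
    method: 'POST',
    headers,
    body: JSON.stringify({ url: rootUrl }),
  }).then(r => r.json())

  // Scrape sequentially; batch with Promise.all if rate limits allow.
  const pages = []
  for (const url of urls) {
    const page = await fetch('https://api.denkbot.dog/scrape', {
      method: 'POST',
      headers,
      body: JSON.stringify({ url }),
    }).then(r => r.json())
    pages.push(page)
  }
  return pages
}
```

Each returned page then feeds the same chunk-embed-upsert steps shown in the pipeline example.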

€19/year. Unlimited requests. API key ready in 30 seconds.