🤖 AI & LLM Integrations

Fetch Web Content for RAG Pipelines

RAG pipelines need content. Instead of building a scraper that fights with JavaScript rendering, anti-bot measures, and redirect chains, send URLs to denkbot.dog and get back clean text ready to chunk and embed. The dog fetches. Your vector store remembers.

What you'd use this for

Building knowledge bases from websites; ingesting documentation into LlamaIndex or LangChain vector stores; keeping RAG indexes fresh with live web content; and extracting text from dynamic, JS-rendered pages.

How it works

example
from llama_index.core import Document, VectorStoreIndex
import httpx
import os

# Read the API key from the environment rather than hardcoding it.
DENKBOT_API_KEY = os.environ["DENKBOT_API_KEY"]

def url_to_document(url: str) -> Document:
    r = httpx.post("https://api.denkbot.dog/scrape",
        headers={"Authorization": f"Bearer {DENKBOT_API_KEY}"},
        json={"url": url, "renderJs": True, "format": "json"}, timeout=30)
    r.raise_for_status()  # fail fast on auth or rate-limit errors
    data = r.json()
    return Document(
        text=data["text"],
        metadata={
            "url": data["url"],
            "title": data["title"],
            "description": data["metadata"].get("description", ""),
        }
    )

urls = ["https://docs.example.com/intro", "https://docs.example.com/api"]
documents = [url_to_document(url) for url in urls]
index = VectorStoreIndex.from_documents(documents)

Questions & Answers

How do I handle JS-rendered documentation sites?

Set renderJs: true. Playwright renders the page before extraction, so SPA doc sites work correctly.

Can I crawl a whole docs site and index it?

Use POST /crawl to get all URLs, then batch-scrape them. The crawler returns a tree of all internal links up to 500 pages.

Does the text field strip HTML and navigation?

It strips HTML tags and returns readable text content. Some navigation text may remain — chunk by paragraph for best results.
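Chunking by paragraph can be as simple as splitting on blank lines and packing paragraphs up to a character budget. A minimal sketch (the function name and the 1200-character default are our choices, not part of the API):

```python
def chunk_by_paragraph(text: str, max_chars: int = 1200) -> list[str]:
    """Split scraped text into paragraph-aligned chunks for embedding."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        # Start a new chunk when adding this paragraph would exceed the budget.
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```

Keeping chunk boundaries on paragraph breaks means stray navigation text tends to land in its own small chunk instead of polluting the content around it.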

Ready to start fetching?

€19/year. Unlimited requests. API key ready in 30 seconds.