RAG pipelines need content. Instead of building a scraper that fights with JavaScript rendering, anti-bot measures, and redirect chains, send URLs to denkbot.dog and get back clean text ready to chunk and embed. The dog fetches. Your vector store remembers.
Typical uses: building knowledge bases from websites, ingesting documentation into LlamaIndex or LangChain vector stores, keeping RAG indexes fresh with live web content, and extracting text from dynamic, JS-rendered pages.
import os

import httpx
from llama_index.core import Document, VectorStoreIndex

DENKBOT_API_KEY = os.environ["DENKBOT_API_KEY"]

def url_to_document(url: str) -> Document:
    r = httpx.post(
        "https://api.denkbot.dog/scrape",
        headers={"Authorization": f"Bearer {DENKBOT_API_KEY}"},
        json={"url": url, "renderJs": True, "format": "json"},
        timeout=30,
    )
    r.raise_for_status()
    data = r.json()
    return Document(
        text=data["text"],
        metadata={
            "url": data["url"],
            "title": data["title"],
            "description": data["metadata"].get("description", ""),
        },
    )

urls = ["https://docs.example.com/intro", "https://docs.example.com/api"]
documents = [url_to_document(url) for url in urls]
index = VectorStoreIndex.from_documents(documents)

Set renderJs: true. Playwright renders the page before extraction, so SPA doc sites work correctly.
Use POST /crawl to get all URLs, then batch-scrape them. The crawler returns a tree of all internal links up to 500 pages.
The scraper strips HTML tags and returns readable text content. Some navigation text may remain, so chunk by paragraph for best results.
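Paragraph chunking needs no library support; a minimal sketch (the 1000-character default is an arbitrary choice, not an API recommendation):

```python
# Split scraped text on blank lines, then pack paragraphs into chunks
# of at most max_chars characters, keeping paragraphs intact.
def chunk_by_paragraph(text: str, max_chars: int = 1000) -> list[str]:
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for p in paragraphs:
        if current and len(current) + len(p) + 2 > max_chars:
            chunks.append(current)
            current = p
        else:
            current = f"{current}\n\n{p}" if current else p
    if current:
        chunks.append(current)
    return chunks
```

Because chunks never split mid-paragraph, stray navigation text tends to land in its own small chunk, where it scores poorly at retrieval time instead of polluting good chunks.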

€19/year. Unlimited requests. API key ready in 30 seconds.