🤖 AI & LLM Integrations

Collect Web Data for AI Training

Your model needs data. The web has data. Getting from A to B used to require serious infrastructure. denkbot.dog is the bridge: scrape at scale, collect clean text, build your dataset. The dog fetches the training data. You train the model.

What you'd use this for

Collecting fine-tuning datasets, building domain-specific corpora, creating benchmark datasets from live web content, and automated knowledge base construction.
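For the fine-tuning use case, the scraped results eventually need to land in a format your training pipeline accepts, typically JSONL with one example per line. A minimal sketch, assuming each scraped record is a dict with `url` and `text` keys (the actual /scrape response shape may differ):

```python
import json

def to_finetune_jsonl(records, out_path):
    """Write scraped records as one JSON object per line (JSONL).

    Assumes each record is a dict with 'url' and 'text' keys --
    adjust the field names to match the real /scrape response.
    """
    kept = 0
    with open(out_path, 'w', encoding='utf-8') as f:
        for rec in records:
            text = (rec.get('text') or '').strip()
            if not text:
                continue  # skip pages that yielded no usable text
            f.write(json.dumps({'source': rec['url'], 'text': text}) + '\n')
            kept += 1
    return kept  # number of examples written
```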

How it works

example
import asyncio
import aiohttp

async def scrape_batch(urls, api_key):
    headers = {
        'Authorization': f'Bearer {api_key}',
        'Content-Type': 'application/json',
    }
    async with aiohttp.ClientSession(headers=headers) as session:
        # Fire one POST per URL and gather them concurrently;
        # return_exceptions=True keeps one failed request from
        # killing the whole batch.
        tasks = [
            session.post(
                'https://api.denkbot.dog/scrape',
                json={'url': url, 'format': 'json'},
            )
            for url in urls
        ]
        responses = await asyncio.gather(*tasks, return_exceptions=True)
        # Read the bodies while the session is still open -- once the
        # session closes, its connections are released and the reads fail.
        return [
            await r.json()
            for r in responses
            if not isinstance(r, Exception)
        ]
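Scraped corpora tend to contain duplicate and near-duplicate pages, which hurt training. A minimal exact-dedup pass, hashing whitespace-normalized text (a sketch only; production pipelines often add near-duplicate detection such as MinHash on top):

```python
import hashlib

def dedupe_texts(texts):
    """Drop exact duplicates by hashing whitespace-normalized text.

    First occurrence wins; original order is preserved.
    """
    seen = set()
    unique = []
    for text in texts:
        normalized = ' '.join(text.split())
        key = hashlib.sha256(normalized.encode('utf-8')).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique
```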

Questions & Answers

Are there any content restrictions?

Don't scrape anything you wouldn't be legally allowed to scrape. We don't verify use cases, but we do block SSRF and rate-limit abuse.

What's the rate limit for bulk collection?

Free tier: 100 requests/day. A Pro tier with higher limits is coming soon.
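If your URL list is bigger than the free-tier quota, you can chunk it into one batch per day's allowance. A hypothetical helper (the `quota` value mirrors the 100 req/day limit above):

```python
def daily_batches(urls, quota=100):
    """Split a URL list into chunks of at most `quota` items,
    one chunk per day's allowance on the free tier."""
    return [urls[i:i + quota] for i in range(0, len(urls), quota)]
```

Each chunk can then be fed to `scrape_batch` on consecutive days.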

Can I use this for commercial model training?

Check the ToS of the sites you're scraping. That's between you and them.

Ready to start fetching?

€19/year. Unlimited requests. API key ready in 30 seconds.