Paparusi/crawlkit — reverse-engineered prompt
Reverse engineered prompt
Build me an open-source Python project called CrawlKit. I want one simple way to give it a URL and get useful text back for AI apps, whether the URL points to a normal web page, a news article, a PDF, or a video from YouTube, TikTok, or Facebook. The result should include clean markdown, plain text, structured JSON, metadata, transcripts when available, and optional chunks with token estimates so I can feed it into a chatbot or search system.
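As a rough illustration of the "optional chunks with token estimates" requirement, here is a minimal sketch in Python. Every name in it (ScrapeResult, Chunk, estimate_tokens, chunk_text) is hypothetical and not taken from CrawlKit itself, and the four-characters-per-token rule is only a common approximation for English text, not a real tokenizer.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Chunk:
    text: str
    token_estimate: int


@dataclass
class ScrapeResult:
    # Hypothetical result shape mirroring the fields listed in the prompt.
    url: str
    markdown: str
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)
    transcript: Optional[str] = None
    chunks: List[Chunk] = field(default_factory=list)


def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token, rounded up.
    return (len(text) + 3) // 4


def chunk_text(text: str, max_tokens: int = 200) -> List[Chunk]:
    """Group paragraphs into chunks that stay near max_tokens each."""
    chunks: List[Chunk] = []
    current: List[str] = []
    current_tokens = 0
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        para_tokens = estimate_tokens(para)
        # Flush the running chunk before it would exceed the budget.
        if current and current_tokens + para_tokens > max_tokens:
            joined = "\n\n".join(current)
            chunks.append(Chunk(joined, estimate_tokens(joined)))
            current, current_tokens = [], 0
        current.append(para)
        current_tokens += para_tokens
    if current:
        joined = "\n\n".join(current)
        chunks.append(Chunk(joined, estimate_tokens(joined)))
    return chunks
```

A single paragraph larger than max_tokens is kept whole in this sketch; a real implementation would probably split it further and use the target model's tokenizer for exact counts.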
Please include a local API with endpoints to scrape one URL, scrape a batch, discover URLs from supported sites, take screenshots, monitor a page for changes, and check health. It should handle JavaScript-heavy pages, use stealth settings when needed, do OCR for scanned PDFs, extract entities and keywords, and use special parsers for Vietnamese news, legal, real estate, finance, GitHub, and video sites.
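One simple way to approach the "monitor a page for changes" endpoint is to compare a fingerprint of the extracted text between visits. The sketch below assumes that approach; PageMonitor and content_fingerprint are hypothetical names, and the actual project may detect changes differently (for example with diffs or stored snapshots).

```python
import hashlib
from typing import Dict


def content_fingerprint(text: str) -> str:
    # Collapse whitespace so formatting-only differences are ignored.
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class PageMonitor:
    """Remembers the last fingerprint seen for each URL (in memory only)."""

    def __init__(self) -> None:
        self._seen: Dict[str, str] = {}

    def check(self, url: str, text: str) -> bool:
        """Return True if the text differs from the previous check of url."""
        fingerprint = content_fingerprint(text)
        previous = self._seen.get(url)
        self._seen[url] = fingerprint
        # The first visit has nothing to compare against, so it is not a change.
        return previous is not None and previous != fingerprint
```

Hashing the extracted text rather than the raw HTML avoids false positives from rotating ad markup or session tokens, at the cost of missing changes that only affect layout.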
Make it easy to install as a Python library, run with Docker, and test locally. Look up current docs online if you need to.