hmshb/scraping-agent-ai — reverse-engineered prompt
Build me a Python AI web scraping agent that can take one or more website URLs, crawl them intelligently, and return clean structured results instead of messy page text.
I want it to use an agent-style workflow so it can manage the scraping steps, retry failed pages, and process URLs in batches. It should use Firecrawl for crawling, Claude for summarizing or formatting the extracted content, and LangSmith so I can see logs and debug runs. The output should be saved in useful formats like JSON and Markdown, with a final scraped data file I can open afterward.
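For illustration, the batching, retry, and output steps described above might be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the project's actual implementation: the function names (batched, scrape_with_retries, to_markdown, save_results), the output file names, and the injected fetch callable are all hypothetical. In a real agent, fetch would wrap the Firecrawl client and the content would pass through Claude before saving; neither call is shown here.

```python
import json
import time
from pathlib import Path
from typing import Callable, Iterator


def batched(urls: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def scrape_with_retries(
    url: str,
    fetch: Callable[[str], str],  # hypothetical: stands in for the real crawler call
    max_attempts: int = 3,
    backoff_seconds: float = 0.0,
) -> dict:
    """Call `fetch(url)` up to `max_attempts` times and record the outcome."""
    last_error = ""
    for attempt in range(1, max_attempts + 1):
        try:
            content = fetch(url)
            return {"url": url, "ok": True, "content": content, "attempts": attempt}
        except Exception as exc:  # a real agent would catch narrower error types
            last_error = str(exc)
            if attempt < max_attempts:
                time.sleep(backoff_seconds * attempt)  # linear backoff between tries
    return {"url": url, "ok": False, "error": last_error, "attempts": max_attempts}


def to_markdown(results: list[dict]) -> str:
    """Render one Markdown section per URL, marking failed pages."""
    sections = []
    for result in results:
        body = result["content"] if result["ok"] else "FAILED: " + result["error"]
        sections.append("## " + result["url"] + "\n\n" + body)
    return "\n\n".join(sections) + "\n"


def save_results(results: list[dict], out_dir: str) -> None:
    """Write the results as JSON and Markdown (file names are assumptions)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "scraped_data.json").write_text(json.dumps(results, indent=2), encoding="utf-8")
    (out / "scraped_data.md").write_text(to_markdown(results), encoding="utf-8")


if __name__ == "__main__":
    # Placeholder fetch so the sketch runs without network access or API keys.
    def demo_fetch(url: str) -> str:
        return "Example content for " + url

    all_results: list[dict] = []
    for batch in batched(["https://example.com/a", "https://example.com/b"], 1):
        all_results.extend(scrape_with_retries(u, demo_fetch) for u in batch)
    save_results(all_results, "output")
```

Passing fetch in as a parameter keeps the retry logic testable without network access, and recording failed pages in the results (rather than raising) lets one bad URL leave the rest of the batch intact.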
Please include a simple env setup with keys for LangSmith, Anthropic, and Firecrawl, plus settings like URL limit and batch limit. Make it easy to run locally with a Python virtual environment and a single dev command. Add a short README with setup steps and explain where the scraped results go. Look up current docs online if you need to.
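As a rough sketch of what that env setup could look like in code, the loader below reads the three API keys and the two limits from environment variables and fails early if a key is missing. The variable names (LANGSMITH_API_KEY, ANTHROPIC_API_KEY, FIRECRAWL_API_KEY, URL_LIMIT, BATCH_LIMIT), the defaults of 10 and 5, and the Settings class are assumptions for illustration, not names confirmed by the prompt. It also assumes the variables are already in the process environment, for example exported in the shell or loaded from a .env file by a tool such as python-dotenv.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Assumed variable names; adjust to whatever the project's .env actually uses.
REQUIRED_KEYS = ("LANGSMITH_API_KEY", "ANTHROPIC_API_KEY", "FIRECRAWL_API_KEY")


@dataclass(frozen=True)
class Settings:
    langsmith_api_key: str
    anthropic_api_key: str
    firecrawl_api_key: str
    url_limit: int    # maximum number of URLs to process in one run
    batch_limit: int  # maximum number of URLs per batch


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, failing fast on missing keys."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    return Settings(
        langsmith_api_key=env["LANGSMITH_API_KEY"],
        anthropic_api_key=env["ANTHROPIC_API_KEY"],
        firecrawl_api_key=env["FIRECRAWL_API_KEY"],
        url_limit=int(env.get("URL_LIMIT", "10")),     # default is an assumption
        batch_limit=int(env.get("BATCH_LIMIT", "5")),  # default is an assumption
    )
```

Accepting the environment as a mapping makes the loader easy to exercise with a plain dict, and validating all keys up front surfaces a misconfigured setup before any crawl starts.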