hsci-r/finnish-media-scrapers — reverse-engineered prompt

Reverse engineered prompt

Build me a Python package and command-line tool for collecting article text from major Finnish news sites, mainly Yle, Helsingin Sanomat, Iltalehti, and Ilta-Sanomat.

I want to be able to search by keyword and date range, save the list of matching articles to a CSV, download the article pages, and turn the saved HTML into clean plain text. After that, I want an optional, stricter local filter on the extracted text, because the news sites' own search can return fuzzy matches. Please keep each step separate and save the output of each stage so the process is easy to audit and reproduce later.
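To make the stricter local filter concrete, here is a rough sketch of one way it could work. The function name `matches_strictly` and the `require_all` option are placeholders of mine, and whole-word, case-insensitive matching is only one possible definition of "stricter"; it will not catch inflected forms whose stem changes.

```python
import re


def matches_strictly(text: str, keywords: list[str], require_all: bool = False) -> bool:
    """Return True if the keywords occur in text as whole words (case-insensitive).

    Hypothetical helper: a keyword counts only when it is not glued to
    other word characters, so "sote" matches "sote-uudistus" but not "Asoteria".
    """
    hits = [
        re.search(rf"(?<!\w){re.escape(kw)}(?!\w)", text, flags=re.IGNORECASE) is not None
        for kw in keywords
    ]
    # require_all=True demands every keyword; otherwise one hit is enough.
    return all(hits) if require_all else any(hits)
```

Something like this would run over the saved plain-text files and copy only the matching ones to a separate output directory, so the unfiltered stage stays intact for auditing.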

Make it useful for research: it should fail loudly if a source returns too many results or if an article layout does not parse reliably. Add a polite delay between requests by default. It would also be great if the same functionality could be used from Python code, not just the terminal.
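As an illustration of the "fail loudly" and "polite delay" behaviour, something along these lines would be fine. All names here (`TooManyResultsError`, `check_result_count`, `PoliteThrottle`) and the one-second default are my own assumptions, not an existing API.

```python
import time


class TooManyResultsError(RuntimeError):
    """Raised when a search reports more hits than the caller allows."""


def check_result_count(total: int, limit: int) -> None:
    """Raise instead of silently truncating when a query is too broad."""
    if total > limit:
        raise TooManyResultsError(
            f"search returned {total} results, above the limit of {limit}; "
            "narrow the query or the date range"
        )


class PoliteThrottle:
    """Keep consecutive requests at least delay_seconds apart."""

    def __init__(self, delay_seconds: float = 1.0, clock=time.monotonic, sleep=time.sleep):
        self.delay_seconds = delay_seconds
        self._clock = clock  # injectable so the class can be tested without real waiting
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first one

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return the seconds slept."""
        slept = 0.0
        if self._last is not None:
            remaining = self.delay_seconds - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
        self._last = self._clock()
        return slept
```

The idea is to call `wait()` before every HTTP request and `check_result_count()` right after reading the total hit count from a search response, so a too-broad query stops the run instead of producing a partial CSV.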

For Helsingin Sanomat, account for the fact that login-based fetching may currently be blocked by a CAPTCHA, so handle that honestly in the implementation or docs. Look up current docs online if you need to.
