waybackrevive/wayback-url-extractor — reverse-engineered prompt
Reverse-engineered prompt
Build me a simple Python tool that can pull all archived URLs for any domain from the Wayback Machine. I want it to be easy for a normal person to use, so it should work if they run the script and type in a domain, but also support command line options for more advanced use.
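One way to support both modes described above is to make the domain an optional positional argument and fall back to a prompt when it is missing. The following is a minimal sketch, not the repository's actual code: the flag names `--format` and `--limit` and the helper names `parse_args` and `resolve_domain` are assumptions made for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options; every option has a default so none are required."""
    parser = argparse.ArgumentParser(
        description="Extract archived URLs for a domain from the Wayback Machine"
    )
    # nargs="?" makes the domain optional so the script can prompt for it instead.
    parser.add_argument("domain", nargs="?", help="domain to look up, e.g. example.com")
    parser.add_argument("--format", choices=["csv", "json", "txt"], default="csv")
    parser.add_argument("--limit", type=int, default=None, help="maximum number of URLs")
    return parser.parse_args(argv)


def resolve_domain(args, ask=input):
    """Return a bare domain, asking the user interactively if none was given."""
    domain = args.domain or ask("Domain to look up: ")
    domain = domain.strip().lower()
    # Accept pasted URLs by dropping a leading scheme and any trailing slash.
    for prefix in ("https://", "http://"):
        if domain.startswith(prefix):
            domain = domain[len(prefix):]
    return domain.rstrip("/")
```

Running the script with no arguments would then ask for a domain, while something like `python extractor.py example.com --format json` (the script name is hypothetical) would skip the prompt.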
It should save results as CSV by default, with options for JSON and plain text. Include filters for file types like HTML, images, or PDFs, date ranges by year, status codes like 200 or 301, a max URL limit, and an option to remove duplicates. Show clear progress while it runs and print a friendly summary at the end with total URLs, unique URLs, date range, file type counts, and where the file was saved.
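Most of the filters listed above can be pushed to the server as CDX query parameters rather than applied after download. The sketch below assumes the commonly documented parameters of the Wayback CDX server (`url`, `matchType`, `fl`, `filter`, `from`, `to`, `limit`, `collapse`); the function name `build_cdx_params` and the `MIME_PATTERNS` mapping are hypothetical, and the exact behaviour of each parameter is worth checking against the current documentation.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

# Hypothetical mapping from friendly file-type names to MIME type regexes.
MIME_PATTERNS = {
    "html": "text/html",
    "images": "image/.*",
    "pdf": "application/pdf",
}


def build_cdx_params(domain, file_type=None, from_year=None, to_year=None,
                     status=None, limit=None, dedupe=False):
    """Translate user-facing filters into a list of CDX query parameters."""
    # A list of pairs is used because "filter" may legitimately appear twice.
    params = [
        ("url", domain),
        ("matchType", "domain"),  # include subdomains as well as the host itself
        ("fl", "timestamp,original,mimetype,statuscode"),
    ]
    if file_type:
        params.append(("filter", "mimetype:" + MIME_PATTERNS[file_type]))
    if status:
        params.append(("filter", f"statuscode:{status}"))
    if from_year:
        params.append(("from", str(from_year)))
    if to_year:
        params.append(("to", str(to_year)))
    if limit:
        params.append(("limit", str(limit)))
    if dedupe:
        # Collapses adjacent captures that share the same normalized URL key.
        params.append(("collapse", "urlkey"))
    return params
```

For example, `CDX_ENDPOINT + "?" + urlencode(build_cdx_params("example.com", file_type="pdf", status=200))` would produce a request URL for PDF captures that returned status 200.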
Use the Wayback CDX API, handle large results without wasting memory, add retries and polite rate limiting, and keep everything local with no tracking. Look up current Wayback docs online if needed.
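The memory and retry requirements above could be met by reading the response one line at a time and writing each row straight to disk, with exponential backoff around the request. This is a sketch under stated assumptions: it relies on the CDX server's default plain-text output (one space-separated capture per line) with a hypothetical four-field layout, and the names `open_with_retries`, `iter_rows`, and `write_csv` are illustrative, not taken from the repository.

```python
import csv
import time
import urllib.request
from urllib.error import URLError

# Assumed field order; it must match the "fl" parameter sent with the request.
FIELDS = ["timestamp", "original", "mimetype", "statuscode"]


def open_with_retries(url, attempts=4, base_delay=1.0,
                      opener=urllib.request.urlopen, sleep=time.sleep):
    """Open a URL, waiting 1s, 2s, 4s, ... between failed attempts."""
    for attempt in range(attempts):
        try:
            return opener(url, timeout=60)
        except (URLError, TimeoutError):
            if attempt == attempts - 1:
                raise  # out of attempts, surface the last error
            sleep(base_delay * 2 ** attempt)


def iter_rows(lines):
    """Yield one dict per capture line without holding the full response in memory."""
    for raw in lines:
        text = raw.decode("utf-8", "replace") if isinstance(raw, bytes) else raw
        parts = text.strip().split(" ")
        if len(parts) == len(FIELDS):  # skip blank or malformed lines
            yield dict(zip(FIELDS, parts))


def write_csv(rows, path):
    """Stream rows to a CSV file and return how many were written."""
    count = 0
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        for row in rows:
            writer.writerow(row)
            count += 1
    return count
```

A caller would chain these as `write_csv(iter_rows(open_with_retries(url)), "urls.csv")`. Polite rate limiting could be as simple as a fixed pause between successive requests when a large domain is fetched in several chunks.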