Python Browser Automation Agent with LLM Integration

Reverse engineered prompt

Build me a Python browser automation agent that can take a plain English task from the user and control Chrome to get it done. I want it to open real websites, search, click buttons, fill forms, handle dynamic pages, upload and download files, and explain what it did.

It should work with common LLM providers, including local models if possible, and have an option to use screenshots so it can understand the page visually. Please make it read the page structure in a useful way, showing buttons, links, inputs, headings, forms, and nearby text so the agent can choose the right thing to click or type into.

Add support for staying aware of page state, waiting for pages to load, recovering from failed actions, and pausing when a human needs to solve something like a CAPTCHA or OTP. Also include OAuth 2.0 with PKCE so users can log in once and reuse saved tokens later. Look up current docs online if you need to.

Want more depth? Deep Reverse

CursorTouch/Web-Use — reverse-engineered prompt

Reverse engineered prompt