Local Text-to-Video Web App

Reverse-engineered prompt

Build me a local Python app for Text2Video-Zero that I can run on my GPU and use in a browser. I want it to turn a text prompt into a short video, and also support guided modes where I can upload a reference video for pose, edges, or depth so the result follows that motion. I also want a simple video editing mode where I can give an instruction, such as changing the style or look of an existing video.

Keep the UI straightforward: a prompt box, an optional model picker, upload fields, and a few settings for video length, fps, motion strength, and edge thresholds, all with good defaults. Let me save results as mp4 or gif. If possible, also let me load a different base Stable Diffusion model or a DreamBooth-style model for the edge-guided workflow.

Please make it usable on lower VRAM setups when you can, but still work well on normal CUDA GPUs. Include a clean README with setup and how to run each mode. You can look up current docs online if needed.
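
As a rough illustration of the settings the prompt asks for, here is a minimal Python sketch of a settings container with defaults and basic validation. Everything in it is an assumption made for illustration: the class name `GenerationSettings`, the field names, and the default values are hypothetical and are not taken from the Text2Video-Zero repository.

```python
from dataclasses import dataclass


@dataclass
class GenerationSettings:
    """Hypothetical container for the UI settings named in the prompt.

    Field names and default values are illustrative placeholders,
    not the project's actual parameters.
    """

    prompt: str
    video_length: int = 8          # number of frames to generate
    fps: int = 4                   # playback rate of the saved file
    motion_strength: float = 12.0  # how strongly frames drift over time
    edge_low: int = 100            # lower edge-detection threshold
    edge_high: int = 200           # upper edge-detection threshold
    output_format: str = "mp4"     # "mp4" or "gif"

    def __post_init__(self) -> None:
        # Reject obviously invalid input before any GPU work starts.
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if self.video_length < 1 or self.fps < 1:
            raise ValueError("video_length and fps must be at least 1")
        if not 0 <= self.edge_low < self.edge_high <= 255:
            raise ValueError("edge thresholds must satisfy 0 <= low < high <= 255")
        if self.output_format not in ("mp4", "gif"):
            raise ValueError("output_format must be 'mp4' or 'gif'")

    @property
    def duration_seconds(self) -> float:
        """Length of the resulting clip in seconds."""
        return self.video_length / self.fps


settings = GenerationSettings(prompt="a panda surfing a wave")
print(settings.duration_seconds)
```

Validating in one place like this keeps the browser form simple: the UI passes raw values through, and bad combinations fail early with a readable message.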

Picsart-AI-Research/Text2Video-Zero — reverse-engineered prompt
