Python Project for Video Generation from Text Prompts

Reverse engineered prompt

I want a Python project that can generate long videos from text prompts using LongLive 2.0. Please set it up so I can run inference locally on an NVIDIA GPU, choose between the BF16 and NVFP4 models, paste a prompt such as "a robot walking through a lab", and get an MP4 saved under a sensible filename.
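As a rough idea of what a "sensible name" could mean, here is a minimal sketch that builds the filename from the prompt plus a timestamp. The function name `output_path`, the `outputs` folder, and the eight-word limit are placeholders for illustration, not anything taken from LongLive.

```python
import re
from datetime import datetime
from pathlib import Path
from typing import Optional


def output_path(prompt: str, out_dir: str = "outputs",
                when: Optional[datetime] = None, max_words: int = 8) -> Path:
    """Build an .mp4 path from the prompt text and a timestamp (hypothetical helper)."""
    # Keep only lowercase letters and digits, and cap the number of words.
    words = re.findall(r"[a-z0-9]+", prompt.lower())[:max_words]
    # Fall back to a generic name if the prompt has no usable characters.
    slug = "-".join(words) or "video"
    # The timestamp keeps repeated runs of the same prompt from overwriting each other.
    stamp = (when or datetime.now()).strftime("%Y%m%d-%H%M%S")
    return Path(out_dir) / f"{slug}_{stamp}.mp4"
```

Calling `output_path("A robot walking through a lab")` gives a path inside `outputs/` whose name starts with `a-robot-walking-through-a-lab_` and ends in `.mp4`.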

Include simple config files and clear commands for downloading the weights from Hugging Face, choosing the checkpoint, setting the output length and fps, and running a quick test. If the machine has multiple GPUs, support the faster parallel path, but keep the single-GPU path working too.
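The settings I have in mind could be captured in something like the sketch below. Every field name and default here (`precision`, `seconds`, `fps`, `num_gpus`, the 16 fps default) is an assumption for illustration; the real values should come from the LongLive docs.

```python
from dataclasses import dataclass


@dataclass
class RunConfig:
    """Hypothetical run settings; names and defaults are placeholders."""
    precision: str = "bf16"   # "bf16" or "nvfp4"
    seconds: float = 10.0     # requested output length
    fps: int = 16             # assumed default, not taken from LongLive
    num_gpus: int = 1

    def __post_init__(self) -> None:
        # Fail early on values the rest of the pipeline could not handle.
        if self.precision not in ("bf16", "nvfp4"):
            raise ValueError(f"precision must be 'bf16' or 'nvfp4', got {self.precision!r}")
        if self.seconds <= 0 or self.fps <= 0 or self.num_gpus < 1:
            raise ValueError("seconds and fps must be positive, num_gpus at least 1")

    @property
    def num_frames(self) -> int:
        # Output length in frames is duration times frame rate.
        return round(self.seconds * self.fps)

    @property
    def mode(self) -> str:
        # Use the parallel path only when more than one GPU is requested.
        return "parallel" if self.num_gpus > 1 else "single"
```

With this shape, a 30-second clip at 16 fps works out to 480 frames, and setting `num_gpus` to 1 always selects the single-GPU path.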

I also want the streaming decode and async decode options exposed as config or command-line settings, so I don't have to edit Python every time. Add a short README with install steps, example commands, the expected folders for model files, and troubleshooting for CUDA errors and missing checkpoints. Look up the current LongLive docs online if you need to.
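For the toggles and the missing-checkpoint case, a minimal sketch using only the standard library might look like this. The flag names (`--streaming-decode`, `--async-decode`, `--checkpoint`) and the helper names are invented for illustration and are not LongLive's actual command-line interface.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI; flag names are placeholders, not LongLive's."""
    parser = argparse.ArgumentParser(description="Generate a video from a text prompt.")
    parser.add_argument("--prompt", required=True)
    parser.add_argument("--checkpoint", type=Path, required=True)
    # BooleanOptionalAction also creates --no-streaming-decode / --no-async-decode.
    parser.add_argument("--streaming-decode",
                        action=argparse.BooleanOptionalAction, default=False)
    parser.add_argument("--async-decode",
                        action=argparse.BooleanOptionalAction, default=False)
    return parser


def check_checkpoint(path: Path) -> None:
    """Raise a readable error before any model loading is attempted."""
    if not path.exists():
        raise FileNotFoundError(
            f"Checkpoint not found at {path}. Download the weights first "
            "or point --checkpoint at the right folder."
        )
```

Running the check before touching the GPU means a wrong path fails in a second with a clear message instead of partway through model setup.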


NVlabs/LongLive — reverse-engineered prompt
