daominhwysi/manga-translator — reverse-engineered prompt



Build me a Python tool that can translate manga or comic pages from an image file or a folder of images.

I want to run it from the command line, give it an input image, choose the source language or leave it on auto, choose the target language, and get back a final image where the original text is removed and the translated text is placed back inside the speech bubbles. It should also save useful intermediate files like detected text boxes, masks, cleaned backgrounds, OCR text, translated text, and the final rendered page.
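As a rough illustration of the command-line shape I have in mind, here is a minimal sketch using only the standard library. The program name, the flag names (--source-lang, --target-lang, --output-dir), the default values, and the helper collect_images are all placeholders I made up for this example, not names taken from the existing project.

```python
import argparse
from pathlib import Path

# Assumed set of page formats; adjust to whatever the pipeline can load.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical CLI: one input path plus language options."""
    parser = argparse.ArgumentParser(
        prog="manga-translate",  # placeholder program name
        description="Translate manga or comic pages from an image or a folder.",
    )
    parser.add_argument("input", type=Path, help="image file or folder of images")
    parser.add_argument("--source-lang", default="auto",
                        help="source language, or 'auto' to detect it")
    parser.add_argument("--target-lang", default="en",
                        help="language to translate into (default is an assumption)")
    parser.add_argument("--output-dir", type=Path, default=Path("output"),
                        help="where the final page and intermediate files go")
    return parser


def collect_images(input_path: Path) -> list[Path]:
    """Return the pages to process: a folder's images (sorted) or the single file."""
    if input_path.is_dir():
        return sorted(
            p for p in input_path.iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
        )
    return [input_path]


if __name__ == "__main__":
    args = build_parser().parse_args()
    for page in collect_images(args.input):
        print(f"would translate {page} ({args.source_lang} -> {args.target_lang})")
```

With this sketch, a call such as `python cli.py page.png --target-lang vi` would leave the source language on auto; the script name cli.py is likewise only an example.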

Use the existing deep-learning pipeline described here: text detection, speech bubble detection and segmentation, inpainting, OCR, translation with the Google Gemini API, and then text rendering. It should use Pixi for setup, read GEMINI_API_KEY from a .env file, and automatically download model checkpoints the first time it runs.
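To make the stage order and the key handling concrete, here is a minimal sketch of how I picture the orchestration. The stage names in STAGE_ORDER follow the list above, but the functions read_env_file, get_gemini_api_key, and run_pipeline are hypothetical, and the small .env parser is a standard-library stand-in for whatever loader the real project uses (a package such as python-dotenv would be the usual choice).

```python
import os
from pathlib import Path
from typing import Callable

# Stage order as listed in this prompt; the stage callables are supplied by the caller.
STAGE_ORDER = [
    "text_detection",
    "bubble_segmentation",
    "inpainting",
    "ocr",
    "translation",
    "rendering",
]


def read_env_file(path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env file, skipping blanks and comments."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_gemini_api_key(env_path=".env") -> str:
    """Prefer the process environment, then fall back to the .env file."""
    key = os.environ.get("GEMINI_API_KEY") or read_env_file(env_path).get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set; add it to the .env file")
    return key


def run_pipeline(page, stages: dict[str, Callable[[dict], dict]]) -> dict:
    """Run each stage in order; every stage takes and returns a shared state dict."""
    state = {"page": page}
    for name in STAGE_ORDER:
        state = stages[name](state)
    return state
```

Passing the stages in as a dictionary is only one possible design; it keeps each step swappable and makes it easy to save the state after every stage as an intermediate file.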

Please make it usable on CUDA or CPU, include a sample command, and keep the code modular enough that I can import the pipeline from another Python script too.
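For the device choice, something like the following sketch is what I mean. It assumes the models run on PyTorch, which this prompt does not state, and the function name resolve_device and the "auto" option are made up for illustration. The only library call used is torch.cuda.is_available(), and the import is guarded so the function still works where PyTorch is not installed.

```python
def resolve_device(requested: str = "auto") -> str:
    """Map a requested device to 'cuda' or 'cpu', falling back to CPU when needed."""
    if requested not in ("auto", "cuda", "cpu"):
        raise ValueError(f"unknown device: {requested!r}")
    if requested == "cpu":
        return "cpu"
    try:
        import torch  # assumption: the pipeline's models are PyTorch models
        cuda_ok = torch.cuda.is_available()
    except ImportError:
        cuda_ok = False
    # 'auto' and 'cuda' both degrade to CPU when no usable GPU is found.
    return "cuda" if cuda_ok else "cpu"


if __name__ == "__main__":
    print(resolve_device("auto"))
```

Another script could then import this helper and pass its result to the pipeline, so the same code path serves both the command line and library use.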
