EvoCatalysis/BGC_annotation — reverse-engineered prompt


Build me a usable research tool for biosynthetic gene cluster (BGC) prediction based on the existing BGC MAC and BGC MAP code.

I want someone to be able to set up the environment, download or place the required model checkpoints and ESM2 weights, then run predictions on antiSMASH GenBank (GBK) files without having to understand the whole repo. It should accept either a single GBK file or a folder of GBK files. It should write CSV files with per-class prediction scores for BGC classification, and also support matching BGCs against natural product SMILES strings to rank candidate products.
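To illustrate the single-file-or-folder input I have in mind, here is a minimal sketch. The helper name `collect_gbk_files` and the accepted extensions are my own assumptions for illustration, not something taken from the existing repo.

```python
from pathlib import Path

# Assumed set of GenBank extensions; adjust to whatever the repo actually expects.
GBK_SUFFIXES = {".gbk", ".gb", ".genbank"}


def collect_gbk_files(input_path):
    """Return a sorted list of GenBank files from one file or one folder.

    Hypothetical helper: raises a descriptive error instead of failing later
    inside the model code.
    """
    path = Path(input_path)
    if not path.exists():
        raise FileNotFoundError(f"Input path not found: {path}")

    if path.is_file():
        if path.suffix.lower() not in GBK_SUFFIXES:
            raise ValueError(
                f"{path} does not look like a GenBank file "
                f"(expected one of {sorted(GBK_SUFFIXES)})"
            )
        return [path]

    # Folder input: keep only GenBank files, in a stable order.
    files = sorted(
        p for p in path.iterdir()
        if p.is_file() and p.suffix.lower() in GBK_SUFFIXES
    )
    if not files:
        raise FileNotFoundError(f"No GenBank files found in folder: {path}")
    return files
```

Both the single and batch commands could call a helper like this first, so they share one code path and one set of error messages.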

Please make the workflow reliable and beginner-friendly. Add clear commands, helpful error messages when files are missing, and examples for single BGC classification, batch classification, single product matching, and batch product matching. Keep the training and evaluation scripts working too, but focus on making inference easy to run from the command line. Look up current docs online if you need to.
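As a rough sketch of the kind of missing-file message I mean, assuming a hypothetical `check_required_files` helper and made-up labels and paths (the real checkpoint and weight locations should come from the repo):

```python
from pathlib import Path


def check_required_files(required):
    """Raise one readable error listing every missing file.

    `required` maps a human-readable label to a path. Hypothetical helper,
    not part of the existing code.
    """
    missing = [
        (label, Path(p)) for label, p in required.items()
        if not Path(p).exists()
    ]
    if missing:
        lines = [f"  - {label}: {p}" for label, p in missing]
        raise FileNotFoundError(
            "Missing required files:\n"
            + "\n".join(lines)
            + "\nDownload or place them at the paths above, then rerun."
        )


# Example call with placeholder paths (assumptions, not the repo's layout):
# check_required_files({
#     "classifier checkpoint": "checkpoints/classifier.pt",
#     "ESM2 weights": "weights/esm2.pt",
# })
```

Reporting all missing files at once, with a label for each, spares a beginner from fixing one path, rerunning, and hitting the next error.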
