Python Tool for Detecting Unicode Gambling SEO Spam

Reverse-engineered prompt

Build me a Python command-line tool and library that detects online gambling SEO spam hidden with Unicode lookalike characters. It should catch words like slot, judi, gacor, deposit, and maxwin when attackers disguise them with Greek, Cyrillic, fullwidth, zero-width, or bidi characters, but it should not flag normal plain ASCII words or legitimate foreign-language text.
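A minimal sketch of the detection idea, assuming a skeleton approach: drop invisible format characters, fold fullwidth forms with NFKC, then map lookalike letters to ASCII. The `CONFUSABLES` table and the names `skeleton` and `find_hits` are hypothetical and deliberately tiny; a real tool would need a much larger mapping.

```python
import unicodedata

KEYWORDS = ("slot", "judi", "gacor", "deposit", "maxwin")

# Hypothetical, deliberately small lookalike table (illustration only).
CONFUSABLES = {
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
    "\u03b9": "i",  # GREEK SMALL LETTER IOTA
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
    "\u0455": "s",  # CYRILLIC SMALL LETTER DZE
    "\u0456": "i",  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
    "\u0458": "j",  # CYRILLIC SMALL LETTER JE
}


def skeleton(text):
    """Reduce text to a plain form that can be compared against keywords."""
    # Category "Cf" covers zero-width and bidi control characters.
    visible = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    # NFKC folds fullwidth letters to their ASCII equivalents.
    folded = unicodedata.normalize("NFKC", visible).lower()
    return "".join(CONFUSABLES.get(ch, ch) for ch in folded)


def find_hits(text):
    """Return (keyword, disguised_token) pairs for non-ASCII disguises only."""
    hits = []
    for token in text.split():
        if token.isascii():
            continue  # plain ASCII words are never flagged
        folded = skeleton(token)
        for keyword in KEYWORDS:
            if keyword in folded:
                hits.append((keyword, token))
    return hits
```

Under this sketch, a token is reported only when it contains non-ASCII characters and its skeleton contains a keyword, so ordinary Cyrillic or Greek words whose skeleton does not spell a keyword pass through untouched.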

I want it to scan HTML files or stdin, print a clear human report, optionally output JSON, and return a failing exit code when something suspicious is found. The report should show the matched keyword, the disguised text, confidence level, and useful character evidence like codepoint, Unicode name, script, UTF-8 bytes, and skeleton form.
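The per-character evidence could be gathered roughly as follows. This is a sketch under assumptions: the function names and dictionary keys are hypothetical, confidence and skeleton fields are left out, and because Python's standard library does not expose the Unicode Script property, the first word of the character name is used as a rough guess.

```python
import json
import unicodedata


def char_evidence(ch):
    """Describe one character for the report."""
    name = unicodedata.name(ch, "")
    return {
        "codepoint": f"U+{ord(ch):04X}",
        "name": name or "<unnamed>",
        # Rough stand-in for the Script property (assumption, not exact).
        "script_guess": name.split()[0] if name else "UNKNOWN",
        "utf8": ch.encode("utf-8").hex(" "),
    }


def build_finding(keyword, disguised):
    """Bundle a match with evidence for each non-ASCII character in it."""
    return {
        "keyword": keyword,
        "disguised": disguised,
        "evidence": [char_evidence(ch) for ch in disguised if not ch.isascii()],
    }


def exit_code(findings):
    """Non-zero when anything suspicious was found, zero otherwise."""
    return 1 if findings else 0


if __name__ == "__main__":
    findings = [build_finding("slot", "sl\u043et")]
    print(json.dumps(findings, ensure_ascii=False, indent=2))
    raise SystemExit(exit_code(findings))
```

Returning the exit code from a small function keeps the same logic usable from both the command line and the library.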

Also add generators for YARA rules for stored web content and Sigma rules for proxy, DNS, and webserver logs, using the same keyword and confusable mapping logic. Include a demo spam page, tests, and simple usage examples so I can run it locally right away.
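One way the YARA generator might work is to emit a hex string for each disguised spelling of a keyword. The sketch below assumes a hypothetical `LOOKALIKES` table and covers only variants with a single swapped character, so it is a starting point rather than complete coverage; the rule naming scheme is also an assumption.

```python
# Hypothetical lookalike table: ASCII letter -> disguised replacements.
LOOKALIKES = {
    "a": ["\u0430"],            # Cyrillic a
    "c": ["\u0441"],            # Cyrillic es
    "e": ["\u0435"],            # Cyrillic ie
    "i": ["\u0456"],            # Cyrillic Byelorussian-Ukrainian i
    "j": ["\u0458"],            # Cyrillic je
    "o": ["\u03bf", "\u043e"],  # Greek omicron, Cyrillic o
    "p": ["\u0440"],            # Cyrillic er
    "s": ["\u0455"],            # Cyrillic dze
    "x": ["\u0445"],            # Cyrillic ha
}


def hex_bytes(text):
    """UTF-8 bytes of text as space-separated uppercase hex pairs."""
    return " ".join(f"{b:02X}" for b in text.encode("utf-8"))


def single_swap_variants(keyword):
    """Spellings of keyword with exactly one letter replaced by a lookalike."""
    variants = []
    for i, ch in enumerate(keyword):
        for fake in LOOKALIKES.get(ch, []):
            variants.append(keyword[:i] + fake + keyword[i + 1:])
    return variants


def yara_rule(keyword):
    """Render a YARA rule matching the single-swap variants of keyword."""
    variants = single_swap_variants(keyword)
    if not variants:
        raise ValueError(f"no lookalike variants for {keyword!r}")
    lines = [f"rule gambling_homoglyph_{keyword}", "{", "    strings:"]
    for n, variant in enumerate(variants):
        # Hex strings match exact bytes, so the plain ASCII spelling never hits.
        lines.append(f"        $v{n} = {{ {hex_bytes(variant)} }}")
    lines += ["    condition:", "        any of them", "}"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(yara_rule("slot"))
```

Because every generated string contains at least one multi-byte lookalike, a page that only uses the plain ASCII keyword does not match the rule.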


INFOKOM-KI/yara_check_homoglyph — reverse-engineered prompt
