AI Answer Faithfulness Evaluation Package

Reverse engineered prompt

Build me a Python package called Groundlens that checks whether AI answers are faithful to the information they were given, without using another AI judge. I want to pass in a question, some source context, and the model answer, then get a clear score, a simple pass or review flag, and a plain-English explanation that can be saved for audits.
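The requested output can be pictured as a small result record. This is only a sketch: `FaithfulnessResult`, `make_result`, `to_audit_json`, and the 0.7 threshold are hypothetical names and values, not part of any published Groundlens API. It also assumes the score has already been computed and lies in [0, 1].

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class FaithfulnessResult:
    """Hypothetical result record: a score, a pass/review flag, and an explanation."""
    score: float
    flag: str
    explanation: str

    def to_audit_json(self) -> str:
        # Sorted keys keep saved audit records byte-stable across runs.
        return json.dumps(asdict(self), sort_keys=True)


def make_result(score: float, threshold: float = 0.7) -> FaithfulnessResult:
    """Map a score in [0, 1] to a flag and a plain-English explanation.

    The 0.7 default threshold is an assumption chosen for illustration.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be in [0, 1]")
    flag = "pass" if score >= threshold else "review"
    if flag == "pass":
        explanation = (
            f"Score {score:.2f} meets the threshold of {threshold:.2f}; "
            "the answer appears to be supported by the provided context."
        )
    else:
        explanation = (
            f"Score {score:.2f} is below the threshold of {threshold:.2f}; "
            "parts of the answer may not be supported and should be reviewed."
        )
    return FaithfulnessResult(score=round(score, 4), flag=flag, explanation=explanation)


if __name__ == "__main__":
    print(make_result(0.82).to_audit_json())
```

Keeping the flag a pure function of score and threshold means an auditor can reproduce any saved decision from the stored numbers alone.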

It should also work when there is no source context, using a separate scoring method for normal chat answers. Please include batch scoring, a small command-line tool to check setup and run evaluations, and examples showing how someone would use it with a RAG app. If possible, add wrappers for common LLM providers and callbacks for agent workflows like LangGraph so each model call can be scored automatically.
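The two scoring modes and batch scoring could be wired together roughly as follows. This is a sketch under stated assumptions: `score_one`, `score_batch`, and the mode names are hypothetical, and token-level Jaccard overlap stands in for the embedding similarity a real implementation would use. With context, the answer is compared against the context (grounding); without it, the answer is compared against the question (relevance), which is only one possible choice for the separate no-context method.

```python
import re
from typing import Iterable, Optional

_TOKEN = re.compile(r"[a-z0-9]+")


def _tokens(text: str) -> set[str]:
    # Lowercase word tokens; a crude placeholder for an embedding model.
    return set(_TOKEN.findall(text.lower()))


def _overlap(a: str, b: str) -> float:
    # Jaccard similarity of token sets, in [0, 1].
    ta, tb = _tokens(a), _tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def score_one(question: str, answer: str, context: Optional[str] = None) -> dict:
    """Pick a scoring mode based on whether source context was supplied."""
    if context:
        return {"mode": "grounded", "score": _overlap(answer, context)}
    # No context: fall back to a separate, question-relevance style score.
    return {"mode": "context_free", "score": _overlap(answer, question)}


def score_batch(items: Iterable[dict]) -> list[dict]:
    """Score many records; 'question' and 'answer' are required, 'context' is optional."""
    return [
        score_one(item["question"], item["answer"], item.get("context"))
        for item in items
    ]


if __name__ == "__main__":
    batch = [
        {"question": "Where is the Eiffel Tower?",
         "context": "The Eiffel Tower is in Paris.",
         "answer": "The Eiffel Tower is in Paris."},
        {"question": "Say hello.", "answer": "Hello there!"},
    ]
    for result in score_batch(batch):
        print(result)
```

Recording the mode next to each score keeps grounded and context-free results from being compared as if they measured the same thing.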

Keep the math deterministic and explainable with embeddings, distances, and cosine similarity. Add tests, basic docs, and a few realistic examples so the project feels ready to publish.
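The deterministic math can be illustrated with plain vectors. Cosine similarity is cos(u, v) = (u · v) / (|u| |v|), and cosine distance is 1 - cos(u, v). In this sketch a sparse term-frequency vector stands in for a real embedding model (an assumption; a real package would call one), and `embed`, `grounding_score`, and the per-sentence "best chunk, then average" aggregation are hypothetical choices rather than a confirmed Groundlens method.

```python
import math
import re
from collections import Counter

_TOKEN = re.compile(r"[a-z0-9]+")


def embed(text: str) -> Counter:
    # Stand-in "embedding": sparse term-frequency vector. A real package
    # would call a sentence-embedding model here instead.
    return Counter(_TOKEN.findall(text.lower()))


def cosine_similarity(u: Counter, v: Counter) -> float:
    # cos(u, v) = (u . v) / (|u| * |v|); defined as 0.0 for empty vectors.
    dot = sum(count * v[term] for term, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def cosine_distance(u: Counter, v: Counter) -> float:
    return 1.0 - cosine_similarity(u, v)


def grounding_score(answer: str, context_chunks: list[str]) -> tuple[float, list[float]]:
    """Score each answer sentence by its best-matching context chunk, then average."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    chunk_vecs = [embed(c) for c in context_chunks]
    per_sentence = [
        max((cosine_similarity(embed(s), cv) for cv in chunk_vecs), default=0.0)
        for s in sentences
    ]
    overall = sum(per_sentence) / len(per_sentence) if per_sentence else 0.0
    return overall, per_sentence


if __name__ == "__main__":
    context = ["The Eiffel Tower is in Paris.", "It was completed in 1889."]
    answer = "The Eiffel Tower is in Paris. It opened in 1925."
    overall, per_sentence = grounding_score(answer, context)
    print(overall, per_sentence)
```

The per-sentence list is what makes the score explainable: in the usage example the first sentence matches a context chunk exactly, while the unsupported date in the second sentence pulls its score, and the overall average, down.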

groundlens-dev/groundlens — reverse-engineered prompt