awslabs/deequ — reverse-engineered prompt

Reverse engineered prompt


Build me a Scala library for Apache Spark that lets data teams write simple unit tests for big tabular data before it goes into reports, apps, or machine learning.

I want users to be able to load a Spark DataFrame, define checks such as row count, required columns, unique IDs, allowed values, non-negative numbers, URL patterns, and approximate quantiles, then run everything and get a clear pass or fail result with a readable error message for each failed rule.

Please include a small runnable example with item data that shows bad records being caught, plus tests for the main behavior. It should feel easy to use, with a fluent style where someone can chain checks together and then run them. Also include basic support for computing data quality metrics, profiling columns, saving metrics for later querying, and detecting unusual changes over time if that fits cleanly.
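To make the requested fluent style concrete, here is a minimal sketch in plain Scala. It runs on in-memory rows rather than a Spark DataFrame, so it is neither Spark nor the deequ API. Every name in it (`FluentCheckSketch`, `Check`, `Rule`, the method names, the item rows) is a hypothetical stand-in chosen for illustration.

```scala
// Hypothetical in-memory sketch of "chain checks, then run".
// It does not use Spark and is not the deequ API.
object FluentCheckSketch {
  type Row = Map[String, Any]

  // A named rule: None means it passed, Some(message) explains the failure.
  final case class Rule(name: String, eval: Seq[Row] => Option[String])

  // Immutable builder: each method returns a new Check with one more rule.
  final case class Check(description: String, rules: Vector[Rule] = Vector.empty) {
    private def add(name: String)(eval: Seq[Row] => Option[String]): Check =
      copy(rules = rules :+ Rule(name, eval))

    def hasSize(p: Int => Boolean): Check = add("hasSize") { rows =>
      if (p(rows.size)) None else Some(s"unexpected row count ${rows.size}")
    }

    // A value counts as missing if the key is absent or the value is null.
    def isComplete(col: String): Check = add(s"isComplete($col)") { rows =>
      val missing = rows.count(r => r.get(col).forall(_ == null))
      if (missing == 0) None else Some(s"$missing row(s) have no value in '$col'")
    }

    def isUnique(col: String): Check = add(s"isUnique($col)") { rows =>
      val values = rows.flatMap(_.get(col))
      val dupes = values.size - values.distinct.size
      if (dupes == 0) None else Some(s"$dupes duplicate value(s) in '$col'")
    }

    // Null or absent values are ignored here; only present values are checked.
    def isContainedIn(col: String, allowed: Set[String]): Check =
      add(s"isContainedIn($col)") { rows =>
        val bad = rows.count(r =>
          r.get(col).exists(v => v != null && !allowed.contains(v.toString)))
        if (bad == 0) None else Some(s"$bad row(s) have a disallowed value in '$col'")
      }

    def isNonNegative(col: String): Check = add(s"isNonNegative($col)") { rows =>
      val bad = rows.count(r => r.get(col).exists {
        case n: Int    => n < 0
        case n: Long   => n < 0
        case n: Double => n < 0
        case _         => false
      })
      if (bad == 0) None else Some(s"$bad row(s) have a negative value in '$col'")
    }

    // Runs every rule and returns (rule name, message) for each failure.
    def run(rows: Seq[Row]): Seq[(String, String)] =
      rules.flatMap(r => r.eval(rows).map(msg => r.name -> msg))
  }

  // Made-up item data: one missing name, one bad priority, one negative count.
  val items: Seq[Row] = Seq(
    Map("id" -> 1, "productName" -> "Thingy A", "priority" -> "high", "numViews" -> 0),
    Map("id" -> 2, "productName" -> "Thingy B", "priority" -> null, "numViews" -> 0),
    Map("id" -> 3, "productName" -> null, "priority" -> "low", "numViews" -> 5),
    Map("id" -> 4, "productName" -> "Thingy D", "priority" -> "urgent", "numViews" -> -2)
  )

  val failures: Seq[(String, String)] =
    Check("item data")
      .hasSize(_ == 4)
      .isComplete("id")
      .isUnique("id")
      .isComplete("productName")
      .isContainedIn("priority", Set("high", "low"))
      .isNonNegative("numViews")
      .run(items)

  def main(args: Array[String]): Unit =
    if (failures.isEmpty) println("All checks passed")
    else failures.foreach { case (rule, msg) => println(s"FAILED $rule: $msg") }
}
```

Keeping `Check` immutable means a partially built chain can be reused safely, and returning one message per failed rule matches the request for readable errors. A Spark-backed version would presumably evaluate the rules as aggregations over the DataFrame instead of scanning rows in memory.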

Look up current Spark and Scala docs online if you need to.
