Lightweight Rust Command Line AI Inference Server

Reverse engineered prompt

Build me a lightweight Rust command line app that runs a local AI inference server on my computer, so I can point apps that already use the OpenAI API at localhost and have them work without sending data to the cloud.

It should be a single easy to download binary, start with a simple serve command, choose an open port if the default is busy, and have a list command that shows the models it found. It should automatically look for local models in common places like Hugging Face cache, Ollama, and normal folders, then serve GGUF and SafeTensors models through OpenAI style chat completions endpoints.

Please include basic GPU detection when available, a CPU fallback, support for larger models with CPU and GPU sharing if possible, and clear smoke test examples using curl plus simple Node and Python clients. Keep the setup simple, private, free to run locally, and compatible with tools like Cursor or VS Code extensions.

Want more depth? Deep Reverse

Michael-A-Kuykendall/shimmy — reverse-engineered prompt

Reverse engineered prompt