NVlabs/cuda-oxide

Reverse engineered prompt

Build me an experimental Rust toolchain project that lets people write NVIDIA GPU kernels directly in normal Rust instead of using CUDA C or a separate DSL.

I want it to feel like a normal Cargo workflow. A user should be able to write host code and device kernel code in one Rust file, mark GPU functions and kernel modules with simple attributes, compile those kernels to PTX, load them into the host program, manage GPU buffers, and launch kernels with typed Rust calls. Include a small working example, such as vector add or a map over an array, in which a Rust closure is passed into the GPU kernel and the result is copied back to the host and checked.
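To make the requested example concrete, here is a minimal CPU-only sketch of the map-over-an-array pattern. It does not use a GPU or any cuda-oxide API. The names map_kernel, launch_map, and verify are hypothetical and chosen only for illustration. It shows the shape the example should have: a per-thread kernel body with a bounds guard, a launch that covers the data with whole blocks, and a host-side check of the result.

```rust
// CPU-only sketch. Nothing here is a real cuda-oxide API; all names are hypothetical.

/// Body that one logical GPU "thread" with global index `idx` would run.
fn map_kernel<F>(idx: usize, input: &[f32], output: &mut [f32], f: &F)
where
    F: Fn(f32) -> f32,
{
    // Bounds guard: the launch may contain more threads than elements.
    if idx < input.len() && idx < output.len() {
        output[idx] = f(input[idx]);
    }
}

/// Stand-in for a kernel launch: runs the kernel body once per thread index,
/// sequentially, using enough whole blocks to cover the input.
fn launch_map<F>(input: &[f32], f: F, block_size: usize) -> Vec<f32>
where
    F: Fn(f32) -> f32,
{
    assert!(block_size > 0, "block_size must be positive");
    let mut output = vec![0.0_f32; input.len()];
    // Round up so a partial final block still covers the tail elements.
    let blocks = (input.len() + block_size - 1) / block_size;
    for idx in 0..blocks * block_size {
        map_kernel(idx, input, &mut output, &f);
    }
    output
}

/// Host-side check: recompute each element and compare with the kernel output.
fn verify<F>(input: &[f32], output: &[f32], f: F) -> bool
where
    F: Fn(f32) -> f32,
{
    input.len() == output.len()
        && input.iter().zip(output).all(|(x, y)| f(*x) == *y)
}
```

Calling launch_map(&input, |x| x * 2.0 + 1.0, 4) on a five-element input runs eight logical threads, three of which are rejected by the bounds guard. A real implementation would replace the sequential loop with a device launch and add the copy back to the host before verify.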

Please also include a command-line helper like cargo oxide, with commands to build, run examples, show the compiler pipeline, debug, and run a doctor check of the CUDA, Rust nightly, LLVM, and clang setup. Make it clear that this is alpha research software, but keep the repo usable, with docs, examples, and a dev container setup.
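One possible shape for the doctor check is sketched below, using only the Rust standard library. The helper names (probe, is_nightly, doctor_report) are hypothetical. The choice to probe rustc, nvcc, llvm-config, and clang with --version is an assumption about what such a check might look at, not a description of how cargo oxide actually works.

```rust
use std::process::Command;

/// Runs `program args...` and returns the first line of stdout on success.
/// Returns None if the program cannot be started or exits with an error.
fn probe(program: &str, args: &[&str]) -> Option<String> {
    let output = Command::new(program).args(args).output().ok()?;
    if !output.status.success() {
        return None;
    }
    let text = String::from_utf8_lossy(&output.stdout);
    Some(text.lines().next().unwrap_or("").trim().to_string())
}

/// Heuristic: nightly rustc version strings contain "-nightly".
fn is_nightly(version_line: &str) -> bool {
    version_line.contains("-nightly")
}

/// Probes each tool and pairs a label with its version line, if found.
/// The tool list is an assumption made for this sketch.
fn doctor_report() -> Vec<(&'static str, Option<String>)> {
    let tools = [
        ("Rust", "rustc"),
        ("CUDA", "nvcc"),
        ("LLVM", "llvm-config"),
        ("clang", "clang"),
    ];
    tools
        .iter()
        .map(|(label, program)| (*label, probe(program, &["--version"])))
        .collect()
}
```

A caller could print one line per entry, reporting the version when it is Some and a missing-tool hint when it is None, and could run is_nightly on the rustc line to warn when a stable toolchain is active.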
