You'll design and optimize CUDA and C++ kernels that power LLMs, transformers, and generative AI, using low-precision formats, operator fusion, and advanced memory optimization.
What you'll do
- Build and optimize CUDA kernels (attention, MLP, layernorm, etc.).
- Develop FP4 and FP8 kernels and support new microscaling formats such as MXFP4 and MXFP6.
- Use CUTLASS for high-performance GEMMs and fused ops.
- Profile and tune performance across the GPU memory hierarchy.
- Integrate with PyTorch, Triton, and TensorRT.
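To give a flavor of the low-precision work above, here is a minimal CPU-side sketch of the idea behind microscaling formats like MXFP4: a block of 32 values shares one power-of-two scale, and each element is rounded to a small FP4 (E2M1) value set. This is an illustrative simplification, not the OCP MX specification; `roundToFp4` and `mxQuantizeDequantize` are hypothetical helper names.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

// Illustrative only: the representable magnitudes of an FP4 E2M1 element
// (sign handled separately), as used by MXFP4.
static const std::array<float, 8> kFp4Mags = {0.0f, 0.5f, 1.0f, 1.5f,
                                              2.0f, 3.0f, 4.0f, 6.0f};

// Round one value to the nearest FP4 magnitude (hypothetical helper).
float roundToFp4(float x) {
    float best = kFp4Mags[0];
    for (float m : kFp4Mags)
        if (std::fabs(std::fabs(x) - m) < std::fabs(std::fabs(x) - best))
            best = m;
    return std::copysign(best, x);
}

// Quantize a 32-element block with one shared power-of-two scale, then
// dequantize. Returns the reconstructed values so the rounding error is
// easy to inspect. Simplified relative to the real MX block format.
std::vector<float> mxQuantizeDequantize(const std::vector<float>& block) {
    assert(block.size() == 32);
    float maxAbs = 0.0f;
    for (float v : block) maxAbs = std::max(maxAbs, std::fabs(v));
    // Shared scale: a power of two chosen so the largest element lands
    // near the top of the FP4 range (6.0).
    float scale = (maxAbs > 0.0f)
        ? std::exp2(std::floor(std::log2(maxAbs / 6.0f)))
        : 1.0f;
    std::vector<float> out(32);
    for (size_t i = 0; i < 32; ++i)
        out[i] = roundToFp4(block[i] / scale) * scale;
    return out;
}
```

In production kernels the packed 4-bit elements and the shared scale would be consumed directly by tensor-core instructions; the sketch only shows the numerics.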
What we're looking for
- Strong CUDA and C++ expertise with deep knowledge of the NVIDIA Hopper and Blackwell architectures.
- Experience with low-precision formats, CUTLASS, and Triton.
- Skilled in operator fusion, tiling, and warp-level programming.
- Proficient with profiling tools such as Nsight Compute, Nsight Systems, and the legacy nvprof.
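The tiling skill listed above can be sketched on the CPU: the loop nest below processes the matrices in small blocks so each block pair stays cache-resident, which is the same reuse pattern a CUDA kernel achieves by staging tiles in shared memory. A minimal sketch; `tiledGemm` and the tile size are illustrative choices, not a tuned implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// CPU sketch of tiled GEMM: C = A * B for N x N row-major matrices.
// In a CUDA kernel, a thread block would stage the TILE x TILE blocks
// of A and B in shared memory; here tiling just improves cache reuse.
constexpr int TILE = 4;

std::vector<float> tiledGemm(const std::vector<float>& A,
                             const std::vector<float>& B, int N) {
    std::vector<float> C(N * N, 0.0f);
    for (int i0 = 0; i0 < N; i0 += TILE)
        for (int k0 = 0; k0 < N; k0 += TILE)
            for (int j0 = 0; j0 < N; j0 += TILE)
                // Multiply one TILE x TILE block pair; both blocks stay
                // hot in cache across the three inner loops.
                for (int i = i0; i < std::min(i0 + TILE, N); ++i)
                    for (int k = k0; k < std::min(k0 + TILE, N); ++k)
                        for (int j = j0; j < std::min(j0 + TILE, N); ++j)
                            C[i * N + j] += A[i * N + k] * B[k * N + j];
    return C;
}
```

Operator fusion extends the same idea: an epilogue (bias, activation) is applied to the tile of C while it is still resident, instead of in a separate pass over global memory.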
Preferred
- Experience with Blackwell microscaling formats.
- Open-source contributions or published work in low-precision kernels.