Machine Learning Engineer – AI Model Post-Training and Inference
Location: Remote / Hybrid / Onsite
Team: Engineering & Research | Type: Full-time
About Eigen AI
Founded by world-class researchers and engineers from MIT, Stanford, Berkeley, Oxford, and beyond, Eigen AI is building the foundation for enterprise-ready AI. We specialize in post-training, including fine-tuning, compression, and deployment of large language models and AI systems, making AI adoption efficient, cost-effective, and achievable for every enterprise across cloud, on-prem, and edge environments.
We are looking for 2–3 Machine Learning Engineers with strong expertise in post-training of large language models (LLMs), vision-language models (VLMs), and multimodal models (e.g., image, video, and audio generation) to join our team.
The Role
As a Machine Learning Engineer on our post-training and inference team, you will bridge state-of-the-art AI research and production systems. You will develop pipelines that synthesize data, fine-tune and align models, compress large models, and deploy them at scale across our infrastructure. This hands-on engineering role demands both deep technical expertise in generative AI and strong software-engineering skills to deliver performant, reliable, and secure services.
Key Responsibilities
- Develop post-training pipelines. Build and maintain systems that generate synthetic data and perform supervised fine-tuning (SFT) and reinforcement learning. Experience with advanced alignment methods such as Direct Preference Optimization (DPO) and Proximal Policy Optimization (PPO) is a plus (a minimal DPO sketch follows this list).
- Model compression and optimization. Apply quantization (e.g., GPTQ, SmoothQuant, and AWQ), pruning, and distillation techniques to reduce model size and improve inference latency. Evaluate trade-offs between accuracy, throughput, and memory, and collaborate with researchers on architecture choices (a toy quantization example follows this list).
- Scalable deployment. Design and deploy inference services using modern serving stacks such as vLLM, SGLang, and TensorRT-LLM (see the vLLM example after this list). Leverage techniques like speculative decoding and model parallelism to achieve high throughput and low latency. Integrate these services with container orchestration and cloud infrastructure.
- Cross‑functional collaboration. Partner with researchers, product managers and infrastructure engineers to bring new AI capabilities to market. Conduct design and code reviews, contribute to internal tooling and developer documentation, and participate in an on‑call rotation to ensure reliability.
- Continuous improvement. Monitor performance metrics, identify bottlenecks in data pipelines or inference systems, and proactively implement optimizations. Stay current with advances in generative AI (LLMs, VLMs, diffusion and other multimodal models) and propose enhancements to our platform.
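To give a concrete flavor of the alignment work above, here is a minimal sketch of the DPO objective in pure PyTorch. It implements the published DPO loss (Rafailov et al., 2023); the tensor names and the beta value are illustrative, not a description of our internal pipeline.

```python
import torch
import torch.nn.functional as F

def dpo_loss(
    policy_chosen_logps: torch.Tensor,    # log pi(y_w | x) under the policy
    policy_rejected_logps: torch.Tensor,  # log pi(y_l | x) under the policy
    ref_chosen_logps: torch.Tensor,       # log pi_ref(y_w | x), frozen reference
    ref_rejected_logps: torch.Tensor,     # log pi_ref(y_l | x), frozen reference
    beta: float = 0.1,                    # illustrative strength of the KL constraint
) -> torch.Tensor:
    """Direct Preference Optimization loss over a batch of preference pairs."""
    chosen_margin = policy_chosen_logps - ref_chosen_logps
    rejected_margin = policy_rejected_logps - ref_rejected_logps
    # -log sigmoid(beta * margin); logsigmoid is the numerically stable form.
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()
```

In practice the per-sequence log-probabilities come from summing token log-probs of each response under the policy and a frozen reference model; libraries such as TRL package this end to end.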
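On the compression side, production methods such as GPTQ, SmoothQuant, and AWQ are considerably more sophisticated (error compensation, activation-aware scaling), but the basic size/accuracy trade-off shows up even in a toy round-to-nearest quantizer. The snippet below is a self-contained illustration, not any particular paper's algorithm.

```python
import torch

def quantize_int8_per_channel(w: torch.Tensor):
    """Toy round-to-nearest INT8 quantization with one scale per output channel."""
    # Symmetric scale: map each row's max magnitude to 127.
    scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

w = torch.randn(4096, 4096)            # a weight matrix in fp32
q, scale = quantize_int8_per_channel(w)
err = (dequantize(q, scale) - w).abs().mean()
print(f"INT8 weights are 4x smaller than fp32; mean abs error = {err:.5f}")
```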
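And for serving, a minimal offline-generation example using vLLM's Python API; the checkpoint name is a placeholder, and a CUDA-capable GPU is assumed.

```python
from vllm import LLM, SamplingParams  # pip install vllm

# Model name is a placeholder -- substitute any supported checkpoint.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", tensor_parallel_size=1)

params = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=256)
outputs = llm.generate(["Explain speculative decoding in two sentences."], params)

for out in outputs:
    print(out.outputs[0].text)
```

The same engine also backs vLLM's OpenAI-compatible HTTP server for production deployments.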
Minimum Qualifications
- Bachelor's degree in Computer Science, Engineering or a related technical field.
- 2+ years of experience in software or machine‑learning engineering, including experience writing high‑performance, production‑quality code.
- Proficiency in Python and PyTorch; familiarity with systems languages such as Go, Rust, or C++ is beneficial.
- Demonstrated experience building large‑scale, fault‑tolerant distributed systems and services.
- Solid understanding of transformer‑based models and generative AI; experience with fine‑tuning and RLHF workflows.
- Experience with model serving frameworks such as vLLM, SGLang, TensorRT‑LLM or similar inference engines, and knowledge of GPU/accelerator performance concepts.
- Excellent problem‑solving skills and the ability to communicate complex technical ideas to cross‑functional teams.
Preferred Qualifications
- Master's degree or PhD in Computer Science, Electrical Engineering or a related field.
- 5+ years of experience in software or machine‑learning engineering, including experience writing high‑performance, production‑quality code.
- Experience implementing advanced alignment techniques (RLHF, DPO, PPO) and evaluating their impact on model behavior.
- Hands‑on experience with model compression methods such as quantization, pruning/sparsity and distillation.
- Familiarity with CUDA/Triton programming and GPU performance profiling tools; experience with distributed inference or training across multi‑GPU environments.
- Familiarity with machine-learning systems or compilers (e.g., torch.compile, Triton, XLA); see the sketch after this list.
- Experience building or integrating multimodal applications (tool‑calling agents, coding assistants, image/video/audio generation) and understanding of enterprise AI infrastructure.
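As a pointer for the ML-compilers item above, torch.compile (PyTorch 2.x) is the lowest-friction entry point; the module and shapes below are purely illustrative.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024))

# torch.compile captures the graph and generates fused kernels via
# TorchInductor; subsequent calls reuse the compiled code.
compiled = torch.compile(model)

x = torch.randn(8, 1024)
y = compiled(x)  # first call triggers compilation; later calls are fast
print(y.shape)   # torch.Size([8, 1024])
```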
Why Join Eigen AI?
- 🚀 Solve hard problems – Work on challenges at the forefront of generative‑AI infrastructure, from low‑latency inference to scalable model serving.
- 🧠 Build what's next – Use cutting‑edge technologies to create AI systems that will shape how businesses and developers harness generative AI.
- 🌍 Ownership and impact – Join a fast‑growing, mission‑driven team where your contributions directly influence our products and roadmap.
- 📈 Work with the best – Collaborate with world‑class engineers and researchers who thrive on curiosity and innovation.