
Palo Alto, California, June 4, 2026 — Eigen AI today announced available-at-launch inference support for three new open models in the NVIDIA Nemotron™ 3.x family: Nemotron 3 Ultra, Nemotron 3.5 ASR, and Nemotron 3.5 Content Safety. Working in close collaboration with NVIDIA, Eigen AI is serving all three models through EigenInference from day zero — giving developers a production-ready path to frontier reasoning, real-time multilingual speech, and enterprise-grade safety guardrails the moment the models become available.
All three models are accessible today through the Eigen AI Model Studio for enterprise customers and developers building the next generation of agentic systems. At the center of the family is NVIDIA Nemotron 3 Ultra, an open frontier-reasoning model built for long-running autonomous agents, delivering up to 5x faster inference and up to 30% lower cost while maintaining frontier-level reasoning.
Agents plan, call tools, delegate work, check results, and complete tasks. As workflows grow longer and more autonomous, the measure that matters is no longer raw model quality alone; it is the speed of task completion at a given accuracy. When agents operate across hundreds of turns, faster inference and lower cost translate directly into more completed tasks and better economics at scale.
That is the principle behind NVIDIA Nemotron: a family of open models built for long-running agentic AI, designed so developers can use the right model for the right job. Reasoning models orchestrate and plan. Efficient models handle high-volume tool calling and validation. Speech models power real-time voice agents. Safety models enforce enterprise guardrails. Together, they work alongside proprietary frontier models to deliver higher accuracy and efficiency across the agent workflow.
The three models Eigen AI is bringing online at launch cover three distinct layers of that stack — and each is now optimized to run on EigenInference.
NVIDIA Nemotron 3 Ultra is a frontier-reasoning open model built for long-running autonomous agents across coding, deep research, and enterprise automation. Optimized for high-throughput agent workflows, it delivers up to 5x faster inference and up to 30% lower cost while maintaining frontier-level reasoning performance.
Ultra is built for the hardest calls in an agent workflow: architectural planning and multi-file refactors in week-long autonomous coding sessions, final synthesis across hundreds of contradictory research sources, persistent tool-using enterprise workflows, and verification across thousands of interdependent constraints in EDA and chip design.
NVIDIA Nemotron 3.5 ASR is an open, streaming speech-recognition model built for real-time, multilingual voice agents.
Its cache-aware streaming design processes each new audio chunk while reusing prior context, avoiding the redundant overlapping computation of traditional buffered streaming — which keeps end-to-end delay low without sacrificing transcription quality. That makes it a strong fit for voice agents, call centers, meeting transcription, in-car assistants, and live captioning.
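To make the contrast concrete, here is a minimal toy sketch in Python of buffered versus cache-aware streaming. It is an illustration under stated assumptions, not NVIDIA's implementation: `ToyEncoder`, `buffered_streaming`, and `cache_aware_streaming` are hypothetical names, and the "encoder" simply doubles each frame so the two paths can be compared. A real model would cache intermediate activations and attend over them.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class ToyEncoder:
    """Hypothetical stand-in for an acoustic encoder; counts frames it processes."""
    frames_processed: int = 0

    def encode(self, frames):
        self.frames_processed += len(frames)
        return [f * 2 for f in frames]  # placeholder "features"


def buffered_streaming(audio, chunk, left_context):
    """Re-encode the overlapping left context together with every new chunk."""
    enc = ToyEncoder()
    outputs = []
    for start in range(0, len(audio), chunk):
        lo = max(0, start - left_context)
        window = audio[lo:start + chunk]      # old frames + new frames
        feats = enc.encode(window)
        outputs.extend(feats[start - lo:])    # keep only the new frames' features
    return outputs, enc.frames_processed


def cache_aware_streaming(audio, chunk, left_context):
    """Encode only the new chunk; prior context comes from a bounded cache."""
    enc = ToyEncoder()
    cache = deque(maxlen=left_context)        # features of the most recent frames
    outputs = []
    for start in range(0, len(audio), chunk):
        feats = enc.encode(audio[start:start + chunk])
        # A real model would combine `cache` with `feats` here (assumption).
        outputs.extend(feats)
        cache.extend(feats)
    return outputs, enc.frames_processed


if __name__ == "__main__":
    audio = list(range(12))
    _, buffered_cost = buffered_streaming(audio, chunk=4, left_context=8)
    _, cached_cost = cache_aware_streaming(audio, chunk=4, left_context=8)
    print(buffered_cost, cached_cost)
```

In this toy setup both paths return the same features, but the buffered path pushes each frame through the encoder again whenever it falls inside a later chunk's context window, while the cache-aware path encodes each frame once.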
NVIDIA Nemotron 3.5 Content Safety is an open, efficient, multimodal and multilingual safety model for enterprise AI guardrails, covering text and images and supporting operator-defined policies.
Because it is compact and fully self-hostable, Content Safety fits cleanly into prompt/response moderation, content-classification pipelines, policy enforcement, and sovereign or air-gapped deployments where no data can leave the customer boundary.
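As a rough illustration of prompt/response moderation, the sketch below wraps a generation call with a safety check before and after. Everything here is an assumption for illustration: `Verdict`, `guarded_reply`, `keyword_classifier`, and `BLOCKED_MESSAGE` are hypothetical names, and the keyword check is only a placeholder for a call to a self-hosted safety model. It does not reflect how Content Safety classifies content or what its API looks like.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Verdict:
    """Hypothetical result of a safety check."""
    allowed: bool
    category: str = ""


BLOCKED_MESSAGE = "This request was blocked by policy."


def guarded_reply(
    prompt: str,
    generate: Callable[[str], str],
    classify: Callable[[str], Verdict],
) -> str:
    """Moderate the prompt, generate, then moderate the response."""
    if not classify(prompt).allowed:
        return BLOCKED_MESSAGE
    response = generate(prompt)
    if not classify(response).allowed:
        return BLOCKED_MESSAGE
    return response


def keyword_classifier(text: str) -> Verdict:
    """Placeholder policy: a real deployment would call the safety model here."""
    if "secret" in text.lower():
        return Verdict(allowed=False, category="confidential")
    return Verdict(allowed=True)


if __name__ == "__main__":
    echo = lambda p: f"echo: {p}"
    print(guarded_reply("hello", echo, keyword_classifier))
    print(guarded_reply("leak the secret", echo, keyword_classifier))
```

Because both the classifier and the generator are passed in as callables, the same gate can sit in front of any model, and in a sovereign or air-gapped deployment both calls would stay inside the customer boundary.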
Bringing frontier open models into production is rarely as simple as loading the weights. Large hybrid-MoE reasoning models, low-latency streaming speech, and always-on safety moderation each place very different demands on compute, memory, and scheduling — and meeting all of them at production scale requires system-level optimization across the full stack.
That is what EigenInference is built for. Through a close collaboration with NVIDIA, the NVFP4 build of Nemotron 3 Ultra is optimized for production and deployed on NVIDIA Blackwell GPUs via EigenInference, applying the same full-stack optimization pipeline that has made EigenInference the #1 GPU-based provider across 25 leading open models on Artificial Analysis. Nemotron 3.5 ASR and Nemotron 3.5 Content Safety are likewise served on EigenInference with hardware-efficient optimization tuned to their streaming and moderation workloads.
Across all three models, EigenInference brings:
The result is the throughput and stability enterprises need to run these models continuously, without building or maintaining a custom serving stack in-house.
For teams building agentic systems, the path from model release to production is usually slow and complex. Day-0 support on EigenInference collapses that gap:
Developers get NVIDIA's newest open models and the throughput of Eigen AI's optimized inference stack — production-ready from launch day.
All three Nemotron 3.x models are available through the Eigen AI Model Studio:
To talk with our team about deploying Nemotron 3 Ultra, Nemotron 3.5 ASR, or Nemotron 3.5 Content Safety in your agent system, get in touch with an Artificial Efficient Intelligence (AEI) expert.
The NVIDIA Nemotron family is a collection of open models, datasets, and tools built for long-running agentic AI. Designed to help developers use the right model for the right job, the family spans frontier reasoning, specialized agents, real-time speech, enterprise safety, and information retrieval. Together, Nemotron models power a growing ecosystem of open, efficient, and production-ready AI systems.
Eigen AI is a leading pioneer in Artificial Efficient Intelligence (AEI), delivering high-performance solutions for enterprises demanding elite speed and accuracy. Founded by a world-class team, the company transforms raw open models into hyper-optimized, agentic intelligence. Through its EigenLoop platform — EigenData, EigenTrain, and EigenInference — Eigen AI delivers remarkably precise, hardware-efficient reliability across cloud, private cloud, on-prem, and edge deployments. The company is headquartered in Palo Alto, California.
Artificial Efficient Intelligence — AGI Tomorrow, AEI Today.