Nvidia's Nemotron-Cascade 2 Achieves Gold-Medal AI Reasoning with 20x Fewer Parameters Than Its Nearest Rival

Overview

Nvidia has released Nemotron-Cascade 2, an open-weight mixture-of-experts (MoE) language model that achieves gold-medal-level performance on three major international competitions while activating only 3 billion of its 30 billion total parameters. The model, its training data, and the full post-training pipeline have all been published under the Nvidia Open Model License, according to the model’s Hugging Face page.

The release makes Nemotron-Cascade 2 only the second open-weight model to reach gold-medal status on the 2025 International Mathematical Olympiad (IMO), the International Olympiad in Informatics (IOI), and the ICPC World Finals, after DeepSeek-V3.2-Speciale, a model with 671 billion parameters and 37 billion active parameters, as noted by Nvidia’s research page. That represents roughly 20 times fewer active parameters for comparable reasoning performance.

What We Know

On the 2025 IMO, the model scored 35 points, enough for a gold medal. It earned 439.3 points on the 2025 IOI and solved 10 of 12 problems in the ICPC World Finals, according to the Hugging Face model card. On AIME 2025, a widely used math benchmark, the model scored 92.4 without tools and 98.6 with tool-integrated reasoning. On LiveCodeBench v6, a coding benchmark drawn from competitive programming platforms, it reached 87.2, surpassing Qwen3.5-35B-A3B at 74.6 and Kimi-K2.5-1T at 85.0, as reported by VentureBeat.

The model supports a 262,144-token context window and operates in both a “thinking” mode with chain-of-thought reasoning and a standard instruct mode, according to the Hugging Face model card. It runs on a single GPU with tensor parallelism set to one, a significant practical advantage over frontier models that require multi-node clusters.

The Cascade RL Pipeline

The central technical contribution is the Cascade RL post-training framework, which Nvidia has open-sourced through the NeMo-RL repository. Rather than training a model on all domains simultaneously, Cascade RL applies reinforcement learning sequentially across instruction-following, multi-domain reasoning, code generation, and software engineering tasks, as described by Nvidia’s research page. This staged approach prevents catastrophic forgetting, a common failure mode in which optimizing for one domain degrades performance on another.

A second innovation, Multi-Domain On-Policy Distillation (MOPD), addresses the problem of benchmark regressions that can occur between Cascade RL stages. MOPD reuses intermediate checkpoints as internal teacher models, distilling the best performance from each stage into the final model without relying on external teacher models, according to VentureBeat.

Nvidia released the supervised fine-tuning dataset and the reinforcement learning training data alongside the model weights, all available through Hugging Face, as confirmed by the model card.

What We Don’t Know

Several questions remain open. The model’s agentic performance, while promising on internal benchmarks such as SWE-bench Verified (50.2) and tau-squared-Bench (58.9), has not yet been widely validated by independent researchers. It is unclear how Nemotron-Cascade 2 performs on real-world enterprise tasks that differ from competitive programming and olympiad-style problems.

The Nvidia Open Model License, while permitting broad use, imposes certain restrictions that have not yet been fully examined by the open-source community. Whether it meets the Open Source Initiative’s definition of open source remains to be determined.

It is also unclear how much the model’s performance depends on Nvidia-specific hardware optimizations. The predecessor model, Nemotron 3 Super, was explicitly optimized for Blackwell GPUs with native NVFP4 precision, and whether Cascade 2 carries similar hardware dependencies has not been detailed.

Analysis

Nemotron-Cascade 2 reflects a broader shift in the AI industry away from scaling model size toward what Nvidia’s research describes as “intelligence density” — achieving maximum capability per active parameter. The 20-to-1 parameter efficiency gap between Cascade 2 and DeepSeek-V3.2-Speciale suggests that post-training methodology, rather than raw model scale, is becoming the primary differentiator in frontier reasoning performance.

For enterprises, the practical implications are significant. A model that runs on a single GPU while matching the reasoning capabilities of cluster-scale competitors could lower the barrier to deploying advanced AI reasoning in production. The open release of the training pipeline, not just the model weights, means organizations can apply the Cascade RL methodology to their own models and domains.

The release also positions Nvidia as an increasingly active participant in the open-weight AI model ecosystem, complementing its dominant hardware business with software and model contributions that could help drive adoption of its GPU platforms.