Moonshot AI Open-Sources Kimi K2.6, a Trillion-Parameter Model That Runs 300-Agent Swarms for Hours
Moonshot released Kimi K2.6 under a modified MIT license, claiming parity with GPT-5.4 and Claude Opus 4.6 on coding benchmarks while orchestrating agent swarms that run for half a day unattended.
Overview
Moonshot AI, the Beijing startup behind the Kimi assistant, has released Kimi K2.6, a trillion-parameter open-weight model aimed squarely at the coding and agentic workloads that define the current frontier race. The release landed on April 20, according to SiliconANGLE, and was publicly announced the following day alongside the model weights, as reported by the South China Morning Post.
The launch continues a Chinese open-source push that has narrowed the gap with closed US labs. It arrived on April 20, the same day Alibaba moved in the opposite direction with the closed-weights Qwen3.6-Max-Preview.
What We Know
Architecture
Kimi K2.6 is a Mixture-of-Experts model with one trillion total parameters, split across 384 experts with eight activated per token, plus a 400-million-parameter vision encoder for multimodal input, SiliconANGLE reports. The same source notes the model uses multi-head latent attention (MLA), a hardware-efficient variant that compresses attention state, and the SwiGLU activation function, both aimed at reducing inference cost relative to a dense trillion-parameter system.
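To make the sparse-activation idea concrete, here is a minimal sketch of generic top-k expert routing using the two figures reported above (384 experts, eight active). It assumes plain softmax gating over random router scores. Moonshot's actual router, its scoring function, and any shared experts are not described in the coverage, so the function name `route_token` and everything beyond the two constants are illustrative assumptions.

```python
import math
import random

NUM_EXPERTS = 384  # reported expert count
TOP_K = 8          # reported number of experts activated at a time


def route_token(router_logits, k=TOP_K):
    """Pick the k highest-scoring experts and softmax-normalize their weights.

    Generic textbook routing, not Moonshot's published method.
    """
    if len(router_logits) < k:
        raise ValueError("need at least k router logits")
    # Indices of the k largest router scores.
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    # Softmax over the selected scores only (subtract the max for stability).
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Hypothetical router scores for a single token.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = route_token(logits)

# Share of routed experts touched per token: 8 / 384, roughly 2 percent.
active_fraction = TOP_K / NUM_EXPERTS
```

Only the selected experts run for a given token, which is why a sparse model of this size can cost far less per token than a dense one with the same total parameter count.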
Benchmarks
Moonshot positions K2.6 as a peer of closed Western frontier models. According to The Decoder, the model scores 54.0 on Humanity’s Last Exam with tools, 58.6 on SWE-Bench Pro, and 83.2 on BrowseComp. The Decoder adds that K2.6 “falls behind on pure reasoning and vision,” so the coding parity claim is narrower than a blanket frontier comparison.
The South China Morning Post notes that Moonshot said the model “performed on par with or better than leading closed-source systems such as GPT-5.4, Claude Opus 4.6 and Gemini 3.1 Pro across several benchmarks,” while cautioning that “independent verification of these claims remains limited, highlighting the lack of standardised evaluation across open and closed models.”
Long-Horizon Agents
The headline product feature is what Moonshot calls Agent Swarms. VentureBeat writes that the system can manage up to 300 sub-agents “executing across 4,000 coordinated steps simultaneously,” with internal use cases that ran for hours — and in one case, five straight days — handling monitoring and incident response without human intervention. The Decoder describes the same swarm topology of “up to 300 sub-agents at once, each taking 4,000 steps” and reports continuous execution of more than twelve hours across Rust, Go, and Python.
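The fan-out pattern described here can be sketched as a coordinator that caps concurrency at 300 sub-agents and gives each a 4,000-step budget, following The Decoder's per-agent reading of the step figure. This is a toy illustration in Python's standard `asyncio`, not Moonshot's orchestration layer: `run_sub_agent`, `run_swarm`, and the synthetic workloads are hypothetical, and a real sub-agent would call a model and tools rather than increment a counter.

```python
import asyncio

MAX_SUB_AGENTS = 300        # concurrency ceiling quoted in the coverage
MAX_STEPS_PER_AGENT = 4000  # per-agent step budget (The Decoder's reading)


async def run_sub_agent(agent_id, steps_needed, step_budget):
    """Hypothetical sub-agent: counts steps until done or out of budget."""
    steps = 0
    while steps < steps_needed and steps < step_budget:
        steps += 1
        if steps % 100 == 0:
            await asyncio.sleep(0)  # yield so other agents can make progress
    return {"agent": agent_id, "steps": steps,
            "finished": steps >= steps_needed}


async def run_swarm(workloads, max_agents=MAX_SUB_AGENTS,
                    step_budget=MAX_STEPS_PER_AGENT):
    """Run one sub-agent per workload, never more than max_agents at once."""
    gate = asyncio.Semaphore(max_agents)

    async def guarded(agent_id, steps_needed):
        async with gate:
            return await run_sub_agent(agent_id, steps_needed, step_budget)

    tasks = [guarded(i, n) for i, n in enumerate(workloads)]
    return await asyncio.gather(*tasks)  # results keep the input order


# Synthetic workloads: 350 tasks, a few of which exceed the step budget.
workloads = [50 + (i * 37) % 5000 for i in range(350)]
results = asyncio.run(run_swarm(workloads))
total_steps = sum(r["steps"] for r in results)

# If every one of 300 agents used its full budget: 1,200,000 steps per wave.
upper_bound = MAX_SUB_AGENTS * MAX_STEPS_PER_AGENT
```

Under the per-agent reading, a single fully loaded wave tops out at 1.2 million steps. VentureBeat's phrasing, 4,000 coordinated steps across the swarm, would imply a much smaller total, so the two reports are not interchangeable on this point.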
In a technical document shared with VentureBeat, Moonshot claimed K2.6 built a full SysY compiler from scratch in 10 hours, work the company characterized as equivalent to a team of four engineers over two months, and passed all 140 functional tests without human intervention.
Licensing and Distribution
Kimi K2.6 ships under a modified MIT license. The Decoder reports the license permits broad commercial use, with the caveat that deployments crossing 100 million monthly active users or 20 million US dollars in monthly revenue must display visible “Kimi K2.6” attribution — a tiered approach similar to Meta’s Llama license that carves out a path for enterprise adoption while preserving branding at hyperscale.
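As a rough illustration of how the reported thresholds combine, the check below treats the attribution requirement as triggered when either figure is exceeded. The `Deployment` type and `attribution_required` function are hypothetical, and the strict "greater than" comparison is an assumption based on the word "crossing". The license text itself governs the exact wording and how derivative works are counted.

```python
from dataclasses import dataclass

MAU_THRESHOLD = 100_000_000         # monthly active users, as reported
REVENUE_THRESHOLD_USD = 20_000_000  # monthly revenue in US dollars, as reported


@dataclass(frozen=True)
class Deployment:
    """Hypothetical summary of a commercial product built on the model."""
    monthly_active_users: int
    monthly_revenue_usd: float


def attribution_required(d: Deployment) -> bool:
    """True if either reported threshold is exceeded (strictness assumed)."""
    return (d.monthly_active_users > MAU_THRESHOLD
            or d.monthly_revenue_usd > REVENUE_THRESHOLD_USD)


startup = Deployment(monthly_active_users=5_000_000,
                     monthly_revenue_usd=1_000_000)
hyperscaler = Deployment(monthly_active_users=150_000_000,
                         monthly_revenue_usd=0)

startup_needs_badge = attribution_required(startup)          # False
hyperscaler_needs_badge = attribution_required(hyperscaler)  # True
```

Because the two conditions are joined by "or", a product with a small user base but more than 20 million US dollars in monthly revenue would also fall under the attribution clause.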
The model is available on Hugging Face, through Moonshot’s API, and via the Kimi Code command-line tool and Kimi chat app, SiliconANGLE reports.
What We Don’t Know
Moonshot has not disclosed training compute or cost figures for K2.6, nor has it published a full technical report covering dataset composition. The South China Morning Post noted that “independent verification” of the benchmark claims is still limited, and the headline long-horizon runs — the five-day incident response session and the 10-hour compiler build — come from Moonshot’s own internal testing rather than third-party evaluations. The practical reliability of 300-agent orchestration outside the company’s own infrastructure remains an open question: VentureBeat frames the release as “exposing the limits of enterprise orchestration” rather than solving them, since most production agent frameworks were not designed for stateful multi-day execution.
It is also unclear how the modified MIT terms will interact with downstream fine-tunes and derivative models distributed by third parties, or whether the attribution threshold will be enforced in practice.