Mistral Medium 3.5 Folds Chat, Reasoning, and Coding Into a Single 128-Billion-Parameter Open-Weight Flagship
Mistral released Medium 3.5, a dense 128B open-weight model with a 256k context window that consolidates Medium 3.1, Magistral, and Devstral 2 under a Modified MIT license, with a per-query reasoning toggle.
Overview
Mistral released Mistral Medium 3.5, a 128-billion-parameter open-weight model that the company is positioning as its first flagship merged model — handling chat, reasoning, and coding from a single set of weights instead of three. According to The Decoder, the dense 128B model ships with a 256,000-token context window, can be self-hosted on four GPUs, and is distributed under a Modified MIT License that carves out exceptions for high-revenue companies. The release simultaneously retired three product lines: Medium 3.1, the Magistral reasoning model, and the Devstral 2 coding model.
What We Know
The model is dense rather than mixture-of-experts: as The Decoder explains, all 128 billion parameters get loaded and activated for every token generated, which sets it apart from the sparse MoE designs that dominate the current open-weight frontier. The official model card on Hugging Face confirms the dense architecture, the 256k context length, and that the weights are released under a Modified MIT License — described in the card as an “Open-source license for both commercial and non-commercial use with exceptions for companies with large revenue.”
The consolidation is explicit. Mistral’s Hugging Face model card states that “Mistral Medium 3.5 replaces its predecessor Mistral Medium 3.1 and Magistral in Le Chat. It also replaces Devstral 2 in our coding agent Vibe.” The mechanism that allows one model to do three jobs is a per-request reasoning toggle: the card describes “Instruct and Reasoning functionalities with function calls (reasoning effort configurable per request),” and on The Decoder’s reading the reasoning behavior operates as a parameter on each query rather than as a separate model.
On Mistral’s own benchmarks, the model scores 77.6 percent on SWE-Bench Verified and 91.4 percent on τ³-Telecom, numbers reported on the Hugging Face model card and echoed by The Decoder. The Hugging Face model card frames the model as offering “best-in-class agentic capabilities with native function calling and JSON output.” The Decoder, less effusive, notes that the model “trails Claude badly in banking scenarios.”
Medium 3.5 is multimodal on input. The Hugging Face card states it “Accepts both text and image input, with text output,” and The Decoder reports that Mistral built a vision encoder from scratch to handle variable image sizes and aspect ratios. Language coverage is broad: the Hugging Face card lists support for “dozens of languages, including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, and Arabic.”
For users on Mistral’s hosted API, The Decoder reports pricing of $1.50 per million input tokens and $7.50 per million output tokens.
InfoQ covered the release alongside Mistral’s broader Le Chat update, framing Medium 3.5 as “a 128-billion parameter model designed to handle instruction following, reasoning, and coding within a single system” and describing the weights as available “in public preview” under a modified MIT license. InfoQ confirms the 256k context window, the small-GPU self-hosting story, and the bundled vision encoder trained to handle variable image inputs. The InfoQ piece notes that Medium 3.5 is now the default model for Mistral’s new remote coding agents in Vibe, replacing earlier models in the Vibe CLI and powering a workflow that lets users move sessions from local execution to the cloud while preserving state and history.
What We Don’t Know
Mistral has not published scores on the standard general-knowledge and reasoning benchmarks that other frontier labs report at launch. Neither The Decoder nor InfoQ cite GPQA Diamond, MMLU-Pro, AIME, HumanEval, MATH, or LiveCodeBench numbers for Medium 3.5; Mistral itself only highlights SWE-Bench Verified and τ³-Telecom on the Hugging Face card. The full benchmark profile relative to current open-weight competitors remains unverified externally.
The practical reach of the Modified MIT License’s revenue-based carve-out is also unspecified in our cited sources: the Hugging Face card refers to “exceptions for companies with large revenue” without quantifying the threshold or the trigger conditions in language quoted by The Decoder or InfoQ. Companies considering production deployment will need to inspect the LICENSE text on Hugging Face directly.
Analysis
The consolidation is the story. Through 2025 and into 2026, Mistral — like several rival labs — had been shipping a separate model for each job: a base instruct model, a reasoning model with chain-of-thought scaffolding, and a coding-specialist model trained to follow agentic prompts. As The Decoder reports, Medium 3.5 collapses that lineup: it replaces Medium 3.1 and the Magistral reasoning model in Le Chat, plus Devstral 2 in the Vibe CLI. Customers no longer have to choose; the model itself decides how much test-time compute to spend based on the reasoning_effort setting in the request.
The choice of a dense 128B architecture, in a market currently dominated by sparse MoE designs that activate only a fraction of their parameters per token, is a deliberate bet on deployability. As The Decoder emphasizes, all 128 billion parameters fire on every token, which is computationally heavier than a sparse alternative of the same nominal size, but the four-GPU self-hosting target reported by The Decoder and the small-GPU footprint described by InfoQ suggest Mistral is courting enterprises that want to run a frontier-class model in their own data center without the operational complexity of expert routing.
Finally, the timing of the Vibe remote-agent push reframes Medium 3.5 as more than a model release. InfoQ notes that the new remote coding agents are powered by Medium 3.5 and let users move sessions from local execution to the cloud while preserving state and history. That puts Mistral squarely in the same product space — long-running, asynchronous coding agents bound to a specific underlying model — as Anthropic’s Claude-powered Codex and OpenAI’s agent offerings, but with the open-weight option that those competitors do not provide.