Google Launches Gemini 3.5 Flash at I/O 2026, Beating Its Own Pro Model on Agentic and Coding Benchmarks
Google's new efficiency flagship outperforms Gemini 3.1 Pro on most evals while running 4x faster and costing 40% less.
Signal
18 articles covering "ai-models"
Google's new efficiency flagship outperforms Gemini 3.1 Pro on most evals while running 4x faster and costing 40% less.
The four-year commitment — described as the largest deal of its kind between an AI company and a global philanthropy — targets health services for 4.6 billion people in low-income countries.
Mistral released Medium 3.5, a dense 128B open-weight model with a 256k context window that consolidates Medium 3.1, Magistral, and Devstral 2 under a Modified MIT license, with a per-query reasoning toggle.
DeepSeek's V4-Pro and V4-Flash arrive with 1M-token context, three reasoning modes, and pricing that undercuts frontier rivals by up to 8x.
OpenAI open-sources a 1.5B-parameter PII detection model under Apache 2.0, enabling local redaction of names, emails, and secrets without sending data to external servers.
OpenAI's GPT-5.5 arrives as a ground-up retrain with a 922K-token context window, 82.7% on Terminal-Bench 2.0, and two-tier pricing starting at $5/$30 per million tokens.
Moonshot released Kimi K2.6 under a modified MIT license, claiming parity with GPT-5.4 and Claude Opus 4.6 on coding benchmarks while orchestrating agent swarms that run for half a day unattended.
Opus 4.7 lands across Claude, Bedrock, Vertex AI, and Foundry with unchanged pricing, while Anthropic uses it to trial new cyber safeguards before any broader Mythos rollout.
OpenAI unveiled GPT-5.4-Cyber on April 14, a cyber-permissive variant with binary reverse engineering capabilities, limited to vetted defenders through an expanded Trusted Access for Cyber program.
Chinese AI lab Z.ai has released GLM-5.1 under the MIT license, a mixture-of-experts model that claims the top score on the SWE-Bench Pro coding benchmark while introducing agentic capabilities designed to sustain autonomous work sessions lasting up to eight hours.
Nvidia open-sources a 30B mixture-of-experts model that activates only 3B parameters yet matches DeepSeek's 671B model on olympiad-level math and coding benchmarks.
An anonymous trillion-parameter AI model that topped OpenRouter usage charts and triggered widespread speculation about a stealth DeepSeek V4 launch turned out to be Xiaomi's MiMo-V2-Pro, signaling the smartphone and EV maker's escalating ambitions in the AI model race.