Alibaba's Qwen3.5 Arrives as the First Native Multimodal Open-Weight Model to Challenge Frontier Proprietary AI

Overview

Alibaba’s Qwen team released Qwen3.5 on February 16, 2026, marking what the company describes as its first natively multimodal, agent-focused large language model. The flagship 397-billion-parameter mixture-of-experts model activates only 17 billion parameters per forward pass, a design choice that keeps inference fast and affordable while keeping overall model capacity competitive with proprietary frontier systems. As reported by SiliconAngle, model weights for the three commercially licensed variants are freely available on Hugging Face under the Apache 2.0 license.

What We Know

Architecture and Scale

The headline Qwen3.5-397B-A17B model packs 397 billion total parameters but routes each token through only 17 billion of them during inference, according to VentureBeat. This mixture-of-experts approach — using 512 total experts with 10 routed experts and 1 shared expert activated per token — is the same architectural family popularized by DeepSeek-V3 and earlier Qwen generations, but Alibaba has extended it with a hybrid attention mechanism that combines standard quadratic attention heads with linear attention heads and gated delta networks to reduce memory pressure during both training and inference.

The model supports a context window of up to 262,144 tokens and handles text input across 201 languages and dialects, up from 82 in the prior generation. It also accepts image and video input natively — a key distinction from previous Qwen releases, where visual capabilities were added via a separate model branch rather than integrated from the start of training.

Performance

Alibaba benchmarked Qwen3.5-397B-A17B against GPT-5.2 and Claude 4.5 Opus across more than 30 tasks. The model outperformed both on IFBench, a benchmark for instruction-following fidelity, according to SiliconAngle. On reasoning and coding tasks it reported scores of 93.3 percent on AIME 2026, 83.6 on LiveCodeBench v6, and 76.4 percent on SWE-bench Verified for coding-agent evaluation — results that place the model in the same performance tier as the leading proprietary systems on those specific evaluations, though with mixed results across the broader benchmark set.

VentureBeat noted that the 397B-A17B model outperforms Alibaba’s own earlier trillion-parameter Qwen3-Max, demonstrating that the sparse-activation approach can surpass a larger dense predecessor.

Smaller Models and Local Deployment

Alongside the flagship, Alibaba released Qwen3.5-35B-A3B and Qwen3.5-122B-A10B as open-weight models licensed for commercial use, and Qwen3.5-27B as an additional variant. A separate small-model series — spanning 0.8B, 1.7B, 4B, and 9B parameter counts — targets local deployment on consumer hardware. VentureBeat reported that the 9B model produces output quality comparable to Claude Sonnet 4.5 while running on standard laptop hardware, a combination that Alibaba is positioning as a way for developers to run capable agents without cloud dependency.

Agentic Tool Use

All Qwen3.5 variants ship with built-in support for structured tool calling — the ability to invoke external APIs, query databases, and execute multi-step workflows without requiring additional fine-tuning. The model was trained from the outset to plan and execute sequences of actions across digital environments, which Alibaba says distinguishes it from chat-first predecessors retrofitted for agent use cases. The Qwen3.5 model card on Hugging Face details the tool-calling schema and lists benchmarks for function-calling accuracy.

Pricing and Access

API access to the flagship model through Alibaba Cloud is priced at $0.40 per million input tokens and $2.40 per million output tokens for international users, approximately one-eighteenth the cost of Gemini 3 Pro at equivalent capability levels, according to VentureBeat. The open-weight release means organizations can also deploy the model on their own infrastructure at no licensing cost.

What We Don’t Know

Alibaba has not released the full training data composition or compute budget for Qwen3.5, making independent assessment of its training efficiency difficult. Benchmark results are self-reported and cover only a subset of the evaluations used by third-party comparison platforms. The model’s behavior on safety-critical tasks and its susceptibility to jailbreaks or prompt injection have not been independently audited at the time of release.

It is also unclear whether the Apache 2.0 license carries any of the usage restrictions that some Chinese AI labs have attached to prior open-weight releases under similarly permissive names. Developers deploying Qwen3.5 at scale should verify the exact license terms against their jurisdictional requirements.

Analysis

Qwen3.5’s release extends a pattern that has defined the open-weight AI landscape since early 2025: Chinese labs releasing capable, well-documented models under permissive licenses at a pace that keeps frontier performance achievable outside the largest Western AI labs. The prior milestones in this progression — DeepSeek-V3, Zhipu AI’s GLM-5, and earlier Qwen generations — each forced a recalibration of assumptions about the compute cost required to reach a given capability level.

What distinguishes Qwen3.5 within this series is the native multimodal integration. Previous open-weight releases from Alibaba and its competitors typically handled text and vision through separate model families, requiring developers to route tasks to different systems. Training vision into the core model from the start — rather than attaching it as a separate module — simplifies deployment architectures for agentic applications that need to perceive and reason about images and video alongside text.

The combination of open weights, Apache 2.0 licensing, and a per-token price that is a fraction of comparable proprietary offerings is likely to accelerate enterprise adoption in regions where cloud sovereignty requirements make reliance on U.S.-hosted inference impractical. Whether Qwen3.5’s mixed benchmark profile translates to consistently superior real-world performance in complex agentic tasks remains to be demonstrated outside controlled evaluations.