Alibaba Unveils Qwen3.7-Max at Cloud Summit, a Long-Horizon Agent Model That Ran Autonomously for 35 Hours

Overview

Alibaba unveiled Qwen3.7-Max at its Cloud Summit in Hangzhou on May 20, positioning the model as its flagship for the agent era. The release follows the Qwen3.6 series by approximately one month and marks a continued shift in Alibaba’s strategy: the proprietary Max tier is kept closed while the companion Qwen3.7-Plus will be open-sourced, according to Decrypt.

At the event, Liu Weiguang, Alibaba Cloud’s senior vice president, framed the broader announcement as a bid to dominate the entire AI value chain. “What we’re building is China’s AI factory,” Liu told reporters at the summit, describing Alibaba as “the only AI and cloud company in the country operating ‘all five layers of the full AI stack’”, spanning chips, agentic cloud, AI models, model service platforms, and agentic applications.

The 35-Hour Autonomous Run

The headline claim for Qwen3.7-Max is endurance: in internal testing, the model demonstrated that it could sustain continuous autonomous execution on complex tasks for up to 35 hours without performance degradation, as reported by South China Morning Post. During that run, the model made more than 1,000 tool calls while optimizing a compute kernel, ultimately achieving a roughly 10x improvement in inference speed compared with the manufacturer’s reference implementation.

Zhou Jingren, Alibaba Cloud’s chief technology officer, said the model had “consistently ranked among the top tier on various benchmarks and outperformed all other AI models in China,” according to South China Morning Post.

Agent Capabilities

Qwen3.7-Max is described as “a robust foundation” for AI agents that “excels at agentic coding, complex reasoning and ‘long-horizon tasks’” — multi-step missions requiring continuous decision-making over extended periods. The model is aimed at applications such as coding and debugging, office workflow automation, and tasks that span hundreds or thousands of sequential steps.

Benchmark Standing

Preview versions of Qwen3.7-Max and the companion Qwen3.7-Plus were quietly posted to LM Arena before the summit. The Qwen3.7-Max-Preview ranked 13th globally in text capabilities, seventh in math, ninth in expert-level prompts, and ninth in software and IT, according to Decrypt’s review. That places Alibaba as the sixth-ranked AI lab globally in text and fifth in vision capabilities.

However, South China Morning Post noted that “the preview models still trail behind top US products, including Anthropic’s Claude, Google’s Gemini and OpenAI’s GPT models.”

Open-Source Strategy and Monetization Shift

With Qwen3.7, Alibaba is continuing the pattern it established with the 3.6 generation: “Plus gets open-sourced, Max stays proprietary,” as Decrypt put it. Decrypt also noted that “Alibaba killed the free tier of Qwen Code last month and has been moving its best models behind a paywall.”

Qwen3.7-Max will be available through Alibaba Cloud’s Model Studio platform. API access had not been broadly rolled out as of the summit announcement.

What We Don’t Know

Alibaba has not disclosed pricing for Qwen3.7-Max, nor the exact parameter count of either the Max or Plus models. The open-source release schedule for Qwen3.7-Plus had not been confirmed as of the summit. Standardized benchmark scores on widely used third-party evaluations had not been published at the time of the announcement, with performance claims resting primarily on Alibaba’s own internal testing and the Arena leaderboard previews.

Analysis

The Qwen3.7 launch lands at an awkward moment for Alibaba’s AI organization. Earlier this year, Reuters reported that Qwen division head Lin Junyang resigned — the third senior Qwen executive to depart in 2026 — prompting CEO Eddie Wu to set up a Foundation Model Task Force coordinated by Wu himself, group CTO Wu Zeming, and Alibaba Cloud CTO Zhou Jingren. Qwen3.7-Max’s successful debut signals that the team retained enough continuity to ship a competitive flagship despite the leadership disruption.

The strategic bet on long-horizon agent tasks reflects a broader industry shift: as single-turn chatbot performance converges across labs, differentiation is moving toward models that can sustain multi-hour autonomous workflows without human intervention. Alibaba’s 35-hour autonomous run, if reproducible under independent testing, would represent a meaningful gap over the multi-minute agent sessions that most frontier models currently support reliably.

The persistent gap with US frontier models on Arena rankings — despite placing as China’s top lab — reflects how rapidly the overall field has moved. South China Morning Post noted that Qwen3.7 arrives “a month after the company launched its previous flagship Qwen3.6 series,” underscoring the pace at which Alibaba is now iterating.

Editor's Note · May 21, 2026