Miami Startup Subquadratic Emerges From Stealth With $29M and a 12-Million-Token Model It Says Beats Frontier Compute by 1,000x

Overview

A Miami-based startup called Subquadratic emerged from stealth on May 5, 2026, with a $29 million seed round and a large language model it claims breaks the quadratic compute scaling that has defined transformers since 2017. The company’s first model, SubQ 1M-Preview, runs on what Subquadratic calls a Sparse Subquadratic Attention architecture and exposes a 12-million-token context window, about ten times the longest windows in production from frontier labs, at a fraction of the compute, according to SiliconANGLE. The reception has been split: some researchers see a meaningful long-context advance, while others have publicly accused the company of shipping vaporware, according to VentureBeat.

What We Know

Subquadratic raised $29 million in a seed round led by former SoftBank Vision Fund partner Javier Villamizar and Tinder co-founder Justin Mateen, with participation from early backers of Anthropic, OpenAI, Stripe, and Brex, according to SiliconANGLE.
The startup is led by chief executive Justin Dangel and chief technology officer Alexander Whedon, according to SiliconANGLE.
SubQ 1M-Preview is built on a proprietary transformer architecture with sparse attention designed to scale linearly with context length rather than quadratically. The company describes the approach as one that “selectively focuses only on the token comparisons that matter rather than computing attention across every possible relationship,” according to VentureBeat.
The advertised context window is up to 12 million tokens, which Subquadratic equates to roughly nine million words or 120 books, according to SiliconANGLE.
At 12 million tokens, Subquadratic says SubQ reduces attention compute by almost 1,000 times compared with other frontier models, according to VentureBeat. At 1 million tokens, the company says it is more than 50 times faster and 50 times less expensive than leading frontier models, according to SiliconANGLE.
On the RULER 128K long-context benchmark, Subquadratic reports SubQ achieving 95 percent accuracy at a cost of about $8, compared with 94 percent accuracy at roughly $2,600 for Claude Opus. By the company’s own numbers, that is a more than 300-fold cost reduction, according to SiliconANGLE.
Subquadratic launched three products into private beta alongside the funding announcement: an API exposing the full context window, a CLI coding agent called SubQ Code that loads an entire codebase into context, and a search tool called SubQ Search that is initially free, according to SiliconANGLE.
The model is currently neither open source nor open weight; the company says it can be trained for customer-specific use cases, according to SiliconANGLE.
Asked about the central technical bet, Dangel said the team is “very focused on the problem of how we transition from a dense attention, quadratic scaling architecture to a sparse attention linear architecture,” according to SiliconANGLE.
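
The headline figures above can be sanity-checked with simple arithmetic. The short Python sketch below does only that: every input is a company-reported number as relayed by SiliconANGLE, the function and variable names are illustrative, and nothing here is an independent measurement of SubQ.

```python
# Back-of-the-envelope check of the company-reported figures quoted above.
# Inputs are Subquadratic's own numbers, not independent measurements.

def fold_reduction(baseline_cost: float, new_cost: float) -> float:
    """How many times cheaper new_cost is than baseline_cost."""
    return baseline_cost / new_cost

# RULER 128K: roughly $2,600 reported for Claude Opus vs. about $8 for SubQ.
ruler_fold = fold_reduction(2600.0, 8.0)

# 12 million tokens equated to roughly nine million words or 120 books.
words_per_token = 9_000_000 / 12_000_000
words_per_book = 9_000_000 / 120

print(f"RULER 128K cost ratio: {ruler_fold:.0f}x")
print(f"Implied words per token: {words_per_token:.2f}")
print(f"Implied words per book: {words_per_book:,.0f}")
```

On these inputs the cost ratio works out to 325, consistent with the "roughly 300-fold" framing. The token-to-word conversion implies 0.75 words per token and 75,000 words per book.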

What We Don’t Know

Subquadratic has not published peer-reviewed research on Sparse Subquadratic Attention and has not released model weights, leaving the central efficiency claim — the 1,000x reduction at 12 million tokens — without independent reproduction at the time of launch, according to VentureBeat.
Reaction from the broader AI research community has been mixed, ranging from “genuine curiosity to open accusations of vaporware,” according to VentureBeat. AI researcher John Rysana pushed back on the most aggressive framing, arguing the work is “just subquadratic attention done well which is very meaningful for long context workloads,” and that the “odds of it being BS are extremely low,” according to VentureBeat.
The company has not disclosed details on training data, training compute, or the specific test conditions used for its RULER 128K and other benchmark numbers beyond the headline figures it published, according to SiliconANGLE.

Analysis

If SubQ’s numbers hold up under outside replication, long-context inference, today the most expensive regime in commercial LLM serving, could be brought into the price band of routine API calls. Frontier labs have spent the past year extending context windows from 200,000 to roughly one million tokens by combining ring attention, blockwise computation, and aggressive caching, but the underlying quadratic cost of dense attention still bends pricing curves sharply upward beyond a few hundred thousand tokens. A genuinely linear-scaling architecture would not just reduce cost; it would change which workloads (full-codebase reasoning, long legal corpora, multi-document research) are economically viable to run on a per-query basis.
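
To make the scaling argument concrete, the sketch below compares token-pair comparison counts under a deliberately simple toy model: dense attention compares every token with every other token, while a sparse scheme lets each token attend to a fixed budget of k others. This is an assumption for illustration, not a description of Subquadratic's undisclosed architecture. The budget k is hypothetical, back-solved from the company's "almost 1,000 times" claim at 12 million tokens, and the model ignores non-attention compute and any cost of selecting which tokens to attend to.

```python
# Toy comparison of dense vs. fixed-budget sparse attention cost.
# Assumption: cost is proportional to the number of token-pair comparisons.
# This is NOT Subquadratic's published method; the company has not disclosed one.

def dense_pairs(n: int) -> int:
    """Comparisons for dense attention: every token against every token."""
    return n * n

def sparse_pairs(n: int, k: int) -> int:
    """Comparisons when each token attends to at most k others (hypothetical budget)."""
    return n * min(k, n)

def implied_budget(n: int, reduction: float) -> float:
    """Per-token budget k that gives the stated reduction: (n*n) / (n*k) = reduction."""
    return n / reduction

# Back-solve k from the claimed ~1,000x reduction at 12 million tokens.
k = round(implied_budget(12_000_000, 1000))

for n in (128_000, 1_000_000, 12_000_000):
    ratio = dense_pairs(n) / sparse_pairs(n, k)
    print(f"context={n:>10,}  dense/sparse comparison ratio = {ratio:,.1f}x")
```

Under this toy model the advantage over dense attention grows linearly with context length, which is why the claimed savings are largest at the full 12-million-token window and much smaller at shorter contexts. Whether SubQ behaves this way in practice is exactly what independent testing would need to establish.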

The gap between that scenario and what Subquadratic has actually disclosed is wide. The company is shipping a closed model with no public weights, no peer review, and benchmark numbers the company itself supplied. That is a familiar position for a startup emerging from stealth, but it is also the position from which extraordinary efficiency claims have repeatedly failed to survive contact with independent testing. The next data point that matters is not another company-issued chart but a third-party run of SubQ against its competitors at the full advertised context length.

Editor's Note · May 9, 2026
