Google Launches Gemini 3.1 Flash Live, a Real-Time Voice Model That Detects Emotion and Powers Agents Across 200 Countries
Google's new Gemini 3.1 Flash Live model brings acoustic awareness, doubled context length, and tool-use capabilities to real-time voice AI, rolling out globally via Gemini Live, Search Live, and the developer-facing Live API.
Overview
Google on March 26 released Gemini 3.1 Flash Live, a real-time multimodal voice model designed for natural, low-latency audio conversations. The model is rolling out to Gemini Live on Android and iOS, to Search Live in more than 200 countries, and to developers via the Live API in Google AI Studio, according to 9to5Google.
Flash Live is the latest addition to Google’s Gemini 3.1 family, which also includes the Pro and Flash-Lite tiers. Where those models target text-heavy reasoning and cost-efficient batch processing, respectively, Flash Live is purpose-built for real-time spoken interaction — a domain that has become central to how Google, OpenAI, and others are competing for consumer and enterprise adoption.
What We Know
Google describes Gemini 3.1 Flash Live as its “highest-quality audio and voice model yet,” supporting over 90 languages for real-time multimodal conversations, according to 9to5Google. The model introduces several notable improvements over its predecessor.
Acoustic awareness. Flash Live can detect changes in a user’s pitch and pace during a conversation, allowing it to recognize when a speaker sounds frustrated or confused and adjust its tone accordingly. The model is also better at filtering background noise, distinguishing relevant speech from environmental sounds such as traffic or television, as reported by 9to5Google.
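9to5Google's report does not say how these cues are extracted. As a rough illustration of the kind of signal involved, the sketch below estimates pitch and speaking pace from a waveform with the open-source librosa library; it is a toy approximation for intuition, not a description of Google's implementation.

```python
# Toy illustration: estimating pitch and pace from audio with librosa.
# This approximates the *kind* of signal acoustic awareness relies on;
# it is not Google's method.
import librosa
import numpy as np

def voice_features(path: str) -> dict:
    y, sr = librosa.load(path, sr=16_000)
    # Fundamental-frequency track via probabilistic YIN; NaN where unvoiced.
    f0, voiced, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr
    )
    pitch = f0[voiced]
    # Crude pace proxy: syllable-like onsets per second of audio.
    onsets = librosa.onset.onset_detect(y=y, sr=sr, units="time")
    seconds = len(y) / sr
    return {
        "mean_pitch_hz": float(np.nanmean(pitch)) if pitch.size else 0.0,
        "pitch_spread_hz": float(np.nanstd(pitch)) if pitch.size else 0.0,
        "onsets_per_sec": len(onsets) / seconds,
    }

# Rising mean pitch and onset rate across turns could, in principle, flag a
# speaker who sounds agitated; a falling pace might indicate confusion.
```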
Extended conversational context. The model can follow the thread of a conversation for twice as long as the previous version, enabling longer brainstorming sessions and multi-turn interactions without losing coherence, according to 9to5Google.
Improved natural flow. Flash Live delivers faster responses with fewer pauses and dynamically adjusts answer length and tone to match conversational context. It also handles user hesitations, stuttering, and mid-sentence interruptions more gracefully than prior models.
Agent tool use. Google says it has “significantly improved the model’s ability to trigger external tools and deliver information during live conversations,” according to 9to5Google. Enhanced instruction-following helps agents stay within operational guardrails even when conversations take unexpected turns — a key requirement for production deployments where voice-based agents need to call APIs, look up data, or execute workflows in real time.
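For developers, this kind of tool triggering is expressed as function declarations attached to a Live API session. The sketch below uses the google-genai Python SDK's existing Live API surface; the model identifier and the lookup_order tool are illustrative placeholders, since Google has not published the exact preview model string in the coverage cited here.

```python
# Sketch: declaring a tool the model can call mid-conversation over the
# Live API (google-genai Python SDK). Model id and tool are placeholders.
import asyncio
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_AI_STUDIO_KEY")

lookup_order = types.FunctionDeclaration(
    name="lookup_order",
    description="Fetch the status of a customer order by ID.",
    parameters=types.Schema(
        type=types.Type.OBJECT,
        properties={"order_id": types.Schema(type=types.Type.STRING)},
        required=["order_id"],
    ),
)

config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    tools=[types.Tool(function_declarations=[lookup_order])],
)

async def main():
    # "gemini-3.1-flash-live" is assumed; check AI Studio for the real id.
    async with client.aio.live.connect(
        model="gemini-3.1-flash-live", config=config
    ) as session:
        await session.send_client_content(
            turns=types.Content(
                role="user", parts=[types.Part(text="Where is order 42?")]
            )
        )
        async for message in session.receive():
            if message.tool_call:  # model paused the turn to request a tool
                await session.send_tool_response(function_responses=[
                    types.FunctionResponse(
                        id=fc.id, name=fc.name, response={"status": "shipped"}
                    )
                    for fc in message.tool_call.function_calls
                ])

asyncio.run(main())
```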
Multilingual design. The model supports real-time multimodal conversations in a user’s preferred language without requiring a separate localized version, a design choice that simplifies deployment for global applications.
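Assuming Flash Live inherits the Live API's existing speech configuration, selecting the conversation language would be a session-level setting rather than a separate model, roughly like this:

```python
# Sketch: pinning a session to a preferred spoken language via the Live
# API speech config (google-genai SDK); the BCP-47 code is an example.
from google.genai import types

config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    speech_config=types.SpeechConfig(language_code="hi-IN"),  # Hindi
)
```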
Availability and Access
Flash Live is available in preview through the Gemini Live API in Google AI Studio, according to Google’s announcement. Search Live, which uses the model for voice-based search interactions, is expanding to more than 200 countries and territories. The model is currently free during the preview period, though Google has not announced production pricing or confirmed when the preview designation will be lifted.
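For developers trying the preview, a minimal session amounts to connecting, sending a turn, and draining the audio stream. The sketch below again uses the google-genai SDK with a placeholder model id; the Live API currently streams replies as chunks of raw 16-bit PCM audio.

```python
# Sketch: minimal receive loop for a preview session over the Live API.
# Writes the model's spoken reply to a file as raw PCM bytes.
import asyncio
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_AI_STUDIO_KEY")
config = types.LiveConnectConfig(response_modalities=["AUDIO"])

async def main():
    async with client.aio.live.connect(
        model="gemini-3.1-flash-live", config=config  # placeholder id
    ) as session:
        await session.send_client_content(
            turns=types.Content(
                role="user", parts=[types.Part(text="Say hello.")]
            )
        )
        with open("reply.pcm", "wb") as out:
            async for message in session.receive():
                if message.data:  # inline audio chunks from the model
                    out.write(message.data)

asyncio.run(main())
```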
What We Don’t Know
Google has not disclosed the model’s parameter count, architecture details, or formal benchmark results. It is unclear how Flash Live’s latency compares quantitatively to OpenAI’s real-time voice capabilities in GPT-5.4 or to Anthropic’s planned voice features. Google also has not specified whether all 90-plus supported languages launch with feature parity or whether some markets will receive a subset of capabilities initially.
The production pricing structure remains undisclosed. Given that the Gemini 3.1 Flash-Lite tier — the family’s cost-optimized text model — is priced at $0.25 per million input tokens, as noted by SiliconANGLE, Flash Live’s eventual pricing will be closely watched as an indicator of how Google intends to monetize real-time voice inference at scale.
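For scale, the arithmetic at that Flash-Lite text rate is straightforward, though it should be read only as a reference point: Flash Live's pricing is unannounced, and real-time audio models are often billed in different units such as audio tokens or minutes.

```python
# Back-of-envelope only: applies the published Flash-Lite *text* input
# rate to hypothetical usage. Flash Live audio pricing is unannounced.
FLASH_LITE_INPUT_USD_PER_M = 0.25  # per million input tokens (SiliconANGLE)

def input_cost(tokens: int) -> float:
    return tokens / 1_000_000 * FLASH_LITE_INPUT_USD_PER_M

# e.g. a long multi-turn session whose context grows to 200k input tokens:
print(f"${input_cost(200_000):.3f}")  # $0.050
```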
Analysis
Flash Live’s acoustic awareness capability — detecting emotional state through pitch and pace — represents a meaningful step beyond simple speech-to-text-to-speech pipelines. If the feature works reliably in production, it could differentiate Google’s voice agents from competitors that treat audio as a flat text-equivalent signal.
The tool-use improvements are arguably more consequential for enterprise adoption. Voice-based agents that can reliably trigger external actions during live conversations unlock use cases in customer support, field operations, and accessibility that text-only agents cannot address. Google’s emphasis on guardrail compliance during unexpected conversational turns suggests the company is targeting regulated industries where predictability matters as much as capability.
The global rollout to 200-plus countries via Search Live also signals Google’s intent to make voice AI a default interaction mode for search, not merely a premium feature. Combined with the model’s multilingual support, this positions Flash Live as infrastructure for Google’s broader push to embed AI throughout its consumer products.