Google Debuts Gemini Omni at I/O 2026, an Any-to-Any Model That Simulates the World to Generate Physics-Aware Video

Overview

At Google I/O on May 19, 2026, Google DeepMind unveiled Gemini Omni, a new model family designed to accept any combination of text, images, audio, and video as input and to produce video output that users refine through multi-turn conversation. Demis Hassabis, CEO of Google DeepMind, described it as “our new model that can create anything from any input,” adding that it represents “a world model AI that can understand and simulate the world.” The first model in the family, Gemini Omni Flash, is now live in the Gemini app for Google AI Plus, Pro, and Ultra subscribers globally and is available at no cost through YouTube Shorts and YouTube Create.

What We Know

Architecture and Components

Gemini Omni is not a standalone model but a fusion of several previously separate Google DeepMind systems. According to Decrypt, Hassabis described it as combining “Gemini’s intelligence with the best of our generative media models for a new level of world understanding, multimodality, and editing.” The component models integrated into Omni include Veo (Google’s video generation engine), Genie (world simulation), and Nano Banana (image editing), according to 9to5Google.

Koray Kavukcuoglu, identified as CTO of Google DeepMind and Chief AI Architect at Google, wrote in the official Google Blog announcement that the model “combines an intuitive understanding of physics with Gemini’s knowledge of history, science and cultural context” and has an “improved intuitive understanding of forces like gravity, kinetic energy and fluid dynamics.”

Multimodal Input and Conversational Editing

The core differentiator Google is emphasizing is what it calls the “any-to-any” capability: users can “combine images, audio, video and text as input and generate high-quality videos,” according to the Google Blog. The model is also designed to be edited through conversation. As the Google Blog puts it: “Every instruction builds on the last. Your characters stay consistent, the physics hold up and the scene remembers what came before.”

Nicole Brichtova, Director of Product Management at DeepMind, told TechCrunch that Gemini Omni is “the next step towards the progression of combining the intelligence of Gemini with the rendering capabilities of our media models.”

Sundar Pichai, Google’s CEO, framed the announcement in broader terms at the keynote, saying that “with world models, AI is moving from predicting text to simulating reality,” according to TechCrunch.

Video Length and Constraints

The initial Gemini Omni Flash model renders 10-second clips. TechCrunch reports that this length is framed as a design choice rather than a model limitation, reflecting Google’s focus on making the first iteration accessible to a broad consumer base. The Tech Portal reports that the platform supports short-form video generation that can be extended to longer formats.

Audio-based editing is available, though 9to5Google notes that some audio input modes are initially restricted pending a broader responsible rollout.

Safety and Transparency Measures

All videos generated by Omni carry what the Google Blog calls an “imperceptible SynthID digital watermark”; SynthID is Google’s existing provenance tool for AI-generated media. The model also supports the creation of personal Avatars, which TechCrunch describes as digital representations of users built through a recorded onboarding process.

Flow Integration

Gemini Omni ships integrated into Flow, Google’s AI creative studio. According to Decrypt, Flow Agent serves as an AI assistant for brainstorming, asset organization, plot recommendations, and batch editing, while Flow Tools lets users build custom editing workflows with natural-language prompts instead of code.

Availability

Gemini Omni Flash launched simultaneously on May 19 across the Gemini app, Google Flow, and YouTube Shorts. Access through the Gemini app requires a Google AI Plus, Pro, or Ultra subscription, according to the Google Blog, while on YouTube Shorts and the YouTube Create app the model is available at no cost. Google has teased a second model in the family, Omni Pro, but has not disclosed a release timeline, according to TechCrunch and 9to5Google. Developer and enterprise API access will follow in the weeks after launch, according to the Google Blog.

What We Don’t Know

Google has not disclosed pricing for the Gemini Omni API. Nor has it publicly released architecture details such as parameter count, training data composition, or how the component models (Veo, Genie, Nano Banana) are integrated at the weight level. The timeline for Omni Pro remains unspecified, and no benchmark scores comparing Gemini Omni with other video-generation models have been published. Because the model only launched on May 19, Google’s claims about physics simulation fidelity and multi-turn consistency have not yet been independently verified.

Analysis

Gemini Omni represents Google’s clearest attempt to shift the AI competition away from text-based reasoning and toward generative media. By positioning Omni as a “world model,” a term Hassabis used at the keynote and one that, according to Decrypt, he called “a step towards artificial general intelligence,” Google is making a philosophical claim about what AI should ultimately do: not predict the next token but simulate the behavior of the world.

The architectural choice to integrate rather than replace (fusing Gemini’s reasoning with Veo’s rendering, Genie’s world simulation, and Nano Banana’s image editing) suggests Google is betting that the ceiling on video quality and consistency is set by how well a model can reason about what the world looks like, not by how powerful its rendering pipeline is. Whether that bet pays off against dedicated video-generation competitors will become clearer as users gain sustained access and independent testers evaluate the model in the weeks ahead.

Rolling Omni out at no cost on launch day to YouTube Shorts, one of the world’s largest short-form video platforms, gives the model a scale of day-one distribution that few if any rival video generators have had, which may accelerate both adoption and scrutiny.