Tencent Open-Sources HY-World 2.0, the First Foundation Model to Output Game-Engine-Ready 3D Worlds Instead of Video

Overview

Tencent’s Hunyuan team open-sourced HY-World 2.0 on April 16, 2026, releasing model weights, inference code, and a technical report for what it describes as a multi-modal world model that turns text prompts, single images, multi-view images, or video into navigable 3D scenes. The release is documented on the project’s GitHub repository, which lists April 16 as the public announcement date and confirms that the WorldMirror 2.0 reconstruction module is available now while the panorama and world-expansion components will follow.

The model is being distributed under a custom “tencent-hy-world-2.0-community” license, with weights and inference code hosted on Hugging Face. Unlike most generative world models released in the past year, HY-World 2.0 outputs persistent 3D geometry — meshes, point clouds, and 3D Gaussian splats — rather than pixel-level video.

What We Know

HY-World 2.0 is structured as a four-stage pipeline. According to the GitHub repository, HY-Pano 2.0 first generates a 360-degree panorama from a text prompt or input image, WorldNav plans a camera trajectory through the resulting scene, WorldStereo 2.0 expands the panorama into a navigable 3D Gaussian splatting world along that trajectory, and WorldMirror 2.0 — a feed-forward model the repository describes as approximately 1.2 billion parameters — predicts depth, surface normals, camera parameters, point clouds, and 3DGS attributes in a single pass.

The Hugging Face model card states that WorldMirror 2.0’s release includes inference code, model checkpoints, and a Gradio web demo, with full world-generation code and the HY-Pano 2.0 and WorldStereo 2.0 weights listed as coming soon. The recommended runtime is CUDA 12.4, Python 3.10, and PyTorch 2.4.0, with multi-GPU support via FSDP and BF16 enabled. The Hugging Face card also notes that WorldMirror 2.0 supports flexible-resolution inference between 50,000 and 500,000 pixels.

The central design choice — generating editable 3D assets rather than video — is what sets HY-World 2.0 apart from most prior world-model releases. The GitHub README frames this as a paradigm shift, contrasting persistent 3D outputs with the ephemeral video clips produced by earlier systems and stating that the resulting assets can be imported into Blender, Unity, Unreal Engine, and NVIDIA Isaac Sim.

Tencent’s broader strategic push into world models predates this release. In a South China Morning Post interview, Guo Chunchao, a principal research scientist at Tencent who heads 3D generation and world modelling at Hunyuan, argued that world models offer “true visual and spatial intelligence” beyond the language-centric intelligence of large language models. SCMP reported that Tencent positions world models as a way to enable AI agents and robots to learn complex tasks safely inside their own simulations rather than through real-world trial and error.

The release builds on Tencent’s earlier work in 3D generation. As VentureBeat reported, the company previously open-sourced Hunyuan3D 2.0, an asset-generation system that turns single images or text descriptions into 3D models in seconds and was published on Hugging Face and GitHub. HY-World 2.0 extends that lineage from individual asset generation to entire navigable scenes.

What We Don’t Know

At the time of release, only WorldMirror 2.0 is fully open-sourced; HY-Pano 2.0 and WorldStereo 2.0 weights are still listed as forthcoming on the GitHub repository. That means independent reproduction of the full text-to-3D-world pipeline is not yet possible from the released artifacts alone. The repository does not specify a date for those follow-on releases.

The tencent-hy-world-2.0-community license referenced on the model card is a custom license rather than a standard permissive one such as Apache 2.0 or MIT. Its commercial-use terms, redistribution restrictions, and acceptable-use clauses will determine how broadly the model can be deployed in commercial games, simulators, or robotics products — and those details require careful reading of the license text rather than the headline “open source” framing.

Independent benchmark verification of the model’s quality against closed-source competitors is also limited. The GitHub repository reports internal metrics for WorldMirror 2.0’s camera control and point-cloud reconstruction accuracy on the DTU dataset, but third-party comparisons against systems such as World Labs’ Marble or Google’s Genie 3 — both referenced as competitors in SCMP’s coverage of the world-model race — have not yet been published in peer-reviewed form.

Finally, the practical hardware barrier remains nontrivial. The Hugging Face model card recommends a high-memory accelerator such as an A100 or H100 for inference, which limits hobbyist and small-studio adoption even though the weights themselves are downloadable.

Analysis

HY-World 2.0 lands in the middle of a rapidly intensifying race over what the AI industry now calls spatial intelligence — the ability of models to reason about 3D geometry and physical interaction rather than purely about text. As SCMP reported, Tencent’s own framing places this work in direct competition with Google DeepMind, xAI, and Fei-Fei Li’s World Labs, whose Marble product is closed source.

By releasing weights and inference code rather than gating the model behind an API, Tencent is following the playbook that Chinese frontier labs have used aggressively in 2026, from DeepSeek V4 to Z.ai’s GLM-5.1 and Alibaba’s Qwen 3.6 series. Where those releases targeted text and code, HY-World 2.0 stakes out a different axis of competition: spatial generation. The choice to output Unity- and Unreal-compatible 3D assets rather than video clips is a direct bid for the developer and game-studio audience that World Labs has so far courted with proprietary tools.

Whether the release accelerates downstream applications — game development, robotics simulation, digital twins — will depend on the still-unreleased components of the pipeline and on the actual terms of the community license. For now, HY-World 2.0 establishes a reference point: an open-weights foundation model whose primary output format is geometry, not pixels.