Genie 3
Genie 3 is DeepMind’s next-generation, general-purpose world model. It generates richly interactive 3D environments in real time at 24 frames per second and 720p resolution, and those environments remain consistent for several minutes. Prompted by text input, the system constructs dynamic virtual worlds in which users or embodied agents can navigate and interact with natural phenomena from multiple perspectives, such as first-person or isometric. A standout feature is its emergent long-horizon visual memory: Genie 3 preserves off-screen elements and keeps scenes spatially coherent when a location is revisited. It also supports “promptable world events,” which let users modify a scene on the fly, for example by changing the weather or introducing new objects. Designed to support embodied agent research, Genie 3 integrates with agents such as SIMA for goal-based navigation and complex task completion.
Kling 3.0 Omni
The Kling 3.0 Omni model is a generative video system that creates videos from text prompts, images, or other reference materials using multimodal AI. It generates continuous clips with flexible durations of approximately 3 to 15 seconds, enough for short cinematic scenes that follow the prompt closely. In addition to prompt-based generation, it supports reference-based workflows in which users provide images or other visual elements to guide the subject, style, or composition of the scene. It improves prompt adherence and subject consistency, so characters, objects, and environments remain stable throughout the clip while motion stays realistic and visually coherent. The Omni model also enhances reference-based generation so that characters or elements introduced through images remain recognizable across frames.
Gemini Omni Flash
Gemini Omni is Google’s new model family that combines Gemini’s reasoning with content generation, starting with video. The first model in the family, Gemini Omni Flash, accepts any combination of images, audio, video, and text as input and generates high-quality videos grounded in Gemini’s real-world knowledge. It lets users edit video through conversation: every instruction builds on the last, characters stay consistent, physics hold up, and the scene remembers what came before. Users can transform specific details or entire worlds, reimagine action, add new characters or objects, change environments, adjust camera angles, refine styles, and build multi-turn edits without losing the thread of the original scene. Gemini Omni is designed to bridge photorealism and meaningful storytelling by reasoning about what should happen next, using an intuitive understanding of physical phenomena such as gravity, kinetic energy, and fluid dynamics.
Odyssey-2 Pro
Odyssey-2 Pro is a frontier general-purpose world model that generates continuous, interactive simulations, which developers can integrate into products via the Odyssey API. It is positioned as a pivotal moment for world models, comparable to GPT-2 for language. The model is trained on large amounts of video and interaction data to learn how the world evolves frame by frame, and it outputs minutes-long simulations that can be interacted with in real time rather than fixed short clips. Odyssey-2 Pro delivers improved physics, richer dynamics, more authentic behaviors, and sharper visuals, streaming 720p video at up to ~22 FPS that responds instantly to prompts and actions. It supports embedding interactive streams, viewable streams, and parameterized simulations into applications through simple SDKs in JavaScript and Python. Developers can integrate the model in under ten lines of code to create open-ended, interactive video experiences in which users’ inputs shape evolving scenes.