Google DeepMind announced Monday (Jan. 6) that it is creating a new team to work on “large-scale” generative models that “simulate the world.” These models represent the next stage in the advancement of artificial intelligence (AI) capabilities in decision-making, planning and creativity.
A world model is a computational framework that helps AI systems understand and simulate real or virtual worlds. World models teach AI systems how to navigate environments and have wide applications in robotics, gaming and autonomous systems.
For example, self-driving cars use world models to simulate traffic and road conditions. World models can also be used to train generalist AI robots, so-called embodied AI, in varied environments, addressing a common problem: the lack of rich, diverse and safe training environments.
DeepMind’s Monday job posting notes that scaling AI models is also important to the technology’s evolution.
“We believe that scaling pre-training on video and multimodal data is on the critical path to artificial general intelligence. World models will enhance many areas, including visual reasoning and simulation, embodied agent planning and real-time interactive entertainment,” the job posting states. PYMNTS reached out to Google for comment but has not yet received a reply.
Tim Brooks, who left OpenAI to join Google DeepMind in October, will lead the team. At OpenAI, Brooks co-led the development of Sora, the video generation model that went viral upon its release due to its sophistication.
According to the job listings for the team, new hires will “collaborate and build on” the work of the teams behind Google’s flagship large-scale multimodal models: Gemini, Veo (video generation) and Genie (world models).
Google DeepMind’s focus on world models mirrors that of AI startup World Labs, which came out of stealth last September to develop large-scale world models. Led by Stanford University AI pioneer Fei-Fei Li, the startup’s backers include AI pioneer and Nobel Prize winner Geoffrey Hinton, Salesforce CEO Marc Benioff, LinkedIn co-founder Reid Hoffman, former Google chairman Eric Schmidt, Andreessen Horowitz, NEA, NVentures and others.
Google DeepMind has already developed several world models, including Genie and Genie 2. Genie 2 converts text and image prompts into 3D worlds that react to the user’s actions. (Genie only created 2D worlds.)
Genie 2 is a powerful AI model that learns from large video datasets, using an autoencoder to compress video frames into a simpler, more meaningful representation. A transformer model then analyzes these compressed frames and predicts how the video will progress step by step, using methods similar to those behind text generation models like ChatGPT.
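To make that two-stage idea concrete, here is a minimal sketch in PyTorch: an autoencoder compresses frames into compact latents, and a small transformer predicts the next latent autoregressively, like next-token prediction in a language model. All layer sizes, class names and dimensions below are illustrative assumptions, not Genie 2’s actual architecture.

```python
import torch
import torch.nn as nn

LATENT_DIM = 64           # assumed size of the compressed frame representation
FRAME_DIM = 3 * 32 * 32   # assumed flattened frame size (toy 3x32x32 frames)

class FrameAutoencoder(nn.Module):
    """Compresses a frame to a latent vector and reconstructs it."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(FRAME_DIM, 256), nn.ReLU(),
                                     nn.Linear(256, LATENT_DIM))
        self.decoder = nn.Sequential(nn.Linear(LATENT_DIM, 256), nn.ReLU(),
                                     nn.Linear(256, FRAME_DIM))

    def forward(self, frames):                  # frames: (batch, FRAME_DIM)
        latents = self.encoder(frames)
        return self.decoder(latents), latents

class LatentDynamics(nn.Module):
    """Transformer that predicts the next latent from the sequence so far."""
    def __init__(self):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=LATENT_DIM, nhead=4,
                                           batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(LATENT_DIM, LATENT_DIM)

    def forward(self, latent_seq):              # (batch, time, LATENT_DIM)
        seq_len = latent_seq.size(1)
        # Causal mask so each step only attends to earlier steps.
        mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")),
                          diagonal=1)
        hidden = self.transformer(latent_seq, mask=mask)
        return self.head(hidden)                # predicted next latents

# Toy rollout: encode a short clip, predict the next compressed frame, decode it.
autoencoder, dynamics = FrameAutoencoder(), LatentDynamics()
clip = torch.randn(1, 8, FRAME_DIM)             # 8 random "frames" as stand-ins
_, latents = autoencoder(clip.view(-1, FRAME_DIM))
latents = latents.view(1, 8, LATENT_DIM)
next_latent = dynamics(latents)[:, -1]          # prediction for step 9
next_frame = autoencoder.decoder(next_latent)   # decode back to pixel space
print(next_frame.shape)                         # torch.Size([1, 3072])
```

In a real system, both stages would be trained on the large video datasets the article describes, and the predicted latents would also be conditioned on user actions so the generated world reacts to input.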
Trained on large video datasets, Genie 2 can model object interactions, complex character animations, physics such as gravity and splash effects, and the behavior of other agents. The worlds it creates can last up to a minute, but most are in the 10- to 20-second range.
Google DeepMind’s expanded focus on world models further enhances the capabilities of its AI systems as it competes with OpenAI, Meta, Microsoft and Amazon to provide services to businesses.
The latest effort joins an already rich list of innovations, including AlphaFold 2, which recently earned CEO Demis Hassabis and John M. Jumper a Nobel Prize. The AI model predicted the structures of virtually every known protein, solving a 50-year-old biochemistry challenge.
In a paper published in October, Google DeepMind researchers said they trained a large language model called the Habermas Machine to act as an AI mediator, helping small groups in the U.K. find common ground on controversial issues such as Brexit and immigration. It did so by creating “group statements” that captured their shared perspective.
