Google DeepMind Plans to Train AI on YouTube Videos for Real-World Understanding

Google is accelerating its artificial intelligence ambitions by combining two of its AI tools, Gemini and Veo, to create a universal digital assistant capable of understanding the real world. According to DeepMind CEO Demis Hassabis, the company plans to train this next-generation model on a rich source of video content: YouTube.

In a recent episode of the Possible podcast, Hassabis detailed how Gemini, an AI model built for text, image, and audio processing, will be combined with Veo, Google’s video generation model. The aim is to build an assistant that doesn’t just provide answers but comprehends the world’s physical properties by observing real-life video content.

Why YouTube Is Central to the Plan

Owned by Google, YouTube offers a vast collection of user-generated videos covering everything from cooking and sports to science and engineering. Hassabis suggested that watching large numbers of these videos lets an AI learn not just isolated facts but also context and cause-and-effect relationships.

“Basically, by watching YouTube videos — a lot of YouTube videos — [Veo 2] can figure out the physics of the world,” Hassabis said.

In principle, this approach could give AI a more intuitive, human-like grasp of everyday tasks, such as how ingredients change as food cooks or how tools are used during construction.

A Multimodal Assistant for Real-World Use

The long-term vision is a multimodal AI assistant: one that can process and respond to a combination of text, images, and sound. Hassabis noted, “We’ve always built Gemini to be multimodal… because we have a vision for a universal digital assistant that actually helps you in the real world.”

Such an assistant could move beyond chat responses and static visuals to offer insights grounded in real-world observation and behavior.

Industry Competition and Ethical Considerations

Google’s efforts are part of a broader industry trend, with OpenAI and Amazon also developing “omnimodal” AI models that integrate multiple data types.

Training AI on YouTube content raises data privacy and copyright concerns. Google has acknowledged that it trains some of its models on YouTube videos, subject to creator permissions. The company also updated its terms of service in 2023 to permit broader use of content for AI development.