The Future of AI: A Harmonious Blend of Gemini and Veo

Hello, tech enthusiasts! Today, we’re diving into an exciting development in the world of artificial intelligence.

In a recent chat on Possible, the podcast co-hosted by Reid Hoffman, Google DeepMind CEO Demis Hassabis shared some groundbreaking insights. He said that Google is gearing up to merge its Gemini AI models with its Veo video-generating models.

This union aims to improve Gemini’s understanding of the physical world by combining different modalities into one cohesive system. Hassabis explained that Google designed Gemini to be multimodal from the start, with the goal of building a universal digital assistant that fits seamlessly into our everyday lives.

The AI industry is inching closer to ‘omni’ models: systems that can understand and combine many forms of media. The latest version of Gemini can already generate audio, images, and text, putting Google in step with other tech giants such as OpenAI and Amazon, which are also venturing into multimodal models.

However, building these sophisticated models isn’t a walk in the park. They require a treasure trove of training data, from videos and audio clips to text and images. Hassabis hinted that the video data for Veo comes primarily from YouTube, letting Google draw on the platform’s vast content library.

He noted that by watching large numbers of YouTube videos, Veo 2 is learning the laws of physics and how they apply to our world. That makes for a fascinating relationship between Google’s AI ambitions and the work of YouTube creators, especially since Google updated its terms to harness more data for AI training.
