Hello, my tech-loving followers! It’s your favorite funny guy, Nuked, here to talk about Meta’s latest announcement. The company has just open-sourced a new AI model that combines six different types of data to create a multisensory experience. The research project offers a glimpse into the future of generative AI systems and shows that Meta is still sharing its AI research while others, like OpenAI and Google, have become more secretive.
The core concept of this research is to link multiple types of data into a single multidimensional index (an “embedding space,” in AI parlance), which is the same concept behind the recent boom in generative AI. Multimodal models are at the heart of this boom: image generators like DALL-E, Stable Diffusion, and Midjourney all rely on systems that link text and images together during the training stage.
Meta’s new model, ImageBind, is the first to combine six types of data into a single embedding space: visual (both image and video), thermal (infrared images), text, audio, depth information, and movement readings generated by an inertial measurement unit (IMU). Combining these data types would allow future AI systems to cross-reference information across all of them in the same way current systems do for text inputs.
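To make the idea of a shared embedding space a bit more concrete, here is a toy sketch in Python. To be clear, this is not ImageBind’s code or API: the three-dimensional vectors, the file names, and the `nearest` helper are all made up for illustration. The only point is that once every modality lands in the same vector space, finding the audio clip that matches a text prompt is just a similarity lookup.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings: a real model would produce vectors with
# hundreds of dimensions. These hand-picked values only mimic the idea
# that related items from different modalities sit close together.
EMBEDDINGS = {
    ("text", "a dog barking"): [0.90, 0.10, 0.00],
    ("audio", "bark.wav"): [0.80, 0.20, 0.10],
    ("image", "dog.jpg"): [0.85, 0.15, 0.05],
    ("audio", "waves.wav"): [0.10, 0.90, 0.20],
    ("image", "ship.jpg"): [0.05, 0.80, 0.30],
}


def nearest(query_key, target_modality):
    """Return the name of the closest item in target_modality to the query."""
    query = EMBEDDINGS[query_key]
    candidates = [
        (key, vec)
        for key, vec in EMBEDDINGS.items()
        if key[0] == target_modality and key != query_key
    ]
    best_key, _ = max(candidates, key=lambda kv: cosine_similarity(query, kv[1]))
    return best_key[1]


if __name__ == "__main__":
    # Text prompt to audio clip, then audio clip to image.
    print(nearest(("text", "a dog barking"), "audio"))
    print(nearest(("audio", "waves.wav"), "image"))
```

With these invented numbers, the text prompt about a barking dog pairs with the bark recording, and the wave recording pairs with the ship photo. A real system would get its vectors from trained encoders rather than a hand-written table.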
Imagine a virtual reality device that generates not only audio and visuals but also your surroundings and movement on a physical stage. For example, you could ask it to emulate a long sea voyage, and it would place you on a ship with the noise of the waves in the background, the rocking of the deck under your feet, and the cool breeze of the ocean air.
Meta notes in their blog post that other streams of sensory input could be added in future models, such as touch, speech, smell, and brain fMRI signals. This research brings machines one step closer to humans’ ability to learn simultaneously, holistically, and directly from many different forms of information.
This is all very speculative, and the immediate applications of this research will likely be far more limited. Still, Meta is open-sourcing the underlying model, a practice that is coming under increasing scrutiny in the world of AI. Those opposed to open-sourcing say it harms creators, because rivals can copy their work, and that it could be dangerous, because it lets malicious actors take advantage of state-of-the-art AI models. Advocates respond that it allows third parties to scrutinize systems for faults and ameliorate some of their failings.
Meta has been firmly in the open-source camp, though not without difficulties: its latest language model, LLaMA, leaked online earlier this year. With ImageBind, the company continues this strategy and provides a glimpse into a future of generative AI systems that create immersive, multisensory experiences.