DeepMind’s V2A Creates Video Soundtracks and Dialogue

DeepMind, Google’s AI research lab, is developing a new technology that generates soundtracks and dialogue for videos.


In a recent blog post, DeepMind stated that the video-to-audio (V2A) technology is a crucial component in the field of AI-generated media. 

Although many companies, including DeepMind, have built video-generation models, these systems cannot produce synchronized sound effects. 

“Video generation models are advancing at an incredible pace, but many current systems can only generate silent output,” DeepMind writes. “V2A technology [could] become a promising approach for bringing generated movies to life.” 

How Does V2A Work? 

V2A technology takes a video together with a text description of the desired soundtrack and generates matching music, sound effects, and dialogue. The output is watermarked with SynthID, the technology the AI research company developed to fight deepfakes. 

According to the company, the AI model behind V2A is a diffusion model trained on a combination of sounds, dialogue transcripts, and video clips. 

DeepMind did not disclose whether the training data was copyrighted or whether its creators gave their consent. 

The technology itself is not new to the market; Stability AI recently released a similar tool, and other tools already generate sound effects. What sets V2A apart is its ability to understand raw video pixels and automatically synchronize the generated sounds with the video, even without a text description. 
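
To make the described pipeline more concrete, here is a minimal, purely illustrative sketch of how a diffusion-style video-to-audio system could be wired together: video frames and an optional text prompt are encoded, and an audio representation is iteratively refined from noise while conditioned on both. This is not DeepMind’s code or API; every function name, dimension, and weight below is a hypothetical stand-in chosen for the example.

```python
# Illustrative sketch only: NOT DeepMind's implementation.
# It mirrors the general shape of a diffusion-style video-to-audio pipeline
# (video frames + optional text prompt -> iteratively denoised audio)
# using toy stand-in encoders and random weights.
import numpy as np

RNG = np.random.default_rng(0)

def encode_video(frames: np.ndarray) -> np.ndarray:
    """Toy stand-in for a video encoder: pool pixels per frame, project to features."""
    per_frame = frames.reshape(frames.shape[0], -1).mean(axis=1)        # (T,)
    return np.tanh(per_frame[:, None] * RNG.standard_normal((1, 64)))   # (T, 64)

def encode_prompt(prompt: str | None) -> np.ndarray:
    """Toy stand-in for a text encoder; the prompt is optional, as in V2A."""
    if not prompt:
        return np.zeros(64)
    seed = abs(hash(prompt)) % (2**32)
    return np.random.default_rng(seed).standard_normal(64)

def denoise_step(audio_latent, video_feat, text_feat, t):
    """Toy 'denoiser': nudges the audio latent toward the conditioning signals."""
    cond = video_feat.mean(axis=0) + text_feat        # combine video + text conditioning
    predicted_noise = audio_latent - cond             # fake noise estimate
    return audio_latent - (1.0 / t) * predicted_noise

def video_to_audio(frames, prompt=None, steps=50, audio_dim=64):
    video_feat = encode_video(frames)
    text_feat = encode_prompt(prompt)
    audio_latent = RNG.standard_normal(audio_dim)     # start from pure noise
    for t in range(steps, 0, -1):                     # iterative diffusion-style refinement
        audio_latent = denoise_step(audio_latent, video_feat, text_feat, t)
    return audio_latent                               # would be decoded to a waveform

if __name__ == "__main__":
    fake_video = RNG.random((24, 16, 16, 3))          # 24 frames of 16x16 RGB
    latent = video_to_audio(fake_video, prompt="footsteps on gravel, light rain")
    print(latent.shape)
```

In a real system the toy encoders and denoiser would be large trained networks, and the final latent would be decoded into an audio waveform that stays in sync with the video; the sketch only shows how the video and optional text prompt both condition each refinement step.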

Between Innovation and Responsibility  

Despite its potential, the company acknowledges that the new technology has limitations. The model struggles with videos that contain artifacts or distortions, which results in lower-quality audio. Additionally, the generated sounds can sometimes come across as conventional and unconvincing. 

Due to these limitations, and to prevent misuse, DeepMind does not plan to release V2A anytime soon. For now, it is gathering feedback from creators and filmmakers to further improve the technology, which it says will also undergo safety assessments and testing. 

Indeed, the AI research company believes V2A will be most useful to archivists and those who work with historical footage. In a remarkable admission, however, it also acknowledges that such AI tools could upend the film and TV industry and threaten workers’ livelihoods, and it says strong work protections will be needed to ensure the tools do not replace or eliminate certain professions. 


Inside Telecom provides you with an extensive list of content covering all aspects of the tech industry. Keep an eye on our Intelligent Tech sections to stay informed and up-to-date with our daily articles.