DeepMind’s V2A Creates Video Soundtracks and Dialogue

by Inside Telecom Staff
June 20, 2024

DeepMind, Google’s AI research lab, is developing a video-to-audio technology that generates soundtracks for videos.

In a recent blog post, DeepMind stated that its video-to-audio (V2A) technology is a crucial component in the field of AI-generated media.

Although many companies, including DeepMind, have built video generation models, these models cannot produce synchronized sound effects.

“Video generation models are advancing at an incredible pace, but many current systems can only generate silent output,” DeepMind writes. “V2A technology [could] become a promising approach for bringing generated movies to life.”

How Does V2A Work?

V2A takes a video together with a text description of the desired soundtrack and generates matching music, sound effects, and dialogue. The process also incorporates SynthID, a watermarking technology DeepMind developed to help identify AI-generated content and fight deepfakes.

According to the company, the AI model behind V2A is a diffusion model trained on a combination of sounds, dialogue transcripts, and video clips.

DeepMind did not disclose whether the training data included copyrighted material or whether its creators gave their consent.

The concept is not new to the market: Stability AI recently released a similar tool, and other tools already generate sound effects. What sets V2A apart is its ability to interpret raw video pixels and automatically synchronize the generated sounds with the footage, even without a text description.

Between Innovation and Responsibility

Despite its potential, the company acknowledges that the technology has limitations. The model struggles with videos that contain artifacts or distortions, which leads to a noticeable drop in audio quality. The generated sounds can also be generic and unconvincing.

Because of these limitations, and to prevent misuse, DeepMind does not plan to release V2A to the public anytime soon. It is instead gathering feedback from creators and filmmakers to improve the technology, and it says V2A will also undergo safety assessments and testing.

DeepMind believes V2A will be most useful to archivists and people who work with historical footage. In a notable admission, however, it also acknowledges that such AI tools could upend the film and TV industry and threaten the livelihoods of the people who work in it. The company therefore says it will put strong labor protections in place to ensure that such tools do not replace workers or eliminate entire professions.
