A team at Tencent’s Hunyuan lab has created a new AI, ‘Hunyuan Video-Foley,’ that finally brings lifelike audio to generated video. It’s designed to listen to videos and generate a high-quality soundtrack that’s perfectly in sync with the action on screen.
Ever watched an AI-generated video and felt like something was missing? The visuals might be stunning, but they often have an eerie silence that breaks the spell. In the film industry, the sound that fills that silence – the rustle of leaves, the clap of thunder, the clink of a glass – is called Foley art, and it’s a painstaking craft performed by experts.
Matching that level of detail is a huge challenge for AI. For years, automated systems have struggled to create believable sounds for videos.
How is Tencent solving the AI-generated audio for video problem?
One of the biggest reasons video-to-audio (V2A) models often fell short in the sound department was what the researchers call “modality imbalance”. Essentially, the AI was listening more to the text prompts it was given than it was watching the actual video.
For instance, if you gave a model a video of a busy beach with people walking and seagulls flying, but the text prompt only said “the sound of ocean waves,” you’d likely just get the sound of waves. The AI would completely ignore the footsteps in the sand and the calls of the birds, making the scene feel lifeless.
On top of that, the quality of the audio was often subpar, and there simply wasn’t enough high-quality video with sound to train the models effectively.
Tencent’s Hunyuan team tackled these problems from three different angles:
The results speak sound for themselves
When Tencent tested Hunyuan Video-Foley against other leading AI models, the audio results were clear. It wasn’t just that the computer-based metrics were better; human listeners consistently rated its output as higher quality, better matched to the video, and more accurately timed.
Across the board, the AI delivered improvements in making the sound match the on-screen action, both in terms of content and timing. The results across multiple evaluation datasets support this:
Tencent’s work helps to close the gap between silent AI videos and an immersive viewing experience with quality audio. It’s bringing the magic of Foley art to the world of automated content creation, which could be a powerful capability for filmmakers, animators, and creators everywhere.
See also: Google Vids gets AI avatars and image-to-video tools

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is part of TechEx and is co-located with other leading technology events, click here for more information.
AI News is powered by TechForge Media. Explore other upcoming enterprise technology events and webinars here.