Resemble AI’s open-source model transforms noisy audio into crystal-clear speech



Summary

Resemble Enhance is an open-source AI model that can significantly improve the quality of audio recordings.

The startup Resemble AI offers several AI tools for voice cloning, blending, and localization, as well as text-to-speech, speech-to-speech, and voice dubbing capabilities for various applications.

Now, the company has released Resemble Enhance, an AI model that converts noisy audio into clear speech. Unlike the company’s other models, Resemble Enhance is open source.

Resemble Enhance for podcasts and historical recordings

Resemble sees applications for the technology in areas such as podcasting, the general entertainment industry, and the restoration of historical audio documents. The company shows what this sounds like with an example of an old lecture.

Video: Resemble AI

The model consists of two main components: a denoiser and an enhancer. The denoiser uses a UNet model to separate speech from background noise to improve intelligibility. The enhancer uses a latent conditional flow matching (CFM) model to correct audio distortion and expand audio bandwidth.
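To give a rough feel for what sampling from a flow matching model involves, here is a toy sketch in plain Python. It is not Resemble Enhance's code or API: the function names, the three-element "latent", and the closed-form `oracle_velocity` (which stands in for a trained neural network) are all assumptions made for illustration. The idea shown is that sampling starts from random noise and repeatedly follows a velocity field in small Euler steps until it arrives at a clean latent representation.

```python
import random


def oracle_velocity(x, t, target):
    """Hypothetical stand-in for a trained network.

    For straight-line paths from noise to data, this velocity moves the
    current point x toward `target` so that it arrives exactly at t = 1.
    A real model would predict the velocity from x, t, and the degraded
    input audio instead of being handed the answer.
    """
    return [(c - xi) / (1.0 - t) for xi, c in zip(x, target)]


def sample_flow(x0, target, steps=8):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # t stays below 1, so the division above is safe
        v = oracle_velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


random.seed(0)
clean_latent = [0.5, -1.0, 2.0]                      # made-up "clean" latent
noise = [random.gauss(0, 1) for _ in clean_latent]   # starting point at t = 0
result = sample_flow(noise, clean_latent)
print(result)
```

In a real system, the number of integration steps trades processing time against output quality, which is one plausible reason sampling speed matters for a model of this kind.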

The development team plans to keep improving Resemble Enhance, in particular by reducing processing times and giving users finer control over individual speech elements. In the long run, the model should also be able to improve audio recordings that are more than 75 years old.

Resemble offers a demo of Resemble Enhance on Hugging Face. The code is available on GitHub.
