Meta’s new open source models speak more than 1,100 languages



summary
Summary

As part of the Massively Multilingual Speech project, Meta is releasing AI models that can convert spoken language to text and text to speech in 1,100 languages.

The new set of models is based on Meta’s wav2vec, as well as a curated dataset of examples for 1,100 languages ​​and another uncurated dataset for nearly 4,000 languages, including languages ​​spoken by only a few hundred people for which no speech technology yet exists, according to Meta.

The model can express itself in more than 1,000 languages ​​and identify more than 4,000 languages. According to Meta, MMS outperforms previous models by covering ten times more languages. You can get an overview of all available languages ​​here.

New Testament gets new use as AI dataset

A key component of MMS is the Bible, specifically the New Testament. The Meta dataset contains New Testament readings in more than 1,107 languages ​​with an average length of 32 hours.

ad

Meta used these recordings in combination with matching passages from the Internet. In addition, another 3,809 unlabeled audio files were used, also New Testament readings, but without additional language information.

Since 32 hours per language is not enough training material for a reliable speech recognition system, Meta used wave2vec 2.0 to pre-train MMS models with more than 500,000 hours of speech in more than 1,400 languages. These models were then fine-tuned to understand or identify numerous languages.

Benchmarks show that the model’s performance remained nearly constant despite training with many more different languages. In fact, the error rate decreased minimally by 0.4 percentage points with increasing training.

Picture: Meta

According to Meta, it is also significantly lower than that of OpenAI’s Whisper, which was not explicitly optimized for extensive multilingualism. A comparison in English only would be more interesting. First testers on Twitter report that Whisper performs better here.

The fact that the voices in the dataset are predominantly male does not negatively affect the understanding or generation of female voices, according to Meta.

Recommendation

open-source Models on Github.



Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top