FAIR has open-sourced XLS-R, a self-supervised model for speech recognition. XLS-R supports 128 languages and outperforms previous multilingual models on speech benchmarks.
The developers' goal with XLS-R was to build a single model that handles speech recognition, speech translation, and language identification across a wide range of widely spoken languages.
The model is trained on more than 436,000 hours of publicly available speech recordings, almost ten times as much data as FAIR's previous model, XLSR-53. The training data was drawn from a variety of sources, such as recordings of parliamentary proceedings and audiobooks, and covers 128 languages, roughly two and a half times as many as XLSR-53.
XLS-R contains up to 2 billion parameters in its largest variant. FAIR's developers report that scaling up the parameter count led to a significant improvement in the model, since a larger capacity allows it to form richer representations of language from the training data. They also found that training the model on all languages simultaneously yields better performance than training it on any single language alone.