
A speech at an international diplomatic conference was used to train a machine-learning translation system. Credit: Janek Skarzynski/AFP/Getty
The dream of the Babel fish, the instant-translation creature imagined in the classic science-fiction series The Hitchhiker’s Guide to the Galaxy, may be a little closer to reality. Researchers at the technology giant Meta have developed a machine-learning system that almost instantly translates speech in 101 languages into words spoken by a speech synthesizer in any of 36 target languages.
The system, called SEAMLESSM4T (Massively Multilingual and Multimodal Machine Translation), can also perform speech-to-text, text-to-speech and text-to-text translation. The results are published in the January 15 issue of Nature1.
Meta, a Menlo Park, California-based company that operates social-media sites such as Facebook, WhatsApp and Instagram, is releasing SEAMLESSM4T as open source so that other researchers can build on it, much as it has done with its LLaMA family of large language models.
Lack of data
Machine translation has come a long way in the past few decades, thanks to the introduction of neural networks trained on large data sets. But although training data are abundant for major languages, especially English, they are scarce for many others. This inequality limits the range of languages that machines can be trained to translate. “This has implications for languages that appear less frequently on the Internet,” wrote Allison Koenecke, a computer scientist at Cornell University in Ithaca, New York, in a News & Views article accompanying the paper.
Meta’s team built on its previous work on speech-to-speech translation2 and on a project called No Language Left Behind3, which aimed to provide text-to-text translation for nearly 200 languages. Researchers at Meta and elsewhere have found through experience that making a translation system multilingual can improve its performance even for languages with limited training data, although it is unclear why this happens.
The research team collected millions of hours of audio files of speeches, as well as human translations of those speeches, from the Internet and other sources such as United Nations archives. The authors also collected transcripts of some of these speeches.
The team also trained a model on data it knew to be reliable to recognize when two pieces of content match in meaning. This allowed the researchers to automatically pair roughly 500,000 hours of audio and text, matching each fragment in one language with the corresponding fragment in another.
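Conceptually, this kind of automatic alignment can be illustrated by mapping fragments from both languages into a shared embedding space and pairing those whose vectors are most similar. The sketch below is a minimal illustration, not Meta’s actual pipeline: the embed function is a hypothetical stand-in for a multilingual speech or text encoder, and the cosine-similarity matching and threshold are assumptions made for the example.

```python
import numpy as np

def embed(fragment: str) -> np.ndarray:
    """Hypothetical stand-in for a multilingual encoder that maps a
    speech transcript or text fragment to a unit-length vector."""
    rng = np.random.default_rng(abs(hash(fragment)) % (2**32))
    v = rng.standard_normal(16)
    return v / np.linalg.norm(v)

def align(source_fragments, target_fragments, threshold=0.5):
    """Pair each source-language fragment with its most similar
    target-language fragment by cosine similarity (illustrative only)."""
    src = np.stack([embed(f) for f in source_fragments])
    tgt = np.stack([embed(f) for f in target_fragments])
    sims = src @ tgt.T  # cosine similarities, since vectors are unit-norm
    pairs = []
    for i, row in enumerate(sims):
        j = int(np.argmax(row))
        if row[j] >= threshold:  # keep only sufficiently confident matches
            pairs.append((source_fragments[i], target_fragments[j], float(row[j])))
    return pairs

if __name__ == "__main__":
    english = ["The meeting is adjourned.", "Thank you, Mr President."]
    french = ["Merci, Monsieur le Président.", "La séance est levée."]
    for en, fr, score in align(english, french, threshold=-1.0):
        print(f"{en!r} <-> {fr!r}  (similarity {score:.2f})")
```

With a real multilingual encoder in place of the dummy embed function, fragments with similarity above the threshold would be kept as training pairs, which is the general idea behind mining parallel speech and text at scale.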