Tech giant Meta has rolled out an AI model, known as “SeamlessM4T”, that can translate and transcribe nearly 100 languages across text and speech.
As part of its effort to build advanced AI tools, the company said that SeamlessM4T represents a significant breakthrough in AI-powered speech-to-speech and speech-to-text translation.
Meta claims that the model performs the entire translation task within a single system, unlike other large translation models that split the task across separate systems.
Announcing the rollout of SeamlessM4T, Meta wrote in a blog post:
“The world we live in has never been more interconnected, giving people access to more multilingual content than ever before. This also makes the ability to communicate and understand information in any language increasingly important. Today, we’re introducing SeamlessM4T, the first all-in-one multimodal and multilingual AI translation model that allows people to communicate effortlessly through speech and text across different languages.
“SeamlessM4T supports:
• Speech recognition for nearly 100 languages
• Speech-to-text translation for nearly 100 input and output languages
• Speech-to-speech translation, supporting nearly 100 input languages and 36 (including English) output languages
• Text-to-text translation for nearly 100 languages
• Text-to-speech translation, supporting nearly 100 input languages and 35 (including English) output languages
“In keeping with our approach to open science, we’re publicly releasing SeamlessM4T under a research license to allow researchers and developers to build on this work. We’re also releasing the metadata of SeamlessAlign, the biggest open multimodal translation dataset to date, totaling 270,000 hours of mined speech and text alignments.
“Building a universal language translator, like the fictional Babel Fish in The Hitchhiker’s Guide to the Galaxy, is challenging because existing speech-to-speech and speech-to-text systems only cover a small fraction of the world’s languages. But we believe the work we’re announcing today is a significant step forward in this journey.
“Compared to approaches using separate models, SeamlessM4T’s single system approach reduces errors and delays, increasing the efficiency and quality of the translation process. This enables people who speak different languages to communicate with each other more effectively.”
According to Meta, SeamlessM4T builds on advancements the company and others have made over the years in the quest to create a universal translator.
Last year, it released No Language Left Behind (NLLB), a text-to-text machine translation model that supports 200 languages and has since been integrated into Wikipedia as one of the translation providers.
Earlier this year, the company revealed Massively Multilingual Speech (MMS), which provides speech recognition, language identification, and speech synthesis technology across more than 1,100 languages.
Notably, SeamlessM4T draws on findings from all of these projects to enable a multilingual and multimodal translation experience stemming from a single model, built across a wide range of spoken data sources with state-of-the-art results.
The tech giant argues that SeamlessM4T doesn’t produce an excessive amount of toxic text in its translations, a common problem with translation and generative AI text models.
However, in specific languages such as Bengali and Kyrgyz, the model generates more toxic translations related to socioeconomic status and culture. More generally, SeamlessM4T tends to exhibit more toxicity in translations dealing with sexual orientation and religion.
Meta said that this is only the latest step in the company’s ongoing effort to build AI-powered technology that helps connect people across languages.
The tech giant further hinted that, in the future, it wants to explore how this foundational model can enable new communication capabilities, ultimately bringing Meta closer to a world where everyone can be understood.