Meta’s new AI-powered speech translation system for Hokkien translates live speech

Until now, AI translation has mainly focused on written languages. Yet nearly half of the world’s 7,000+ living languages are primarily oral and do not have a standard or widely used writing system. This makes it impossible to build machine translation tools using standard techniques, which require large amounts of written text to train an AI model. To address this challenge, we’ve built the first AI-powered translation system for a primarily oral language: Hokkien, which is widely spoken within the Chinese diaspora but lacks a standard written form. Our technology allows Hokkien speakers to hold conversations with English speakers.

The open-sourced translation system is part of Meta’s Universal Speech Translator (UST) project, which is developing new AI methods that we hope will eventually allow real-time speech-to-speech translation across all extant languages, even primarily spoken ones. We believe spoken communication can help break down barriers and bring people together wherever they are located — even in the metaverse.

Collecting sufficient data was a significant obstacle we faced when setting out to build a Hokkien translation system. Hokkien is what’s known as a low-resource language, which means there isn’t an ample supply of training data readily available for the language, compared with, say, Spanish or English. In addition, there are relatively few human English-to-Hokkien translators, making it difficult to collect and annotate data to train the model.

We leveraged Mandarin as an intermediate language to build pseudo-labels in addition to human translations: we first translated English (or Hokkien) speech to Mandarin text, then translated that text to Hokkien (or English) and added the resulting pairs to the training data. This method greatly improved model performance by leveraging data from a similar high-resource language.
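To make the pivot idea concrete, here is a minimal sketch of how pseudo-label generation through an intermediate language could look. The `SpeechToText` and `TextToText` interfaces and the `build_pseudo_pairs` function are hypothetical stand-ins, since the post does not name the actual UST components or their APIs.

```python
from typing import Callable

# Hypothetical model interfaces; the real UST components are not named here.
SpeechToText = Callable[[bytes], str]  # speech audio -> text in a target language
TextToText = Callable[[str], str]      # text -> text translation

def build_pseudo_pairs(
    english_audio: list[bytes],
    en_speech_to_zh_text: SpeechToText,  # English speech -> Mandarin text
    zh_text_to_hokkien: TextToText,      # Mandarin text -> Hokkien
) -> list[tuple[bytes, str]]:
    """Pivot through Mandarin to create (English speech, Hokkien) pairs
    that augment the scarce human-translated training data."""
    pairs = []
    for utterance in english_audio:
        mandarin = en_speech_to_zh_text(utterance)  # step 1: pivot to Mandarin text
        hokkien = zh_text_to_hokkien(mandarin)      # step 2: produce the target label
        pairs.append((utterance, hokkien))
    return pairs
```

The same recipe runs in reverse for the Hokkien-to-English direction, with Mandarin again serving as the high-resource pivot.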
