Zuckerberg Personally Demos AI Translation Between English and Hokkien

LearnAndRecord 2022-11-03

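Recently, Meta CEO Mark Zuckerberg announced that Meta has developed the first AI speech translation system built specifically for unwritten languages such as Hokkien, and he personally demonstrated real-time translation between Hokkien and English.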

🤔️ Homework:

1. What does "daunting" mean?
2. How many languages like Hokkien are there in the world, without a standard or widely known writing system?
3. For speakers of unwritten languages, what is the challenge of communicating with speakers of a different language?

Original text without annotations:


AI translates Hokkien, an unwritten language, for the first time


From: Facebook


Peng-Jen Chen is well aware of how language barriers can affect people’s ability to communicate.


Chen grew up in Taiwan speaking Mandarin, but his father, Sheng-Jiang Chen, a 70-year-old retired factory lead technician, hails from Southern Taiwan, where Hokkien is widely spoken. Though the two languages are related, they’re different enough that Chen’s father sometimes finds it tricky to conduct complex conversations in Mandarin. “I have always wished my father could communicate with everyone in Hokkien, which is the language he’s most comfortable speaking,” said Chen, a Meta AI researcher. “He understands Mandarin well but speaks more slowly when communicating about complex topics.”


But rather than simply worrying, Chen is doing something about the problem — he’s leading the development of new technology to translate between Hokkien and English.


This is a daunting task, because while languages like Mandarin, English, and Spanish are both written and spoken, Hokkien — which is widely spoken within the Chinese diaspora — is primarily oral. In fact, Chen and his team of researchers are among the first to use artificial intelligence (AI) to construct a translation system for languages like Hokkien that lack a formal or widely known writing system. While the initial stage of the project translates between English and Hokkien, researchers plan to allow the translation of more unwritten languages. It’s part of Meta’s ongoing effort to develop a Universal Speech Translator that will allow the translation of many languages in real time and could eventually help millions of people around the world like Chen’s father become more effective communicators.


“The ability to communicate with anyone in any language — that’s a superpower people have dreamed of forever, and AI is going to deliver that within our lifetimes,” said Meta Founder and CEO Mark Zuckerberg in an online presentation earlier this year.


Using computers to translate languages isn’t a new concept, but previous efforts have focused on written languages. Yet of the 7,000-plus living languages, over 40 percent are primarily oral and do not have a standard or widely known writing system like Hokkien.


AI translation


Building an AI speech translation system for Hokkien was no easy task. These tools are usually trained on large quantities of text. But for Hokkien, there is no widely known standard writing system. Furthermore, Hokkien is what’s known as an underresourced language, which means there isn’t much paired speech data available in comparison with, say, Spanish or English. Also, with few human English-to-Hokkien translators, it was difficult to collect and annotate data to train the model. 


To get around these problems, Meta researchers used text written in Mandarin, which is similar to Hokkien. The team also worked closely with Hokkien speakers to ensure that the translations were correct. “Our team first translated English or Hokkien speech to Mandarin text, and then translated it to Hokkien or English — both with human annotators and automatically,” said Meta researcher Juan Pino. “They then added the paired sentences to the data used to train the AI model.”


The researchers will make their model, code, and benchmark data freely available to allow others to build on their work. While the model is still a work in progress and can currently translate only one full sentence at a time, it’s a step toward a future where simultaneous translation between many languages is possible. 


Challenges of communication


Speakers of unwritten languages often face hurdles when trying to participate in online communities, said Laura Brown, a Meta researcher and linguistic anthropologist. Many of these speakers are not able to easily communicate in the digital realm because they are not used to writing in their language. 


“It can be a barrier to confidence, fluency, and authenticity,” Brown said. “We know at Meta that there are tons of people all over the world who have their interface set to English, who use English on our platforms — even though they are much more confident in other languages and writing systems. As soon as we give them the ability to do audio in their own language, their comfort and confidence in the digital space shoot way up.”


Communicating with speakers of a different language can be challenging for speakers of unwritten languages. It can be hard to recognize the units of sound in an unwritten language when it’s transcribed in a way meant to be understood as it’s heard. This complication often makes it harder to teach unwritten languages and can result in younger generations losing the ability to communicate in the language of their parents. 


Some languages without a standardized written form are at risk of dying out. Linguists are trying to preserve languages with a dwindling number of speakers by writing the languages down, but that can be challenging when they don’t have a conventional written form. Mexico’s National Institute of Indigenous Languages is one institution that is working to preserve the unwritten languages of Indigenous peoples by recording the vocabulary. 


The many possibilities of AI translation


Meta researchers believe AI could help solve many communication challenges for speakers of unwritten languages. Pino said that the new translation system could eventually make it easier to navigate the internet and communicate in different languages, whether virtually or in real life. 


For Chen, though, the goal of the new Hokkien translation system is more personal. “I just want my father to be able to speak to whomever he wants,” he said.


- ◆ -


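Note: the Chinese text is machine-translated and for reference only; it does not correspond to the English line by line.

Full text with annotations: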


AI translates Hokkien, an unwritten language, for the first time


From: Facebook


Peng-Jen Chen is well aware of how language barriers can affect people’s ability to communicate.


陈鹏仁(Peng-Jen Chen)非常清楚语言障碍将如何影响人们的交流能力。



Hokkien


The Hokkien (/ˈhɒkiɛn/) variety of Chinese is a Southern Min language native to and originating from the Minnan region, where it is widely spoken in the south-eastern part of Fujian.


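According to Baidu Baike, Hokkien (Minnan) is said to have originated in the Yellow River and Luo River basins, moved to southern Fujian during the Western Jin, Tang, and Northern Song dynasties, and taken shape in Quanzhou, Fujian. Besides the Minnan region and Taiwan, it is now also spoken in northeastern Fujian, southeastern Zhejiang, the Chaoshan area of Guangdong (Jieyang, Shantou, Chaozhou), the Hailufeng area, western Guangdong (Zhanjiang, Maoming, Yangjiang), the Guangdong-Hong Kong-Macao Greater Bay Area (Zhongshan, Hong Kong), Hainan Island, and most Chinese communities in Southeast Asia. More than 70 million people worldwide speak Hokkien.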



Chen grew up in Taiwan speaking Mandarin, but his father, Sheng-Jiang Chen, a 70-year-old retired factory lead technician, hails from Southern Taiwan, where Hokkien is widely spoken. Though the two languages are related, they’re different enough that Chen’s father sometimes finds it tricky to conduct complex conversations in Mandarin. “I have always wished my father could communicate with everyone in Hokkien, which is the language he’s most comfortable speaking,” said Chen, a Meta AI researcher. “He understands Mandarin well but speaks more slowly when communicating about complex topics.”


陈鹏仁在中国台湾省长大,说普通话,但他的父亲、70岁的退休工厂首席技术员陈胜江(Sheng-Jiang Chen)来自广泛使用闽南语的台湾省南部。尽管这两种语言有亲缘关系,但差异很大,以至于陈鹏仁的父亲有时觉得用普通话进行复杂的对话很吃力。“我一直希望我父亲能用闽南语和每个人交流,这是他说起来最自在的语言,”Meta人工智能研究员陈鹏仁说。“他普通话听得很明白,但谈到复杂的话题时会说得比较慢。”



Mandarin


表示“(中国的)官话,普通话,国语”,英文解释为“a Chinese language that is the official language of China, and an official language of Singapore”



hail from somewhere


表示“来自;出生于”,英文解释为“to come from or have been born in a particular place”例如:

His father hailed from Italy.

他父亲出生于意大利。



tricky


表示“难办的;难对付的”,英文解释为“If a piece of work or problem is tricky, it is difficult to deal with and needs careful attention or skill.”举个🌰:

I'm in a tricky situation - whatever I do I'll offend someone.

我的处境真有点儿难办——我无论怎么做都会得罪人。



But rather than simply worrying, Chen is doing something about the problem — he’s leading the development of new technology to translate between Hokkien and English.


但陈鹏仁没有只是担心,而是在着手解决这个问题:他正在主导闽南语和英语互译新技术的开发。


This is a daunting task, because while languages like Mandarin, English, and Spanish are both written and spoken, Hokkien — which is widely spoken within the Chinese diaspora — is primarily oral. In fact, Chen and his team of researchers are among the first to use artificial intelligence (AI) to construct a translation system for languages like Hokkien that lack a formal or widely known writing system. While the initial stage of the project translates between English and Hokkien, researchers plan to allow the translation of more unwritten languages. It’s part of Meta’s ongoing effort to develop a Universal Speech Translator that will allow the translation of many languages in real time and could eventually help millions of people around the world like Chen’s father become more effective communicators.


这是一项艰巨的任务,因为虽然像普通话、英语和西班牙语这样的语言既有文字又有口语,但在海外华人中广泛使用的闽南语主要是口语。事实上,陈鹏仁和他的研究团队是最早使用人工智能(AI)为闽南语这类缺乏正式或广为人知的书写系统的语言构建翻译系统的研究人员之一。虽然该项目的初始阶段只在英语和闽南语之间进行翻译,但研究人员计划让系统支持更多无文字语言的翻译。这是Meta持续开发通用语音翻译器(Universal Speech Translator)工作的一部分,该翻译器将支持多种语言的实时翻译,最终可能帮助世界各地数百万像陈鹏仁父亲这样的人更有效地沟通。



daunting


daunting /ˈdɔːntɪŋ/ 表示“使人气馁的,吓人的;使人畏缩的;令人发怵的”,英文解释为“Something that is daunting makes you feel slightly afraid or worried about dealing with it.”举个🌰:

He and his wife Jane were faced with the daunting task of restoring the gardens to their former splendour.

他和他的妻子简当时面临着恢复花园昔日风采的艰巨任务。


📺英剧《唐顿庄园》(Downton Abbey)中的台词提到:and those standards can at first seem daunting. 这些规矩起初令人望而生畏。


📺美剧《绝命毒师》(Breaking Bad)中的台词提到:Just the idea of owning a car wash seems daunting, 光是想到要拥有一家洗车店就让人发怵。




diaspora


diaspora /daɪˈæs.pər.ə/ 表示“(一国人口向其他国家的)流散,大移居”,英文解释为“the spreading of people from one original country to other countries”



“The ability to communicate with anyone in any language — that’s a superpower people have dreamed of forever, and AI is going to deliver that within our lifetimes,” said Meta Founder and CEO Mark Zuckerberg in an online presentation earlier this year.


Meta创始人兼首席执行官马克·扎克伯格(Mark Zuckerberg)在今年早些时候的一次线上演讲中说:“用任何语言与任何人交流的能力——这是人们梦寐以求的超能力,人工智能将在我们的有生之年实现这一目标。”


Using computers to translate languages isn’t a new concept, but previous efforts have focused on written languages. Yet of the 7,000-plus living languages, over 40 percent are primarily oral and do not have a standard or widely known writing system like Hokkien.


使用计算机翻译语言并不是一个新概念,但以往的工作都集中在有文字的语言上。然而,在7000多种现存语言中,超过40%主要以口语形式存在,和闽南语一样没有标准或广为人知的书写系统。


AI translation 人工智能翻译


Building an AI speech translation system for Hokkien was no easy task. These tools are usually trained on large quantities of text. But for Hokkien, there is no widely known standard writing system. Furthermore, Hokkien is what’s known as an underresourced language, which means there isn’t much paired speech data available in comparison with, say, Spanish or English. Also, with few human English-to-Hokkien translators, it was difficult to collect and annotate data to train the model. 


为闽南语构建人工智能语音翻译系统并非易事。这些工具通常是在大量文本上训练的。但是对于闽南语来说,并没有广为人知的标准书写系统。此外,闽南语属于所谓的低资源语言,也就是说,与西班牙语或英语相比,可用的配对语音数据不多。另外,由于英语和闽南语之间的人工译员很少,收集和标注训练数据也很困难。



annotate


表示“为…做注释,标注”,英文解释为“If you annotate written work or a diagram, you add notes to it, especially in order to explain it.”举个🌰:

Historians annotate, check and interpret the diary selections. 

历史学家们对这些日记选段进行注释、核对和阐释。



To get around these problems, Meta researchers used text written in Mandarin, which is similar to Hokkien. The team also worked closely with Hokkien speakers to ensure that the translations were correct. “Our team first translated English or Hokkien speech to Mandarin text, and then translated it to Hokkien or English — both with human annotators and automatically,” said Meta researcher Juan Pino. “They then added the paired sentences to the data used to train the AI model.”


为了解决这些问题, Meta的研究人员使用了与闽南语类似的普通话文本。该团队还与说闽南语的人密切合作,以确保翻译正确。Meta的研究人员胡安·皮诺(Juan Pino)说:“我们的团队首先将英语或闽南语的语音翻译成普通话文本,然后将其翻译成闽南语或英语——既有人工标注,也有自动的。然后,他们将配对的句子添加到用于训练人工智能模型的数据中。”
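The pivot approach Pino describes can be sketched in a few lines of Python. This is only a minimal illustration, not Meta's code: the function `build_pairs_via_pivot`, the `PairedExample` record, and the two lookup tables are hypothetical stand-ins, and the Hokkien side is represented by placeholder strings rather than real speech. The sketch shows a source utterance being mapped to Mandarin text, that text being mapped to the target language, and only complete pairs being kept as training data.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional

# A "translator" here is any callable that returns a translation,
# or None when it cannot handle the input.
Translator = Callable[[str], Optional[str]]


@dataclass(frozen=True)
class PairedExample:
    source_lang: str
    source_utterance: str
    pivot_text: str          # Mandarin text used as the bridge
    target_lang: str
    target_utterance: str


def build_pairs_via_pivot(
    utterances: Iterable[str],
    source_lang: str,
    target_lang: str,
    to_pivot: Translator,
    from_pivot: Translator,
) -> List[PairedExample]:
    """Translate each utterance into the pivot, then into the target,
    keeping only utterances that survive both steps."""
    pairs = []
    for utterance in utterances:
        pivot = to_pivot(utterance)
        if pivot is None:
            continue  # no Mandarin rendering available
        target = from_pivot(pivot)
        if target is None:
            continue  # no target rendering available
        pairs.append(
            PairedExample(source_lang, utterance, pivot, target_lang, target)
        )
    return pairs


# Toy stand-ins (assumptions): a real pipeline would rely on speech
# recognition, machine translation, and human annotators, not lookup tables.
EN_TO_MANDARIN = {"good morning": "早安", "thank you": "谢谢"}
MANDARIN_TO_HOKKIEN = {
    "早安": "<Hokkien speech clip 1>",
    "谢谢": "<Hokkien speech clip 2>",
}

training_pairs = build_pairs_via_pivot(
    ["good morning", "thank you", "see you tomorrow"],
    source_lang="en",
    target_lang="hokkien",
    to_pivot=EN_TO_MANDARIN.get,
    from_pivot=MANDARIN_TO_HOKKIEN.get,
)

for pair in training_pairs:
    print(pair.source_utterance, "->", pair.pivot_text, "->", pair.target_utterance)
```

The third utterance is dropped because the toy table has no Mandarin entry for it, which mirrors the idea that only sentences with a complete English-Mandarin-Hokkien chain become paired training data.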


The researchers will make their model, code, and benchmark data freely available to allow others to build on their work. While the model is still a work in progress and can currently translate only one full sentence at a time, it’s a step toward a future where simultaneous translation between many languages is possible. 


研究人员将免费提供他们的模型、代码和基准数据,以允许其他人在他们的工作基础上进行构建。虽然该模型仍然是一个半成品,目前一次只能翻译一个完整的句子,但它是朝着实现多种语言同声翻译的未来迈出的一步。 



benchmark


表示“基准”,英文解释为“something which can be measured and used as a standard that other things can be compared with”。



simultaneous


simultaneous /ˌsɪm.əlˈteɪ.ni.əs/ 表示“同时的”,英文解释为“happening or being done at exactly the same time”举个🌰:

There were several simultaneous explosions in different cities.

几起爆炸在几个不同的城市同时发生。



Challenges of communication 沟通的挑战


Speakers of unwritten languages often face hurdles when trying to participate in online communities, said Laura Brown, a Meta researcher and linguistic anthropologist. Many of these speakers are not able to easily communicate in the digital realm because they are not used to writing in their language. 


Meta研究人员、语言人类学家劳拉·布朗(Laura Brown)说,说无文字语言的人在试图参与线上社区时经常面临障碍。这些人中的许多人无法在数字领域轻松交流,因为他们不习惯用自己的语言写作。 



hurdle

在文中作名词表示“难关;障碍”,英文解释为“a problem or difficulty that must be solved or dealt with before you can achieve sth.”

它还有另一个常见意思是“栏架,跨栏”,英文解释为“each of a series of vertical frames that a person or horse jumps over in a race”。复数形式 hurdles 即表示“跨栏比赛”,如:the 400-metre hurdles 400米跨栏比赛。



linguistic


表示“语言的;语言学的”,英文解释为“connected with language or the scientific study of language”,如:linguistic and cultural barriers 语言和文化上的障碍。



anthropologist


anthropologist /ˌænθrəˈpɒlədʒɪst/ 表示“人类学家”,英文解释为“a person who studies anthropology”


📍anthropology /ˌænθrəˈpɒlədʒɪ/:the study of the human race, especially of its origins, development, customs and beliefs 人类学



realm


realm /rɛlm/ 1)表示“领域;场所”,英文解释为“an area of activity, interest, or knowledge”举个🌰:

At the end of the speech he seemed to be moving into the realms of fantasy.

讲话的最后,他似乎进入了虚幻的境地。


2)表示“王国”(a country ruled by a king or queen)


📍beyond the realm of possibility 表示“超出范围,不可能”(not possible),相反的说法:within the realm of possibility 意思就是“在可能的范围”(possible),举个🌰:

A successful outcome is not beyond the realms of possibility.

最后取得成功并非没有可能。


🎬电影《复仇者联盟2:奥创纪元》(Avengers: Age of Ultron)中的台词提到:In every realm, there's a reflection. 每个国度都有倒影。



“It can be a barrier to confidence, fluency, and authenticity,” Brown said. “We know at Meta that there are tons of people all over the world who have their interface set to English, who use English on our platforms — even though they are much more confident in other languages and writing systems. As soon as we give them the ability to do audio in their own language, their comfort and confidence in the digital space shoot way up.”


“这可能会妨碍自信、流利和真实的表达,”布朗说。“我们在Meta了解到,世界各地有大量用户把界面设置成英语,在我们的平台上使用英语,尽管他们使用其他语言和书写系统时要自信得多。一旦我们让他们能够用自己的语言使用语音,他们在数字空间中的自在感和自信心就会大幅提升。”



authenticity


authenticity /ˌɔː.θenˈtɪs.ə.ti/ 表示“确实性;真实性;可靠性”,英文解释为“the quality of being genuine or true.”举个🌰:

The authenticity of her story is beyond doubt.

她讲述的事情的真实性不容置疑。



interface


interface /ˈɪn.tə.feɪs/ 表示“接口;界面”,英文解释为“a connection between two pieces of electronic equipment, or between a person and a computer”举个🌰:

My computer has a network interface, which allows me to get to other computers.

我的计算机有网络接口,可以与其他计算机连在一起。



shoot up


shoot up /ʃuːt ʌp/ 表示“迅速长大;急速增加;快速提高”,英文解释为“to grow in size, or increase in number or level, very quickly”举个🌰:

He has really shot up since I saw him last.

自我上次见到他以来,他长高了一大截。



way


way作副词,常与介词或副词连用(used with a preposition or an adverb),表示“很远;大量;过度,大幅(尤其用于强调时间或空间中的程度或距离)”,英文解释为“used to emphasize degree or separation, especially in space or time;very far; by a large amount”举个🌰:

She finished the race way ahead of the other runners.

她遥遥领先于其他选手跑到终点。

He spends way too much money on clothes.

他花太多的钱买衣服。



Communicating with speakers of a different language can be challenging for speakers of unwritten languages. It can be hard to recognize the units of sound in an unwritten language when it’s transcribed in a way meant to be understood as it’s heard. This complication often makes it harder to teach unwritten languages and can result in younger generations losing the ability to communicate in the language of their parents.


对于无文字语言的使用者来说,与讲不同语言的人进行交流可能很有挑战性。如果一种无文字语言是按照“听到什么就记什么”的方式转写的,就很难辨识其中的语音单位。这一难题往往使无文字语言的教学更加困难,并可能导致年轻一代失去用父母的语言交流的能力。



transcribe


表示“转录(为另一种书写形式)”,英文解释为“to change a piece of writing or music into another form, for example into a different writing system or into music for different instruments”。



Some languages without a standardized written form are at risk of dying out. Linguists are trying to preserve languages with a dwindling number of speakers by writing the languages down, but that can be challenging when they don’t have a conventional written form. Mexico’s National Institute of Indigenous Languages is one institution that is working to preserve the unwritten languages of Indigenous peoples by recording the vocabulary. 


一些没有标准化书面形式的语言正面临着消亡的危险。语言学家正试图通过文字记录来保护使用人数不断减少的语言,但当这些语言没有常规的书面形式时,这可能是一个挑战。墨西哥国家土著语言研究所(National Institute of Indigenous Languages)就是这样一个机构,它通过记录词汇来保护原住民的无文字语言。



standardize


standardize /ˈstæn.də.daɪz/ 表示“使标准化,使合乎标准”,英文解释为“to make things of the same type all have the same basic features”举个🌰:

We standardize parts such as rear-view mirrors, so that one type will fit any model of car we make.

我们对后视镜之类的零部件进行标准化生产,这样只要一种型号便能匹配我们制造的所有款式的汽车了。



die out


表示“逐渐消失;灭绝”,英文解释为“to become less common and finally stop existing”举个🌰:

Dinosaurs died out millions of years ago.

数百万年前恐龙就已经灭绝了。



preserve

表示“保护,维护;保留;保养”,英文解释为“to keep something as it is, especially in order to prevent it from decaying or being damaged or destroyed”如:to preserve the environment 保护环境。



dwindle


dwindle /ˈdwɪndəl/ 表示“逐渐减少,缩小,变小”(to gradually become less and less or smaller and smaller)举个🌰:

The factory's workforce has dwindled from over 1,000 to a few hundred.

该厂的工人总数已从1000多减少到了几百人。


表示降低,减少的词经常出现,比如:

📍fall表示“(水平、数量、价格等,尤指较大幅度地)下跌,下降,降低”(to go down to a lower level, amount, price etc, especially a much lower one)


📍slide表示“(价格等)下滑,下跌”(if prices, amounts, rates etc slide, they become lower)


📍diminish表示“(使)减少,(使)减小”(to become or make something become smaller or less)


📍dip表示“降低,减少”,英文解释为“if an amount or level dips, it becomes less, usually for just a short time”,如:Profits dipped slightly last year. 去年利润略有降低。



conventional

表示“传统的;常规的;普通的”,英文解释为“traditional and ordinary”如:conventional behaviour/attitudes/clothes 传统行为/态度/服装。



indigenous


indigenous /ɪnˈdɪdʒɪnəs/ 表示“土生土长的,本地的”,英文解释为“indigenous people or things have always been in the place where they are, rather than being brought there from somewhere else”。


🎬电影《阿凡达》(Avatar)中的台词提到:We have an indigenous population of humanoids called the Na'vi. 这里有一种长得像人的土著,我们称其为“纳威”。



The many possibilities of AI translation 人工智能翻译的多种可能性


Meta researchers believe AI could help solve many communication challenges for speakers of unwritten languages. Pino said that the new translation system could eventually make it easier to navigate the internet and communicate in different languages, whether virtually or in real life. 


Meta研究人员认为,人工智能可以帮助解决无文字语言使用者的许多交流挑战。皮诺说,无论是线上还是现实生活中,新的翻译系统最终能够让上网冲浪、用不同的语言交流更加容易。 



navigate


navigate /ˈnæv.ɪ.ɡeɪt/ 1)表示“浏览,访问(网站)”,英文解释为“to move around a website or computer screen, or between websites or screens”举个🌰:

Their website is fairly plain, but very easy to navigate.

他们的网站页面比较简单,但浏览起来很便捷。


2)表示“导航,确定…的方向”,英文解释为“to direct the way that a ship, aircraft, etc. will travel, or to find a direction across, along, or over an area of water or land, often by using a map”举个🌰:

There weren't any road signs to help us navigate through the maze of one-way streets.

没有任何路标可以指引我们穿过像迷宫似的单行街道。



For Chen, though, the goal of the new Hokkien translation system is more personal. “I just want my father to be able to speak to whomever he wants,” he said. 


然而,对陈鹏仁来说,新的闽南语翻译系统的目标更加个人化。“我只是希望我的父亲能够和他想说话的人说话,”他说。


- Today's recap -

Hokkien
Mandarin
hail from somewhere
tricky
daunting
diaspora
annotate
benchmark
simultaneous
hurdle
linguistic
anthropologist
realm
authenticity
interface
shoot up
way
transcribe
standardize
die out
preserve
dwindle
conventional
indigenous
navigate

- END -

LearnAndRecord

2015年2月8日

2022年10月22日

第2814天

每天持续行动学外语