When will machine translation be as good as human translation?
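On this question, 观察君 (Guancha Jun) came across an interesting and thought-provoking answer on Quora, the overseas counterpart of Zhihu, and shares it below. The original answer is in English; the Chinese version of this post was translated jointly by Yeekit machine translation and Guancha Jun.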
Q: When do researchers expect machine translation will be as good as human translation?
A: Not in the foreseeable future.
First, look at this picture:
And now, witness what language does to pictures:
You see, there is a cognitive technique called abstraction that sits at the foundation of every act of applying a language. Or, in other words, whenever you use a language to describe a situation, you put aside most of the detail about it.
The moment you turned the original picture into the phrase "a girl that got lost in the woods", you effectively applied to it the most ruthless and crude image compressor that the word "ruthless" has ever had the honour of standing for.
A language is a finite set of discrete and repeating units that is used to describe continuous and irregular information. Because of that, the infinite majority of the information is inevitably lost in the sole act of applying language to it.
From that, the overall algorithm of machine translation would have to be something like this:
1. Properly parse the meaning of the source passage (i.e., solve all parsing problems, such as the ambiguity intrinsic to natural languages).
2. Restore the original information from the parsed meaning.
3. Encode the information in the target language by applying the right abstractions.
Steps 1 and 3 are difficult enough: doing them right is the pinnacle of applied linguistics. But the real problem is step 2.
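As a rough illustration, the three steps above can be sketched as a pipeline of three functions. This is a toy sketch, not a real translation system: the names `Situation`, `translate`, `parse`, `restore`, and `encode` are hypothetical, and the stand-in functions in the usage lines are hard-coded (including the made-up age of 7), precisely because a real `restore` is the unsolved part.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Situation:
    """Hypothetical container: the parsed meaning plus restored details."""
    meaning: str
    details: Dict[str, object] = field(default_factory=dict)


def translate(
    source: str,
    parse: Callable[[str], str],
    restore: Callable[[str], Situation],
    encode: Callable[[Situation], str],
) -> str:
    """Chain the three steps described in the text."""
    meaning = parse(source)        # step 1: parse the source passage
    situation = restore(meaning)   # step 2: restore the lost information
    return encode(situation)       # step 3: encode in the target language


# Usage with hard-coded stand-ins (assumptions for illustration only).
result = translate(
    "  a girl that got lost in the woods ",
    parse=str.strip,
    restore=lambda meaning: Situation(meaning, {"age": 7}),
    encode=lambda s: f"{s.meaning} [age={s.details['age']}]",
)
print(result)
```

The point of the sketch is the shape: `encode` can only use details that `restore` has supplied, and nothing in the source string itself supplies them.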
I am not saying that it's not doable: after all, human translators somehow guess the lost information, don't they? If a computer made of proteins, carbs and lipids manages, why can't a computer made of wires? However, doing so is beyond linguistics: it's an AI-complete task. In other words, it would take an intelligence very similar to a human's to guess the lost information in a human manner.
Unless this problem is solved, a machine could have flawless grammar, but every translation made by it would effectively be referring to something other than what the source phrase was referring to.
Q: Why is describing the original information so important in translation? I thought that accurately conveying the meaning of the original phrase was the goal...
That would have been so. But the problem is that every language has its own abstractions, and consequently loses different information when referring to the same phrase.
For example, whenever we talk about a female human, there's always the info about her age. The English language has a two-grade system for approximating a female human's age (girl-woman).
However, Russian has a four-grade system (девочка-девушка-женщина-бабушка), which means that you cannot translate the word "girl" into Russian exactly. So, to properly translate a phrase into Russian, one has to do the same thing the speaker of the source phrase did: take her age and abstract from it.
Now look at the two pictures above again. Can you guess the approximate age of the female human in the first picture? Good. Now, how about the second one?
This is just one example.
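The girl-woman versus four-grade example can be made concrete with a small sketch. The age boundaries below are assumptions picked purely for illustration (neither language has hard cut-offs), and `abstract_age` and `russian_candidates` are hypothetical helper names. The sketch shows that once an age has been abstracted into an English word, several Russian words remain possible.

```python
# Hypothetical upper age bounds, chosen only for illustration.
ENGLISH_GRADES = [(18, "girl"), (float("inf"), "woman")]
RUSSIAN_GRADES = [
    (13, "девочка"),
    (30, "девушка"),
    (60, "женщина"),
    (float("inf"), "бабушка"),
]


def abstract_age(age, grades):
    """Return the first word whose upper age bound is above `age`."""
    for upper_bound, word in grades:
        if age < upper_bound:
            return word
    raise ValueError("age out of range")


def russian_candidates(english_word):
    """Russian words consistent with an English word when the age is unknown."""
    candidates = []
    for age in range(0, 100):
        if abstract_age(age, ENGLISH_GRADES) == english_word:
            word = abstract_age(age, RUSSIAN_GRADES)
            if word not in candidates:
                candidates.append(word)
    return candidates


# With the age known, the Russian word is determined.
print(abstract_age(7, RUSSIAN_GRADES))
# With only the English word, more than one Russian word fits.
print(russian_candidates("girl"))
```

Under these assumed boundaries, "girl" alone leaves two Russian candidates and "woman" leaves three, so the translator has to recover the age from somewhere other than the word itself.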
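Key takeaway:
When will machine translation be as good as human translation? Not in the foreseeable future, because the information lost while machine translation decodes and re-encodes a text has to be guessed and restored in a human manner, which takes an intelligence very similar to a human's.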
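Some thoughts from Guancha Jun:
1. Machine translation and human translation do not compete on language ability or on the ability to convert text; they compete on the background knowledge, common sense, reasoning, and culture behind the language.
2. For the foreseeable future, the mission of machine translation is conversion at the language level (vocabulary, morphology, syntactic relations... the rote-memorized parts of learning a foreign language that bring more pain than fun), not culture and ways of thinking (which are exactly where the fun of learning a foreign language lies).
3. Human translators should long ago have been freed from language conversion, and even from format handling, to take on cultural exchange, which is higher-level, more abstract, and also more fun.
4. From a product perspective, in scenarios where the demand for language conversion is strong (browsing news and content, simultaneous interpretation, cross-lingual text analysis, e-commerce platforms, cross-lingual search), machine translation can act boldly and be more aggressive. In scenarios that mix in a lot of cross-cultural knowledge, common sense, and domain expertise (conference interpreting, specialized translation, especially in the humanities, and book translation), it is more reliable to put redoubled emphasis on human-machine interaction, and even on some "low-level" issues such as formatting and the barrier to using tools.
5. The Tower of Babel cannot be rebuilt. Even if languages become mutually intelligible, when will cultures and ways of thinking? (Think of the cultural sediment behind every language, then think of the current state of Esperanto.) Taking a huge step back and supposing that cultures and ways of thinking did connect, are there not still plenty of cases where people hide their true thoughts or talk past each other? Of course, that is already a question of human nature.
6. The good news is that an exciting period for turning machine translation into products lies ahead, because the problem of language conversion has only just been solved, to a large extent, at the technical level. There are too many scenarios to roll out to, the world outside is too big, and we have been stuck for too long.
7. One more thing, another piece of good news. Look at the "rivals" of machine translation products: the high cost and low output of human simultaneous interpretation; the foreign-language level of most people in China; and, even for those who studied foreign languages professionally, our overestimation of their ability and patience to obtain information in a foreign language.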
Criticism is welcome.