
Bilingual Reading | Can Machine Intelligence Really Understand Human Language?

2017-05-24 Translated and compiled by fairy0zoe, 翻吧

IN “BLACK MIRROR”, a British science-fiction satire series set in a dystopian near future, a young woman loses her boyfriend in a car accident. A friend offers to help her deal with her grief. The dead man was a keen social-media user, and his archived accounts can be used to recreate his personality. Before long she is messaging with a facsimile, then speaking to one. As the system learns to mimic him ever better, he becomes increasingly real.

英国科幻讽刺剧集《黑镜》以反乌托邦式的近未来为背景,其中有这样一个故事:一位年轻女子的男友因车祸去世。一位朋友提出帮她走出悲痛。逝者生前热衷于社交媒体,他留存下来的账号可以用来重塑他的性格。没过多久,她就开始与这个“复制品”互发信息,随后又与之通话。随着系统对他的模仿越来越像,“他”也变得越来越真实。


This is not quite as bizarre as it sounds. Computers today can already produce an eerie echo of human language if fed with the appropriate material. What they cannot yet do is have true conversations. Truly robust interaction between man and machine would require a broad understanding of the world. In the absence of that, computers are not able to talk about a wide range of topics, follow long conversations or handle surprises.

这并不像听上去那么荒诞。只要输入合适的材料,如今的计算机已经能生成与人类语言惊人相似的话语。它们尚且做不到的,是进行真正的对话。人机之间真正稳健的交流,需要机器对世界有广泛的理解。缺少这种理解,计算机就无法谈论广泛的话题,跟不上长对话,也应付不了意外情况。


Machines trained to do a narrow range of tasks, though, can perform surprisingly well. The most obvious examples are the digital assistants created by the technology giants. Users can ask them questions in a variety of natural ways: “What’s the temperature in London?” “How’s the weather outside?” “Is it going to be cold today?” The assistants know a few things about users, such as where they live and who their family are, so they can be personal, too: “How’s my commute looking?” “Text my wife I’ll be home in 15 minutes.”

不过,只针对少数几类任务训练出来的机器,表现却可以好得出人意料。最明显的例子就是科技巨头们打造的数字助手。用户可以用各种自然的方式向它们提问:“伦敦气温多少度?”“外面天气怎么样?”“今天会冷吗?”这些助手还了解用户的一些情况,比如住在哪里、家人是谁,所以也能处理个性化的请求:“我上下班的路况怎么样?”“发短信告诉我妻子,我15分钟后到家。”



And they get better with time. Apple’s Siri receives 2bn requests per week, which (after being anonymised) are used for further teaching. For example, Apple says Siri knows every possible way that users ask about a sports score. She also has a delightful answer for children who ask about Father Christmas. Microsoft learned from some of its previous natural-language platforms that about 10% of human interactions were “chitchat”, from “tell me a joke” to “who’s your daddy?”, and used such chat to teach its digital assistant, Cortana.

而且它们会随着时间推移越来越好。苹果的Siri每周收到20亿条请求,这些请求在匿名化处理后被用于进一步训练。例如,苹果公司称,Siri了解用户询问比赛比分的每一种问法。对于问起圣诞老人的孩子,她也有一个讨人喜欢的回答。微软从此前的一些自然语言平台中发现,人类的交互中约有10%是“闲聊”,从“讲个笑话”到“谁是你爸爸?”,并用这类闲聊来训练其数字助手Cortana。


The writing team for Cortana includes two playwrights, a poet, a screenwriter and a novelist. Google hired writers from Pixar, an animated-film studio, and The Onion, a satirical newspaper, to make its new Google Assistant funnier. No wonder people often thank their digital helpers for a job well done. The assistants’ replies range from “My pleasure, as always” to “You don’t need to thank me.”

Cortana的写作团队包括两位剧作家、一位诗人、一位编剧和一位小说家。谷歌则从动画电影公司皮克斯(Pixar)和讽刺报纸《洋葱》(The Onion)聘请写手,让新推出的Google Assistant更加风趣。难怪人们常常在数字助手出色完成任务后向其道谢。助手们的回答五花八门,从“一如既往,乐意效劳”到“您不用谢我”。


Good at grammar

擅长语法


How do natural-language platforms know what people want? They not only recognise the words a person uses, but break down speech for both grammar and meaning. Grammar parsing is relatively advanced; it is the domain of the well-established field of “natural-language processing”. But meaning comes under the heading of “natural-language understanding”, which is far harder.

自然语言平台是如何知道人们想要什么的?它们不仅识别人们所用的词语,还从语法和语义两方面对话语进行拆解。语法分析相对成熟,属于发展完善的“自然语言处理”领域。而语义则归属于“自然语言理解”,难度要大得多。


First, parsing. Most people are not very good at analysing the syntax of sentences, but computers have become quite adept at it, even though most sentences are ambiguous in ways humans are rarely aware of. Take a sign on a public fountain that says, “This is not drinking water.” Humans understand it to mean that the water (“this”) is not a certain kind of water (“drinking water”). But a computer might just as easily parse it to say that “this” (the fountain) is not at present doing something (“drinking water”).

先说语法分析。大多数人并不擅长分析句子的句法,计算机却已相当精通,尽管大多数句子都存在人们很少察觉的歧义。以公共饮水喷泉上的一块英文标识为例:“This is not drinking water.”人们会把它理解为这里的水(This)不是某一种水(drinking water,饮用水)。但计算机同样可能把它分析成“这个”(This,指喷泉)此刻没有在做某件事(drinking water,喝水)。


As sentences get longer, the number of grammatically possible but nonsensical options multiplies exponentially. How can a machine parser know which is the right one? It helps for it to know that some combinations of words are more common than others: the phrase “drinking water” is widely used, so parsers trained on large volumes of English will rate those two words as likely to be joined in a noun phrase. And some structures are more common than others: “noun verb noun noun” may be much more common than “noun noun verb noun”. A machine parser can compute the overall probability of all combinations and pick the likeliest.

句子越长,语法上成立却毫无意义的分析方式就越多,而且呈指数级增长。那么,机器语法分析器怎么知道哪一种才是正确的呢?一个有用的办法是让它知道某些词语组合比其他组合更常见:“drinking water”这个短语使用广泛,因此用大量英语语料训练出来的分析器会判定这两个词很可能组成一个名词短语。某些结构也比其他结构更常见:“名词+动词+名词+名词”可能要比“名词+名词+动词+名词”常见得多。机器语法分析器可以计算出所有组合的总体概率,再从中挑出可能性最大的一个。
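
As a rough illustration of the probabilistic scoring described above, the sketch below multiplies the probabilities of the grammar rules used in each candidate parse of the fountain sign and keeps the likeliest. The rule names and every number in `RULE_PROB` are invented for this example. A real parser would estimate them from a large corpus and would search over candidate parses rather than list them by hand.

```python
from math import prod

# Hypothetical rule probabilities, invented for illustration only.
# Rules that share a left-hand side sum to 1, as in a probabilistic grammar.
RULE_PROB = {
    "S -> NP VP": 1.0,
    "VP -> is not NP": 0.6,        # "is not" followed by a noun phrase
    "VP -> is not V-ing NP": 0.4,  # "is not" followed by a verb and its object
    "NP -> this": 0.5,
    "NP -> drinking water": 0.3,   # the two words joined in one noun phrase
    "NP -> water": 0.2,
    "V-ing -> drinking": 1.0,
}

# The two readings of "This is not drinking water", written out by hand
# as the list of rules each one uses.
PARSES = {
    "water that is not for drinking": [
        "S -> NP VP", "NP -> this", "VP -> is not NP", "NP -> drinking water",
    ],
    "fountain that is not drinking": [
        "S -> NP VP", "NP -> this", "VP -> is not V-ing NP",
        "V-ing -> drinking", "NP -> water",
    ],
}


def parse_probability(rules):
    """Probability of a parse: the product of the probabilities of its rules."""
    return prod(RULE_PROB[rule] for rule in rules)


def likeliest_parse(parses):
    """Return the label of the candidate parse with the highest probability."""
    return max(parses, key=lambda label: parse_probability(parses[label]))


if __name__ == "__main__":
    for label, rules in PARSES.items():
        print(f"{label}: {parse_probability(rules):.3f}")
    print("chosen:", likeliest_parse(PARSES))
```

With these made-up numbers the noun-phrase reading wins because "drinking water" is treated as a common unit, which mirrors the intuition in the text. Different corpus statistics could of course change the outcome.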


A “lexicalised” parser might do even better. Take the Groucho Marx joke, “One morning I shot an elephant in my pyjamas. How he got in my pyjamas, I’ll never know.” The first sentence is ambiguous (which makes the joke)—grammatically both “I” and “an elephant” can attach to the prepositional phrase “in my pyjamas”. But a lexicalised parser would recognise that “I [verb phrase] in my pyjamas” is far more common than “elephant in my pyjamas”, and so assign that parse a higher probability.

“词汇化”(lexicalised)的语法分析器可能做得更好。以格劳乔·马克斯(Groucho Marx)的一则笑话为例:“一天早上,我穿着睡衣射杀了一头大象。它是怎么钻进我睡衣里的,我永远也不会知道。”(One morning I shot an elephant in my pyjamas. How he got in my pyjamas, I’ll never know.)第一句话有歧义(笑点正在于此):从语法上看,介词短语“in my pyjamas”既可以依附于“我”(I),也可以依附于“大象”(an elephant)。但词汇化的语法分析器会发现,“我[动词短语]in my pyjamas”远比“elephant in my pyjamas”常见,因而给前一种分析赋予更高的概率。
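
The lexicalised idea can be sketched in the same spirit. The toy code below decides whether "in my pyjamas" attaches to the verb or to the noun by comparing how often each head word has been seen with that prepositional phrase. All counts are invented for illustration, and the relative-frequency score is a simplifying assumption rather than the method of any particular parser.

```python
# Hypothetical corpus counts, invented for illustration only.
HEAD_COUNT = {"shot": 1000, "elephant": 500}
ATTACHMENT_COUNT = {
    ("shot", "in", "pyjamas"): 40,     # "... shot ... in (my) pyjamas"
    ("elephant", "in", "pyjamas"): 1,  # "elephant in (my) pyjamas"
}


def attachment_score(head, prep, obj):
    """Relative frequency of the prepositional phrase given the head word."""
    return ATTACHMENT_COUNT.get((head, prep, obj), 0) / HEAD_COUNT[head]


def choose_attachment(verb, noun, prep, obj):
    """Return "verb" or "noun", whichever head is the likelier attachment site."""
    verb_score = attachment_score(verb, prep, obj)
    noun_score = attachment_score(noun, prep, obj)
    return "verb" if verb_score >= noun_score else "noun"


if __name__ == "__main__":
    print(choose_attachment("shot", "elephant", "in", "pyjamas"))
```

Under these assumed counts the phrase attaches to the verb, that is, to the reading in which the speaker is the one wearing the pyjamas.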


But meaning is harder to pin down than syntax. “The boy kicked the ball” and “The ball was kicked by the boy” have the same meaning but a different structure. “Time flies like an arrow” can mean either that time flies in the way that an arrow flies, or that insects called “time flies” are fond of an arrow.

不过,语义比句法更难把握。“男孩踢了球”(The boy kicked the ball)和“球被男孩踢了”(The ball was kicked by the boy)意思相同,结构却不同。“Time flies like an arrow”既可以理解为时间像箭一样飞逝,也可以理解为一种叫“时间蝇”(time flies)的昆虫喜欢箭。


“Who plays Thor in ‘Thor’?” Your correspondent could not remember the beefy Australian who played the eponymous Norse god in the Marvel superhero film. But when he asked his iPhone, Siri came up with an unexpected reply: “I don’t see any movies matching ‘Thor’ playing in Thor, IA, US, today.” Thor, Iowa, with a population of 184, was thousands of miles away, and “Thor”, the film, has been out of cinemas for years. Siri parsed the question perfectly properly, but the reply was absurd, violating the rules of what linguists call pragmatics: the shared knowledge and understanding that people use to make sense of the often messy human language they hear. “Can you reach the salt?” is not a request for information but for salt. Natural-language systems have to be manually programmed to handle such requests as humans expect them, and not literally.

“《雷神》里是谁演的雷神?”(Who plays Thor in ‘Thor’?)笔者一时想不起在这部漫威超级英雄电影中饰演同名北欧神祇的那位健壮的澳大利亚演员叫什么。可当他向自己的iPhone发问时,Siri给出了一个意想不到的回答:“我没有找到今天在美国艾奥瓦州索尔(Thor, IA, US)上映的与‘Thor’匹配的电影。”艾奥瓦州的索尔人口只有184人,远在数千英里之外,而电影《雷神》也早已下映多年。Siri对这个问题的语法分析完全正确,回答却很荒谬,违背了语言学家所说的语用学规则,即人们借以理解日常听到的、往往杂乱无章的话语的共有知识和共识。“Can you reach the salt?”(你够得着盐吗?)并不是在询问信息,而是在请人递盐。自然语言系统必须经过人工编程,才能按人们期望的方式、而不是按字面意思来处理这类请求。


Multiple choice

多重选择


Shared information is also built up over the course of a conversation, which is why digital assistants can struggle with twists and turns in conversations. Tell an assistant, “I’d like to go to an Italian restaurant with my wife,” and it might suggest a restaurant. But then ask, “is it close to her office?”, and the assistant must grasp the meanings of “it” (the restaurant) and “her” (the wife), which it will find surprisingly tricky. Nuance, the language-technology firm, which provides natural-language platforms to many other companies, is working on a “concierge” that can handle this type of challenge, but it is still a prototype.

Such a concierge must also offer only restaurants that are open. Linking requests to common sense (knowing that no one wants to be sent to a closed restaurant), as well as a knowledge of the real world (knowing which restaurants are closed), is one of the most difficult challenges for language technologies.

共有信息也是在对话过程中逐步积累起来的,正因如此,数字助手往往难以应付对话中的曲折变化。对助手说“我想和妻子去一家意大利餐厅”,它或许会推荐一家餐厅。但如果接着问“它离她的办公室近吗?”(Is it close to her office?),助手就必须弄清“it”(餐厅)和“her”(妻子)指的是什么,而这对它来说出人意料地棘手。为许多公司提供自然语言平台的语言技术公司Nuance正在研发一款能够应对此类挑战的“礼宾员”(concierge),但目前还只是原型。

这样的礼宾员还必须只推荐正在营业的餐厅。把用户的请求与常识(没有人愿意被带到一家已打烊的餐厅)以及对现实世界的了解(哪些餐厅已经关门)联系起来,是语言技术面临的最艰巨的挑战之一。


Common sense, an old observation goes, is uncommon enough in humans. Programming it into computers is harder still. Fernando Pereira of Google points out why. Automated speech recognition and machine translation have something in common: there are huge stores of data (recordings and transcripts for speech recognition, parallel corpora for translation) that can be used to train machines. But there are no training data for common sense.

有句老话说,常识在人类身上就已经够少见的了。要把它编进计算机程序更是难上加难。谷歌公司的费尔南多·佩雷拉(Fernando Pereira)指出了其中的原因。自动语音识别和机器翻译有一个共同点:它们都有海量数据可以用来训练机器,语音识别有录音和转写文本,机器翻译有平行语料库。但常识并没有可供训练的数据。


Translated and compiled by: fairy0zoe

Reviewed by: 雷琰

Edited by: 翻吧君

Source: The Economist


