Profile Column | Interview with Dr. Giuseppe Samo


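Editor's Note

The Profile Column of 理论语言学五道口站 (2020, Issue 9; Issue 74 overall) presents an interview with Dr. Giuseppe Samo conducted by our staff editor Xiaotong Ma. Dr. Samo is a junior faculty member in the Department of Linguistics at Beijing Language and Culture University, working mainly on the interfaces between linguistics and artificial intelligence, statistics, experimental methods, and language acquisition. In this interview he shares his views on cross-linguistic research and on the relationship between diachronic and synchronic syntax, and outlines how syntactic research bears on the development of computational linguistics and artificial intelligence.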


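Biography

Dr. Giuseppe Samo


Dr. Giuseppe Samo is a faculty member in the Department of Linguistics at Beijing Language and Culture University and the first foreign teacher formally appointed to a permanent staff position there. His linguistics courses focus on the interfaces between linguistic theory and artificial intelligence, statistics, experimental methods, language acquisition, and diachronic research. He received his PhD from the University of Geneva, Switzerland, where he used syntactic cartography to build a formal model accounting for syntactic micro-variation in V2 languages. His research interests include the interfaces between syntactic theory and pragmatics, computational linguistics, and data science. Dr. Samo also works to bring linguistic theory to a wider audience.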


Selected publications

Samo, G. (2019) A Criterial Approach to the Cartography of V2, John Benjamins Publishing, ISBN 9789027204486.

Samo, G., Merlo, P. (2019) Intervention effects in object relatives in English and Italian: a study in quantitative computational syntax, Proceedings of the First Workshop on Quantitative Syntax (Quasy, SyntaxFest 2019), Association for Computational Linguistics, 46–56.


Biography (English version)

Dr. Giuseppe Samo is a lecturer and researcher at Beijing Language and Culture University, where he teaches linguistics courses focusing on the role of linguistic theory at the interfaces with Artificial Intelligence, statistical and experimental methods, language acquisition and diachronic studies. He received his doctoral degree from the University of Geneva (Switzerland), working on a formal model that uses cartographic analytical tools to account for syntactic micro-variation among V2 languages. His interests include the role of syntactic theory at the interfaces with pragmatics, computational linguistics and data science. Finally, he carries out outreach activities that bring linguistic theory to a more general public.


Interview

 

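Xiaotong Ma: Could you give us some advice on cross-linguistic research?

Dr. Samo: Linguistics is the scientific study of the language system and its structure, and cross-linguistic research plays an important role in it. Comparative linguistics, and comparative syntax in particular, contributes greatly to our understanding of how the language system functions in the brain. In fact, the different "languages" are all outputs of this mental mechanism. We need to investigate and analyze different languages and dialects in order to uncover the formal rules behind this uniquely human faculty; the discussion of universal grammar goes back to the work of Noam Chomsky. In other words, we need to observe all natural (human) languages and study them as variants of a single universal grammar.

A good linguist can also study languages he or she has not mastered, simply by obtaining and using all the available linguistic data: corpus data, the earlier literature, informal consultation with native speakers, experiments, and so on. Remember always to compare what the data show with your own native-speaker intuitions: seminal papers in syntax have shown that intuitions based on Chinese can indeed help solve theoretical problems in Indo-European languages.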

 

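Xiaotong Ma: How should we view the relationship between diachronic syntax and synchronic syntax?

Dr. Samo: Diachronic and synchronic syntax are closely related. Comparing two different contemporary languages (say, Italian and Mandarin) is the same process as comparing two stages of one and the same language (for example, Old English and Modern English). As many researchers have noted, this is because a child acquiring a first language has no knowledge of the earlier stages of that language (Italian speakers, for instance, know nothing of Latin) or, naturally, of the changes the language has undergone. In both dimensions, the linguist simply analyzes grammars, understood as sets of rules and strategies. That is, researchers in diachronic and in synchronic syntax describe languages with the same tools; the main difference lies in the linguistic material they analyze. Extinct languages naturally have no native speakers left. And unfortunately, we do not have full access to every language spoken in every corner of the world at every moment in history. Here the synchronic dimension can help diachronic research by providing sources of analysis and theoretical insight.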

 

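Xiaotong Ma: What qualities do you think students should have when studying linguistics?

Dr. Samo: As in the other social sciences, we must keep in mind that linguistics, too, studies "objects of the world". Just as a microscope lets us observe the tissues and cells of an organism and a telescope lets us observe the stars in outer space, linguistic facts (in both performance and competence) can be observed at close range. Linguistic research therefore requires both diligence and imagination.

Since we are talking about linguistic elements, a love of "languages" themselves also matters: the more languages you know (even just a little), the more interesting the patterns that may spark your curiosity and your research questions. This is only natural.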

 

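Xiaotong Ma: What advantages can students gain from studying linguistics?

Dr. Samo: First, students will learn that language is not only a tool for communication but also an object of scientific research. For example, the idea that every utterance can be split into countless subparts, together with the question of how those subparts combine, that is, reasoning about how language works, has run through the history of philosophy from Aristotle to the logicians of the 20th century.

From a practical point of view, knowing how language works is also very helpful, especially in language-related professions: in teaching a foreign language, one can identify degrees of complexity; in communication studies, one can refine classification schemes; in computer science, especially information and data science, one comes to see that text does not behave as exactly as numbers do; and so on.

There is a further, more important point. If we think about it, every discipline is based on texts and produces texts: by analyzing these texts linguistically, we may be able to raise new research questions in those fields.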

 

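Xiaotong Ma: In your view, what can cartographic research bring to computational linguistics?

Dr. Samo: Cartographic research and computational linguistics benefit each other. To make this clear, I would stress what Prof. Luigi Rizzi calls the "heuristic capacity" of syntactic research. Cartographic results may represent a unique model that not only captures and predicts linguistic variation at the macro and micro levels but can also create a "Mendeleev table" of linguistic elements. Twenty years of cartographic research show that we now have substantial empirical coverage (see the cartography of syntax website edited by Prof. Giuliano Bocci, Dr. Karen Martini and myself at the University of Geneva, http://unige.ch/SynCart). In the late 1990s the focus was on Romance and Germanic; the early 2000s saw many publications on West African, Finno-Ugric and Austronesian languages; and over the past decade much work has been done on East Asian languages, in particular on Mandarin Chinese, with the support of Prof. Si Fuzhen and the academic framework provided by Beijing Language and Culture University. We hope that in the coming decade we will arrive more quickly at exhaustive syntactic descriptions of the languages spoken all over the world.

The formal accounts provided by cartography can be implemented in rule-based machine learning (such as supervised learning); they improve the quality of syntax-based automatic translation and refine automatic information retrieval, for example by looking at answering strategies and the left periphery.

On the other hand, computational methods offer cartographic syntacticians an innovative perspective. For example, in a recent paper co-authored with Prof. Merlo (Samo, G., Merlo, P., 2019, Intervention effects in object relatives in English and Italian: a study in quantitative computational syntax, Quasy, 46–56, https://www.aclweb.org/anthology/W19-7906/), we used quantitative computational methods to test linguistic proposals and to address cartographic questions.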

 

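Xiaotong Ma: What contribution do you think linguistic research, and syntactic research in particular, can make to artificial intelligence, which is so popular right now? Please give some examples.

Dr. Samo: Results in syntactic research have provided, provide and will continue to provide many ideas for research in artificial intelligence, statistics included, because most of the linguistic data that machines encounter is in written form, which naturally makes it hard to cover everything connected with sound, from phonetics to pragmatics. Syntax and related research, by contrast, play an important role: given, for example, the syntacticization of semantics and pragmatics, we can explore multiple dimensions of linguistic information through syntax alone. Syntactic research also matters for machine translation: whether the goal is to convey the original meaning of a sentence or its original sentiment, syntax can supply the best translation strategy, and understanding the polarity of sentences is a good example. In the recent AI literature, linguistic theory has played an important role. An extremely common practice in AI systems is to turn words into vectors, which, put simply, means counting how often words co-occur with other words. This is a very simple yet useful idea. But linguistic theory tells us that not every word has the same properties: adjectives, nouns, subjects, objects and so on can, depending on their properties, be represented as mathematical functions or matrices (see Baroni M. & Zamparelli R., 2010, Nouns are Vectors, Adjectives are Matrices: Representing Adjective-Noun Constructions in Semantic Space, Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, 1183–1193, https://www.aclweb.org/anthology/D10-1115/). In other words, exchange between syntax and artificial intelligence is of great benefit to both.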

 

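Xiaotong Ma: What advice do you have for linguistics students who have never been exposed to computer science?

Dr. Samo: Computer science is a science of numbers, and numbers are good companions for any science. Today computer science is easy to approach (online tutorials, handbooks for every level), because the new generation of students was born in a digital age. Linguistics students should realize that computational methods, including computer science and statistics, are tools for their research. Roughly speaking, we can turn a theory into equations and then see whether the data fit them. It may seem difficult now, but by building up knowledge step by step we can lay a solid foundation and then integrate what we know of theoretical linguistics. Beyond linguistics, being able to "speak" the "language" of computer science and high tech is also very important, because it may open up another kind of job opportunity.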

Xiaotong Ma: Could you give us some suggestions about cross-linguistic research?


Dr. Samo: Since linguistics is the scientific study of language as a system and of its structure, cross-linguistic studies play an important role. Comparative linguistics, in particular comparative syntax, is a useful tool for investigating how the language system functions in our mind. As a matter of fact, the different "languages" represent the output of this mental device. The different languages and dialects are the data we need to filter and analyze to detect the formal rules underlying such a unique human faculty: the universal grammar discussed since the works of Noam Chomsky. In other words, we need to observe all natural (human) languages as variations on a single melody.

A good linguist should be able to work on languages s/he does not even fully master. Just take and exploit all the linguistic data you can access: corpus data, data from the earlier literature, informal questions to native speakers, experiments. Remember always to compare the results with the intuitions you have from your native variety: seminal papers in syntax have shown that an intuition based on Mandarin solved theoretical issues in Indo-European languages.

 

Xiaotong Ma: How should we view the relationship between diachronic syntax and synchronic syntax?


Dr. Samo: Diachronic syntax and synchronic syntax are strictly connected. Comparing two different contemporary languages (let's say, Italian and Mandarin) is exactly the same process as comparing two stages of the very same language (for example, Old English and Modern English). This is because, as many researchers have stated, when a child acquires her/his first language, s/he is entirely unaware of the earlier stages of this language (let's say Latin for Italian speakers) and, naturally, of all the changes the variety has undergone. In both dimensions, a linguist just analyzes grammars, understood as sets of rules and strategies. In other words, researchers in diachronic and in synchronic syntax use exactly the same tools to describe languages; the main difference lies in the linguistic material they analyze. Naturally, there are no native speakers of extinct languages. Unfortunately, we do not have full access to the production in every language spoken in every corner of the world at every moment in history. Here, the synchronic dimension helps the diachronic dimension, providing sources of analysis and theoretical insights.

 

Xiaotong Ma: What characteristics do you think students should possess in order to study linguistics?


Dr. Samo: As in other (social) sciences, the first characteristic is to keep in mind that we are investigating "objects of the world". Linguistic facts (utterances in both performance and competence) can be observed just as tissues, molecules and atoms are observed with a microscope, or stars and constellations in outer space with a telescope. Therefore, diligence and creativity go hand in hand.

Since we are, however, talking about linguistic elements, it is important to have a passion for "languages": it is natural that the more languages you master (even a little bit), the more interesting patterns might spark your interest and your research questions.

 

Xiaotong Ma: What advantages can students gain from studying linguistics?


Dr. Samo: First of all, students will learn that language is not only a means of communication, but that linguistic elements are something you can do science on. Consider, for example, the idea that every linguistic utterance can be split into a myriad of subparts, and the question of how these subparts combine: this reasoning about how language works has characterized the history of philosophy, from Aristotle to the logicians of the 20th century.

Pragmatically speaking, knowing how language works is tremendously helpful, especially in all the professions related to languages: in teaching a foreign language, one is able to detect layers of complexity; in communication studies, one is able to refine taxonomies of patterns; in computer science, especially information and data science, one is able to understand that strings of text do not behave mathematically exactly as numbers do; and so on.

More importantly, if one thinks about it, every academic discipline is based on and produces texts: by applying linguistic analyses to these texts, we might be able to generate new research questions in those fields.

 

Xiaotong Ma: In your opinion, what can cartographic studies bring to computational linguistics?


Dr. Samo: There is a mutual advantage. To understand this, let me stress what Prof. Luigi Rizzi calls the "heuristic capacity" of cartographic studies. Cartographic results might represent a unique model able to capture and predict linguistic variability at the macro and micro levels, creating a "Mendeleevian table" of linguistic elements. After twenty years of studies in cartography, the time is ripe to say that we have substantial empirical coverage (see the SynCart website edited by Prof. Giuliano Bocci, Dr. Karen Martini and myself at the University of Geneva, http://unige.ch/syncart). Over these years, the first focus, in the late 1990s, was on Romance and Germanic; in the early 2000s we had many publications on West African, Finno-Ugric and Austronesian languages; and in the 2010s much work was undertaken on East Asian languages, such as Mandarin Chinese, thanks to the great support given by Prof. Si Fuzhen and the academic architecture provided by BLCU. We hope that in the 2020s we will soon have exhaustive syntactic descriptions of many languages spoken in every corner of the world.

The formal accounts provided by cartography should then be implemented in those machine learning environments that are based on rules (like supervised learning), to improve automatic translators working on syntax and to refine the automatic retrieval of information, looking, for example, at answering strategies and the Left Periphery.

On the other hand, computational methods represent an innovative point of view for cartographic syntacticians. For example, in a recent paper with Prof. Merlo (Samo, G., Merlo, P., 2019, Intervention effects in object relatives in English and Italian: a study in quantitative computational syntax, Quasy, 46–56, https://www.aclweb.org/anthology/W19-7906/), we adopted quantitative computational methods to test linguistic proposals and to address cartographic questions.

 

Xiaotong Ma: What contribution do you think linguistic research, especially syntactic research, can make to artificial intelligence, which is so popular at the moment? Please illustrate with examples.


Dr. Samo: The results of syntactic research have provided, provide and will provide many insights for those Artificial Intelligence (AI) methods which are not merely statistical. This is because most of the linguistic data that machines have access to is in written form. In written texts, naturally, you do not have access to sounds and to all the layers related to them, from phonetics to pragmatics. Syntax, on the other hand, is there, and syntactic research plays an important role. Taking into consideration, for example, the syntacticization of semantics and pragmatics, we are able to detect further dimensions of information relying on the syntax alone.

Syntactic insights are useful for machine translation, where they provide the best strategy for translating while keeping the same meaning, and for sentiment analysis, where they provide a better understanding of, for example, the polarity of a group of sentences.

In the recent AI literature, linguistic theory has played an important role. An extremely common practice in many AI systems is the so-called transformation of words into vectors, which result, roughly speaking, from statistical counts of how frequently words co-occur with other words. It is a very simple but very powerful idea. However, linguistic theory has shown that not every word has the same nature: adjectives, nouns, subjects, objects and so on represent different "mathematical" objects, such as vectors or matrices (see, for example, Baroni M. & Zamparelli R., 2010, Nouns are Vectors, Adjectives are Matrices: Representing Adjective-Noun Constructions in Semantic Space, Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, 1183–1193, https://www.aclweb.org/anthology/D10-1115/). In other words, a mutual conversation would be extremely useful for the evolution of both fields.
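
The vector idea and the noun/adjective contrast described here can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the four-sentence corpus, the window size and the hand-written `red` matrix are all hypothetical (Baroni & Zamparelli estimate adjective matrices from corpus data rather than writing them by hand). Nouns are represented as co-occurrence count vectors, and an adjective is applied to a noun as a matrix-vector product.

```python
from collections import Counter

# Toy corpus: hypothetical sentences, for illustration only.
corpus = [
    "the red car stops",
    "the red house stands",
    "the old car stops",
    "the old house stands",
]

# Fixed vocabulary order, so every vector has the same dimensions.
vocab = sorted({token for sentence in corpus for token in sentence.split()})


def cooccurrence_vector(target, sentences, vocab, window=2):
    """Count how often each vocabulary word occurs within `window` tokens of `target`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i, token in enumerate(tokens):
            if token != target:
                continue
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    counts[tokens[j]] += 1
    return [counts[word] for word in vocab]


def apply_adjective(matrix, noun_vector):
    """Treat the adjective as a linear map: a plain matrix-vector product."""
    return [sum(m * x for m, x in zip(row, noun_vector)) for row in matrix]


# Nouns as vectors of co-occurrence counts.
car = cooccurrence_vector("car", corpus, vocab)
house = cooccurrence_vector("house", corpus, vocab)

# Hypothetical adjective matrix: the identity, except that it doubles
# the "red" context dimension. This is an assumption made for the example.
n = len(vocab)
red = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
red[vocab.index("red")][vocab.index("red")] = 2.0

red_car = apply_adjective(red, car)

print(vocab)
print("car:    ", car)
print("house:  ", house)
print("red car:", red_car)
```

In this toy corpus, `car` and `house` share the contexts `old`, `red` and `the` but differ on `stops` and `stands`, while the hypothetical `red` matrix changes only the `red` dimension of whatever noun vector it is applied to.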

 

Xiaotong Ma: What advice do you have for students majoring in linguistics who have never entered the field of computer science?


Dr. Samo: Computer science is about numbers, and numbers are good friends in any empirical science. Naturally, in 2020 it is much easier to approach the basics of computer science (tutorials on the web, handbooks for every level), also because this new generation of students was born digital. What a student of linguistics just needs to realize is that computational methods, in both computer science and statistics, are tools that he/she can exploit to develop his/her analysis. Roughly speaking, you can translate theories into equations and see whether the data points fit them. It could be very hard at the moment, also because of the jargon, but little by little one should be able to master the basics and understand how to integrate one's knowledge of theoretical linguistics. Being able to "speak" the language of computer science or high tech also matters in the world outside, since it could open up another type of job opportunity after a degree in linguistics.


Copyright of this article belongs to 理论语言学五道口站; please contact this platform before reposting.


Editing: 马晓彤, 王竹叶, 訾姝瑶

Layout: 马晓彤, 王竹叶

Proofreading: 陈旭, 王丽媛


