Academician Richard Hudson on Dependency Grammar

Richard Hudson, 语言科学 (Language Science), 2022-04-24

This book deserves a prominent place in the growing international literature on dependency grammar and computational linguistics. The nature of syntactic structure is one of the most disputed questions in linguistics, because in this most fundamental of disputes science and tradition are so hard to separate.

An ancient tradition in Europe and the Middle East gives priority to the word as the basic unit of syntax, which means that syntax is primarily a matter of defining the relations between individual words—what have come to be called “dependencies”. For instance, in the sentence “Small children often cry”, the syntactician identifies just three dependencies that relate small to children, children to cry, and often to cry; once these dependencies have been identified, and the words and dependencies have been classified, nothing more remains to be said about the sentence’s structure.
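The dependency analysis of the example can be written down as a small data structure. The following is only an illustrative sketch: the 1-based word positions, the (dependent, head) pair layout, and the helper name `describe` are assumptions made here for illustration, not a notation taken from the book.

```python
# Words of the example sentence; positions are counted from 1.
words = ["Small", "children", "often", "cry"]

# Each dependency is a (dependent, head) pair of word positions
# (an illustrative encoding, not the book's own notation).
dependencies = [(1, 2), (2, 4), (3, 4)]


def describe(words, dependencies):
    """Render each dependency as 'dependent -> head'."""
    return [f"{words[d - 1]} -> {words[h - 1]}" for d, h in dependencies]


for line in describe(words, dependencies):
    print(line)
```

Under this encoding the whole syntactic structure of the sentence is the three pairs; the verb cry is the only word that does not appear as a dependent.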

A much more recent tradition started with Leonard Bloomfield and the American structural linguists in the early twentieth century, and has come to dominate syntactic theory. In this tradition, the structure of a sentence consists of a more or less elaborate hierarchy of “phrases” in which the word has no particular priority. In “phrase structure grammar”, in contrast with “dependency grammar”, the four words of our example are combined into at least three phrases (small children, often cry and small children often cry) and possibly more; for example, cry would typically be classified not only as a word but also as a one-word phrase.

Unfortunately for scientific progress, this tradition was built from scratch, with very little reference to the existing dependency theory, and continues to ignore the dependency alternative. The result is that the very foundations of the scientific study of syntax are unstable, with an unresolved conflict between phrase structure and dependency structure. The main influence on syntactic theory is not debate and research, but geography. Linguists trained in America adopt phrase structure, while the more independent syntacticians of Europe favour dependency theory. This cannot be good for our discipline.

This background explains why a European dependency grammarian like me is pleased to see dependency theory being so ably developed by Haitao Liu outside the traditional “battle-field” of Europe and America, in the People’s Republic of China. His dependency analyses of Chinese are a particularly welcome contribution to dependency theory. However, what is most exciting about his work is the way in which he has applied dependency analysis to large corpora in different languages, something which is possible nowadays thanks to the use of computers.

A corpus of naturally occurring sentences is the ultimate test of any theory of language precisely because it shows how important it is, in theorizing about language, to go beyond mere grammar. For instance, Liu reports that his Chinese corpus contains a very similar proportion of nouns to the proportion that I reported some years ago for several English corpora: about 41%. This is, indeed, an extraordinary finding; but it demands an explanation. Why should this figure emerge from such different corpora? One thing is clear: the explanation cannot lie only in grammar. To understand usage, we need a much broader range of theories: not only linguistic theories of grammar, vocabulary and genre, but also psychological theories of working memory. Liu’s studies address many of these questions, though it is surely too soon to expect satisfying answers to many of them.

Perhaps the most interesting topic discussed in this book is the statistical measure of syntactic difficulty called “dependency distance”. This measures the load which a word places on working memory, on the reasonable assumption that a word is kept active in working memory until all its outstanding dependencies have been satisfied. Returning to our earlier example, “Small children often cry”, most of the words are very easy to process because their dependencies are satisfied by the next word; for instance, small needs a “parent” word, but this is immediately provided by children; and the same is true of often, which depends on the next word cry. But children is slightly harder because it is the subject of cry, from which it is separated by often. This increased load is still trivially easy for adult English speakers, but as the dependency distance between children and cry increases, the difficulty increases, and most English speakers struggle with really long subjects such as “Small children with anxious parents who keep trying to get them to smile and be happy even when they have tummy ache or when they are teething often cry”.
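The measure can be illustrated on the four-word example. This is a minimal sketch under a stated assumption: distance is taken as the absolute difference between the positions of a word and its parent, so adjacent words have distance 1. Another convention counts only the intervening words, which would make every value smaller by 1. The names `heads`, `dependency_distances` and `mean_dependency_distance` are hypothetical, chosen here for illustration.

```python
# "Small children often cry", positions counted from 1.
words = ["Small", "children", "often", "cry"]

# Maps each dependent's position to its parent's position.
# The verb "cry" has no parent, so it has no entry.
heads = {1: 2, 2: 4, 3: 4}


def dependency_distances(heads):
    """Distance of each dependency as |parent position - dependent position|
    (assumed convention: adjacent words have distance 1)."""
    return {dep: abs(head - dep) for dep, head in heads.items()}


def mean_dependency_distance(heads):
    """Average distance over all dependencies in the sentence."""
    distances = dependency_distances(heads)
    return sum(distances.values()) / len(distances)


distances = dependency_distances(heads)
for position, distance in distances.items():
    print(words[position - 1], distance)   # Small 1, children 2, often 1

print(mean_dependency_distance(heads))     # 4/3, about 1.33
```

Only children has a distance above 1, because often stands between it and cry. Every word added to the subject after children pushes cry one position further away and raises that one distance by 1, which is the effect the long-subject example describes.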

Earlier work on dependency distance in languages such as English suggests that the limitations of working memory keep the average dependency distance quite low, and one would expect the same to be true in other languages. But Liu has found evidence for considerable variation among languages. In particular, he reports that the average dependency distance in Chinese is at least twice as great as that in English. This is an extraordinarily important finding which should stimulate a great deal of productive research. Do other corpora in English and Chinese show the same differences? If they do, why are the effects of working memory so different in the two languages? Is it because Chinese words are easier to hold in memory, so that more words can be kept active? Or is it because Chinese speakers have less limited working memories? I, for one, look forward very much to the light that Liu’s future work will certainly cast on these fascinating questions.

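This text is the foreword, written by Richard Hudson, to Professor Haitao Liu's book Dependency Grammar: From Theory to Practice (《依存语法的理论与实践》).

Dependency Grammar: From Theory to Practice (《依存语法的理论与实践》)

By Haitao Liu

Beijing: Science Press, March 2020

ISBN 978-7-03-024866-4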

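About the Author

Haitao Liu is a member of the International Academy of Esperanto (国际世界语学院) and a Changjiang Scholar Distinguished Professor of the Ministry of Education. He is Qiushi Distinguished Professor and doctoral supervisor at Zhejiang University, Distinguished Professor at Beijing Language and Culture University, and Yunshan Leading Scholar at Guangdong University of Foreign Studies. He serves as editor-in-chief, associate editor, or editorial board member of a number of Chinese and international linguistics publications, including the Journal of Quantitative Linguistics. He has supervised a doctoral dissertation recognized as outstanding by Zhejiang Province and is a recipient of the State Council Special Government Allowance. His research has repeatedly won social science awards from the Ministry of Education and at the provincial level. Elsevier listed him as a “Most Cited Chinese Researcher” for 2014-2018.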

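Contents

Note on the Reprint
Foreword by Feng Zhiwei
Foreword
Foreword by Richard Hudson
Preface
Chapter 1 Dependency Structure Trees
1.1 Introduction
1.2 Syntactic trees before Tesnière
1.3 Tesnière's stemmas
1.4 Dependency trees after Tesnière
1.5 General properties and structure of dependency trees
Chapter 2 Valency Theory and Valency Lexicons
2.1 Introduction
2.2 Tesnière and earlier valency research
2.3 Overview of modern research on valency and dependency theory
2.4 Format and framework of valency dictionaries (lexicons)
2.5 Structural framework of a valency lexicon
Chapter 3 Dependency Relations and Chinese Dependency Grammar
3.1 Introduction
3.2 Early ideas of the Modistae and Tesnière
3.3 Other scholars' discussions of dependency relations
3.4 Properties of dependency relations and the construction of a dependency syntax
3.5 Chinese dependency grammar
3.5.1 The word-class system of Modern Chinese
3.5.2 Dependency relations in Modern Chinese
3.6 The probabilistic valency pattern and Chinese valency patterns
3.7 A Chinese dependency treebank
3.8 Summary
Chapter 4 Formalization of Dependency Grammar
4.1 Introduction
4.2 Formalization of language
4.3 Tesnière's formal system of dependency grammar
4.4 American formal models of dependency grammar
4.5 Formal systems of dependency grammar based on feature structures
4.6 Formal theories of dependency grammar based on tree structures
4.7 Constraint-based formalization of dependency grammar
4.8 German research on the formalization of dependency grammar
4.9 Formal models of dependency grammar based on valency patterns
4.10 Equivalence of dependency grammar and phrase structure grammar
4.11 Summary
Chapter 5 Dependency Parsing
5.1 Introduction
5.2 The concept and definition of parsing
5.3 Dependency parsing based on Tesnière's theory
5.4 Dependency parsing based on context-free grammar
5.5 Dependency parsing based on extended context-free grammar
5.6 Constraint-based dependency parsing
5.7 Dependency parsing combining rules and statistics
5.8 Dependency parsing based on the concept of slots
5.9 Dependency parsing based on linguistic theories
5.10 Statistical dependency parsing
5.11 Dependency parsing based on valency patterns
5.12 Summary
Chapter 6 Rule-Based Dependency Parsing of Chinese
6.1 Parsing Chinese with valency patterns
6.2 Parsing Chinese with simple unification operations
6.3 Parsing Chinese with link grammar
6.4 Parsing Chinese with a shift-reduce algorithm
6.5 Dependency parsing of Chinese based on complex features
6.6 Summary
Chapter 7 Treebank-Based Dependency Parsing of Chinese
7.1 Dependency parsing of real Chinese text
7.2 Inductive dependency parsing and its applications
7.3 Parsing experiments with a self-built treebank
7.4 Parsing after modifying the treebank annotation scheme
7.5 Dependency parsing experiments with the Harbin Institute of Technology dependency treebank
7.6 Factors affecting dependency parsing
Chapter 8 Quantitative Studies of Chinese Based on a Dependency Treebank
8.1 Statistics on word classes and dependency relations
8.2 Statistics and analysis of dependency distance
8.3 Statistics and analysis of the composition of dependency relations
8.3.1 Statistical analysis of governors and dependents by dependency relation
8.3.2 Statistical analysis of dependency relations by governor and dependent
8.4 From syntactic trees to language networks
References
Conclusion
Afterword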