High-Level Forum on Applied Corpus Research in the Context of New Liberal Arts (Second Circular)

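Tong Jun Applied Linguistics Study (童俊应用语言学研习), 2021-03-16

Reposted from: WeChat official account of the Center for Applied Corpus Research (语料库应用研究中心)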

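The Center for Applied Corpus Research at Shanghai Ocean University will hold the "High-Level Forum on Applied Corpus Research in the Context of New Liberal Arts" on October 17, 2020. As carriers of big language data, corpora supply large volumes of objective, authentic linguistic evidence and drive progress in language-related fields such as artificial intelligence, natural language processing, and computer-assisted translation. The forum focuses on applications of corpora in related research areas. It aims to deepen exchange and cooperation between corpus linguistics and the other sub-disciplines within the first-level discipline of Foreign Languages and Literatures, as well as with other fields in the humanities, social sciences, and natural sciences, and thereby to further promote interdisciplinary research in the context of New Liberal Arts.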

1. Conference Date

October 17, 2020

Morning: 08:20-12:00

Afternoon: 13:30-18:00

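2. Format

On-site (campus personnel only) plus online live stream.

Tencent Meeting IDs (online):

Morning: 843 524 144

Afternoon: 591 158 525

Bilibili live room: 15029599 (main account)

Link: https://live.bilibili.com/15029599

Bilibili live room: 22482955 (backup account)

Link: http://live.bilibili.com/22482955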

3. Contact

Contact: Tong Jun (童俊), 13163738373
Contact: Yang Yue (杨粤), 18800278185

Email: corpus1017@126.com

4. Conference Schedule

5. Keynote Abstracts (in order of presentation)

Epistemology, Disciplinary Culture and Language Use: a case study of evaluative it patterns across disciplines

Wei Naixing (卫乃兴)

Beihang University (北京航空航天大学)

Abstract: This paper investigates disciplinary variation in academic discourse by focusing on constraints of epistemology and disciplinary culture on language use in relation to knowledge construction. Drawing on Becher’s (1987, 1989) classificatory framework for disciplinary grouping and by utilizing data from the Beijing CARE (Beijing Collection of Academic Research Essays) corpus, we focus on the frequent evaluative it patterns (Hunston & Francis 2000) in research articles of four disciplines, namely, physics, computer science, history and education, as representatives of so-called hard-pure, hard-applied, soft-pure and soft-applied disciplines respectively. We use the technique of correspondence analysis to treat the data, before probing into relationships between the distribution of it patterns and the corresponding epistemological factors of the four disciplines under study. The findings indicate that differences in uses of the evaluative it patterns largely correspond to the broad distinctions, i.e., hard vs. soft and pure vs. applied. It is argued that epistemological precepts shape the use of language to a large extent. Nevertheless, due to the impact of unique disciplinary culture and interdisciplinary trends, research communities are prone to certain discipline-specific conventions for linguistic choices for their knowledge construction. Therefore, the relationship between epistemology and language use is not definitive, nor generative in nature. Particularly, for so-called pure vs. applied disciplines, the constraints are not consistent.

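Corpus Texts and Data in the Era of Big Data

Li Wenzhong (李文中)

Zhejiang Gongshang University (浙江工商大学)

Abstract: With the development of network technology and the accumulation of information, the boundary of the concept of a corpus keeps expanding and its form is changing considerably. Some once-central design principles, such as sampling and representativeness, sample balance, and the meaning of "text", need to be rethought. As corpora develop further, these concepts may become dynamic, relative operational considerations: which corpus to use, for what purpose, and whether it is valid. These decisions will no longer be mainly the responsibility of corpus developers; users will set them according to their own research. In a sense, the texts collected in traditional corpora are not texts in their original state in the strict sense: layout, typeface, presentation, illustrations, appended text, and other paratextual information are all lost in conversion to plain text, to say nothing of today's increasingly diverse online hypertext, such as audio and video, animation, visualizations, images, interaction, and links. In a broader sense, the whole human world is text; the issue lies in our capacity to measure it. Future corpora may need to categorize all textual elements and to mark up and encode them effectively. Although it is hard to predict how far corpora will develop, we at least need to keep an open mind about their development.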

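No one stands in a proper position to prescribe how corpora should or should not be used; even if someone did, no one would listen. From the outset corpora have had multiple origins and multiple uses; corpora for natural language processing and machine translation, for example, also appeared very early. In evaluating corpus research, however, there has always been a question of attitude toward corpus data: do we look for evidence in the corpus, or do we start from the data and arrive at discoveries? Different answers to this question determine the role and status of linguistic facts and data in research. Of course, we may say that there are no purely objective facts, or no data without theory; in most cases we can see only what we are willing or able to see.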

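The role of the computer in corpus work is not merely a matter of efficiency. Massive corpora combined with the computer's powerful retrieval capability have fundamentally changed how people observe language. First, the perspective has changed: computer retrieval goes far beyond an individual's mental search based on knowledge and experience and beyond manual literature search, and it breaks through the mental sets and barriers of individual knowledge search, letting us see what intuition cannot reach. Second, the field of view has changed: repetition and variation appear side by side, prompting us to re-examine the relationship between regularity and change. Third, tools give rise to ideas: large numbers of linguistic facts that exceed expectation and intuition come to the fore and call for new description and explanation. In this sense, people first changed their tools and were in the end changed by them. Because the corpus is a tool, we cannot predict what we will ultimately obtain with it. The larger the corpus, the greater the variability; the more recent the texts, the more varied the language use.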

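Data are the starting point and entry point of corpus research. Genuine analysis of textual meaning must return from the data to the text; the text is the first-order data. Relying solely on theory or solely on technology will lead us astray. Data do not speak; people do. Do no evil to data, and do no evil with data.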

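Big data increasingly exceeds what natural human perception can grasp. Data visualization is becoming ever more important: we rely on it to observe and grasp rapidly growing data, to measure increasingly complex data relationships, and to explore valuable topics and research directions. But graphics are controlled by whoever draws them, and beautiful graphics can hide tricks and traps. Only by truly mastering visualization can we avoid being lost in it.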

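In the era of big data, a corpus is itself a kind of big language data. Language researchers need to consciously cultivate data awareness, that is, a thorough understanding of the data's sources, categories, structure, and processing mechanisms, and of how the results presented relate to the conclusions drawn, viewing language from the perspective of text and text from the perspective of data. Only with this understanding can we avoid being misled by dazzling data analysis and visualizations, and be able to think about the prospects and difficulties of data applications and to engage in data criticism. The humanities always come first; data are merely traces of human behavior.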

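Development and Applications of the Guangwai-Lancaster Chinese Learner Corpus

Xu Hai (徐海)

Guangdong University of Foreign Studies (广东外语外贸大学)

Abstract: The Guangwai-Lancaster Chinese Learner Corpus is, as far as is known, the first publicly available and relatively balanced corpus of learner Chinese. It contains nearly 1.3 million word tokens and is balanced and representative in terms of register, second-language task, first-language background, second-language proficiency, and gender. The corpus has broad application prospects: it can show the errors that learners of Chinese as a second language make at the levels of characters and words, phrases, syntax, and pragmatics; reveal how their ability to use Chinese develops; and provide empirical support for theoretical and practical questions in Chinese second-language acquisition, thereby helping to improve the efficiency of Chinese L2 teaching and learning.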

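When Academic English Meets Natural Language Processing: A Sentiment Perspective

Lei Lei (雷蕾)

Huazhong University of Science and Technology (华中科技大学)

Abstract: What changes has natural language processing brought to research on academic English? What opportunities and challenges do language researchers face? Using two empirical studies of sentiment analysis in academic English as examples, this talk discusses how language researchers can respond to the opportunities and challenges of the big data era.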

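From Classic CIA to EUM-Based Multiple Comparison to a Corpus-Driven L2 Experimental Paradigm: With Remarks on the Research Mission and New Territory of Learner Corpus Linguistics

Lu Jun (陆军)

Yangzhou University (扬州大学)

Abstract: Since Granger (1998) proposed Contrastive Interlanguage Analysis (CIA), corpus-based research on learner language has been carried out widely. It has developed analytical techniques such as "underuse", "overuse", and "misuse", and has produced important findings, notably that learner language deviates considerably from the target language in form, meaning, and function while tending to align with the first language. These findings have led to conjectures or hypotheses such as a lack of target-language knowledge and a marked first-language influence. However, studies based on the classic CIA model do not seem to have tested these hypotheses directly, nor have they integrated the deviations in L2 form, meaning, and function into a single discussion. We have therefore developed a multiple comparison based on the Extended Unit of Meaning (EUM) model. It compares the target language, the first language, and learner language simultaneously, and carries out each comparison at four levels of co-selection: collocation, colligation, semantic preference, and semantic prosody. To some extent this achieves an in-depth description of the formal, semantic, and functional features of learner language and a test of hypotheses such as first-language influence, and it broadens the scope of learner corpus application and research.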

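Admittedly, corpus research mainly concerns the products of language use; it is strong in description but weak in explanation. Experimental research, by contrast, concerns the process of language use and has strong explanatory power. In practice, many experimental studies have begun to apply corpus findings and to draw experimental materials from corpora, but the reverse is not the case. Moreover, L2 experimental studies of this kind tend to focus mainly on cognitive language processing and overlook the strengths of corpus research in areas such as representativeness. The corpus-driven L2 experimental paradigm treats corpus research as an independent framework: on the basis of corpus research it forms L2 hypotheses, designs authentic L2 learning tasks, and develops measurement methods that can filter out first-language influence. It mainly draws on corpus research ideas and findings such as "true and false bilingual corresponding sequences", "the distinctive semantic preference features of near-synonyms", and "the asymmetry of semantic prosody across two languages" to develop core experimental techniques, addressing bottlenecks in measuring L2 knowledge such as extracting target co-selection knowledge, naturally filtering out first-language influence, and separating explicit from implicit knowledge. The new paradigm takes a large step from idealized experimental conditions toward the study of authentic language use. It can offer a method for overcoming the "experiment vs. practice" dualism in L2 learning research, and it opens a new path and new space for taking learner corpus research further.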

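The Use of Nominalization in Chinese-English Press Conference Interpreting

Hu Kaibao (胡开宝)

Shanghai International Studies University (上海外国语大学)

Abstract: This paper examines the use of nominalization in Chinese-English press conference interpreting. The study finds that nominalization is significantly more frequent in Chinese-English interpreted press conference texts than in original English press conference texts and in written translations. We argue that this difference can be explained by differences between Chinese and English and, more importantly, by interpreters deliberately emphasizing the authority of the discourse they interpret. The study also finds that interpreters tend to use nominalization to project a professional and neutral image.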

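The Lexicographic Significance of Phrases and Corpus-Based Phrase Extraction

Li Dejun (李德俊)

National University of Defense Technology (国防科技大学)

Abstract: From the perspective of corpus linguistics, a phrase is a co-occurrence of words that is statistically significant and semantically independent. Because it is at once highly frequent, unambiguous, and modular, it is the basic unit for expressing meaning. Phrases play the most important role in verbal communication and embody the co-selection of lexis and structure. The inclusion and treatment of phrases correlate positively with a dictionary's communicative efficiency. Phrases are identified mainly by automatic, statistics-based methods. The various statistical identification algorithms now available have some shortcomings, but all can identify phrases effectively when the corpus is of suitable size; the MI score and the Z-score show relatively good validity, and using them in combination gives the best results.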

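A Corpus-Based Study of English Translations of Martial Arts Fiction: Lexical Normalization in the English Translations of Jin Yong's Novels

Li Dechao (李德超)

The Hong Kong Polytechnic University (香港理工大学)

Abstract: In recent years, how Chinese literature can "go global" has become a popular topic in translation studies, including discussion of the translation and introduction of a distinctively Chinese genre, martial arts (wuxia) fiction. Most existing research on the translation of wuxia fiction, however, relies on traditional qualitative methods of textual or literary analysis and is rather subjective. This study builds a "Chinese-English parallel corpus of Jin Yong's novels" and, combining qualitative and quantitative methods, examines lexical normalization in the three complete English translations of Jin Yong's novels in terms of lexical richness, standardized distribution of word classes, overlap of high-frequency words, and the domestication rate of wuxia terms. It finds that John Minford's (闵福德) translation of The Deer and the Cauldron (《鹿鼎记》) has the highest degree of lexical normalization and the best reception abroad; Olivia Mok's (莫锦屏) translation of Fox Volant of the Snowy Mountain (《雪山飞狐》) has a moderate degree of normalization but the poorest reception; and Graham Earnshaw's (晏格文) translation of The Book and the Sword (《书剑恩仇录》) has the lowest degree of normalization, with a polarized reception. The study argues that the translators' motives and strategies determined the degree of lexical normalization in the three translations, and that this degree of normalization, together with other factors (such as the content of the translation, the translator's competence, and the publisher's promotion), is one of the important factors affecting how a translation is received in the English-language market. Analyzing the relationships among translation motive and strategy, lexical normalization, and reception can offer insights for the "going global" of Chinese wuxia literature and of Chinese literature as a whole.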

Advances in corpus- and data-based discourse analysis of T&I: Topics and methods

Binhua Wang

University of Leeds

Abstract: The social turn in translation and interpreting (T&I) studies, like the cultural turn before it, has successfully expanded the horizon of the area from micro-analysis into words, sentences and texts to macro-analysis into the role of translators/interpreters and the function of T&I activities in society and culture. Yet it remains to be further explored how linguistic analysis can be linked to socio-cultural interpretation and how socio-cultural studies can be better validated with textual and discoursal analysis. Based on the papers published in the themed volume of Advances in Discourse Analysis of Translation and Interpreting: Linking Linguistic Approaches with Socio-cultural Interpretation (Binhua Wang & Jeremy Munday eds., Routledge in press), this presentation will give an overview of advances in corpus- and data-based discourse analysis of T&I. Typical research topics related to Chinese socio-cultural context will be summarised. Research methods and corpus tools will be reviewed and discussed.

Corpus Translation Studies: Researching the Textual Fit and Variation of Legal Translations

Łucja Biel

University of Warsaw

Abstract: The objective of this talk is to discuss how corpus linguistics is applied to study specialized translation (Corpus Translation Studies) by exemplifying it with the study of translator-mediated varieties of legal languages called Eurolects.

My focus is on textual fit in relation to Eurolects: ‘Europeanized’ hybrid varieties of national languages which serve the needs of the European Union (EU) as a supranational organisation, have 24 ‘mirror’ parallel realisations and are mediated through translators (Biel 2020). They differ from corresponding domestic legal and administrative varieties, departing from target conventions at many levels (cf. Biel 2014; Mori 2018). As part of the Eurolect project, we use the concept of textual fit to measure various types of variation to better understand such departures and their causes: (1) external variation to the national variety; (2) internal variation across four administrative genres (legislation, judgments, administrative reports, websites for citizens); (3) microdiachronic variation; and (4) micro-level variation of terminological equivalents. We work with keywords, key genre markers and lexical bundles to profile Eurolect genres. We argue that translations develop their own levelled-out formulaic profiles which, despite some diachronic convergence, minimally overlap with formulaic profiles of domestic genres.

Corpus studies in English for Academic Purposes: what is getting published, and what do we need more of?

Hilary Nesi

Coventry University

Abstract: A recent article in Nature (Mallapaty 2020) remarks on the huge increase in Chinese authorship of articles in Science Citation Index (SCI) journals - from about 120,000 in 2009 to 450,000 in 2019. Of course, most of these articles are from science fields, but the number of articles from China discussing English for Academic Purposes (EAP) has risen greatly too, and China is now the third most productive country for EAP publications, after the USA and the UK. A substantial number of the studies reported in these publications use corpus methodologies: about 29% of all the articles published in Journal of English for Academic Purposes (JEAP), for example (Riazi 2020). The general rise in international journal publications helps to explain the growth in the number of EAP publications using corpus methods - EAP researchers have recognised a need to investigate the discourse features of research in various disciplines in order to support novice writers struggling to publish in English. But corpus studies in EAP don’t just look at research writing: they also analyse textbooks, conference papers, lectures, online platforms and all kinds of academic interactions, from many different perspectives. This paper will provide an overview of the type of corpus studies JEAP is publishing right now, what the editors and reviewers are looking for, and where there might still be research gaps that corpus studies can fill.

All faculty and students are welcome to take part!

Center for Applied Corpus Research, Shanghai Ocean University

October 9, 2020
