Journal News | SSCI Journal: Journal of Quantitative Linguistics, 2022, Issues 1-2
2022-10-23
Journal of Quantitative Linguistics
Volume 29, Issue 1-2, June 2022
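Journal of Quantitative Linguistics (SSCI Q3, 2021 IF: 0.761), 2022, Issues 1-2, contains 11 research articles covering lexical complexity, dependency distance, discourse studies, the Menzerath-Altmann law, embedding models, and other topics.
Contents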
ARTICLES
■Is Queen’s English Drifting Towards Common People’s English? —Quantifying Diachronic Changes of Queen’s Christmas Messages (1952–2018) with Reference to BNC, By Xinlei Jiang, Yue Jiang & Cathy Ka Weng Hoi
■Does Menzerath–Altmann Law Hold True for Translational Language: Evidence from Translated English Literary Texts, By Yue Jiang & Ruimin Ma
■Lexical Richness and Text Length: An Entropy-based Perspective, By Yaqian Shi & Lei Lei
■A Word Embedding Model for Analyzing Patterns and Their Distributional Semantics, By Rui Feng, Congcong Yang & Yunhua Qu
■Dependency Distances and Their Frequencies in Indo-European Language, By Xinying Chen & Kim Gerdes
■Quantifying Perceived Political Bias of Newspapers through a Document Classification Technique, By Hyungsuc Kang & Janghoon Yang
■The Effect of Translation on Text Coherence: A Quantitative Study, By Elham Najafi, Alireza Valizadeh & Amir H. Darooneh
■Optimal Coding and the Origins of Zipfian Laws, By Ramon Ferrer-i-Cancho, Christian Bentz & Caio Seguin
■Frequency, Dispersion and Abstractness in the Lexical Sophistication Analysis of A Learner-Based Word Bank: Dimensionality Reduction and Identification, By Haomin Zhang, Yuting Han, Xing Zhang & Liuran Cui
■Predictive Modelling of Type Valency in Word Formation Grammar, By Kateryna Krykoniuk
■Linguistic Accommodation in Teenagers’ Social Media Writing: Convergence Patterns in Mixed-gender Conversations, By Lisa Hilte, Reinhild Vandekerckhove & Walter Daelemans
Abstracts
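Is Queen’s English Drifting Towards Common People’s English? —Quantifying Diachronic Changes of Queen’s Christmas Messages (1952–2018) with Reference to BNC
Xinlei Jiang, Yue Jiang & Cathy Ka Weng Hoi
Abstract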
Queen's English (QE), a linguistic symbol of the royal or upper class, is a particular variety or an aristocratic form of English. However, QE has been dethroned by a surprising finding that it shifted phonologically towards common people's English (CE) between the 1950s and the 1980s, arousing a debate on its existence. Based upon Queen's Christmas Messages (1952-2018) and BNC, this study quantitatively investigated whether QE has experienced diachronic changes and drifted towards CE. Our PCA analysis shows QE's fluctuating lexical richness, increasing lexical complexity and synthetism, and steady syntactic features during the six decades. Piecewise regression and statistical results indicate 1) QE is drifting towards CE in lexical richness and complexity between the 1950s and the 1980s; 2) QE exhibits an interaction between a “drifting force” and a “deviating force” towards or from CE between the 1950s and the 1980s in syntactic features; 3) QE maintains a synthetic form distinct from the analytical one of CE over the 66 years. These phenomena are likely related to the collapsing social structure between the 1950s and the 1980s, identity building in the Queen's early reign and the age factor. This study is the first to quantify the drift of QE towards CE lexically and syntactically, which may shed some light on quantitative investigation of diachronic language changes.
Does Menzerath–Altmann Law Hold True for Translational Language: Evidence from Translated English Literary Texts
Yue Jiang & Ruimin Ma, School of Foreign Studies, Xi’an Jiaotong University, Xi’an, China
Abstract
Menzerath–Altmann Law (MAL) is regarded as one of the fundamental laws of language due to its extensive validity for different languages at various linguistic levels and applicability for register differentiation. However, whether MAL holds true for translational language remains to be answered. Translational language, different from both the source language and target original (non-translated) language, is viewed as ‘the third code’. This study delves into the validity of MAL for translated English literary texts and its comparable original texts by exploring the relationship between the sentence length (in number of clauses) and the clause length (in number of words). Results of the study corroborate that MAL held true for both original and translated texts. In addition, both a and b, the fitting parameters of MAL formula, could differentiate the translational language from the original, thus justifying the uniqueness of translational language as ‘the third code’ in its own right. This finding suggests that the fitting parameters might be viable indicators for typological differentiation in translation studies. Further, exploring the dynamic relations between a language construct and its constituents may shed some light on the translating process.
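As a rough illustration of what estimating the MAL parameters a and b involves, the sketch below assumes the simplified power-law form y = a * x^b (the abstract does not say which variant of the formula the authors used) and fits it by least squares on log-transformed values. The function name `fit_mal` and the data points are hypothetical and are not taken from the paper.

```python
import math

def fit_mal(construct_lengths, mean_constituent_lengths):
    """Fit y = a * x**b by ordinary least squares in log-log space.

    Assumption: the simplified two-parameter form of the law is used.
    """
    xs = [math.log(x) for x in construct_lengths]
    ys = [math.log(y) for y in mean_constituent_lengths]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope in log-log space
    a = math.exp(mean_y - b * mean_x)  # intercept mapped back to the original scale
    return a, b

# Synthetic data (not from the study): sentence length in clauses
# against mean clause length in words, generated from a = 12, b = -0.25.
sentence_lengths = [1, 2, 3, 4, 5, 6]
mean_clause_lengths = [12.0 * x ** -0.25 for x in sentence_lengths]

a, b = fit_mal(sentence_lengths, mean_clause_lengths)
print(f"a = {a:.3f}, b = {b:.3f}")
```

A negative b corresponds to the usual reading of the law: the longer the construct, the shorter its constituents on average. Comparing fitted (a, b) pairs across text types is the kind of contrast the abstract describes.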
Lexical Richness and Text Length: An Entropy-based Perspective
Yaqian Shi & Lei Lei, School of Foreign Languages, Huazhong University of Science and Technology, Wuhan, People’s Republic of China
Abstract
Text length is a major concern in the measurement of lexical richness, and how lexical richness is affected by text length still remains open. The present study aims to explore the relation between text length and lexical richness from an entropy-based perspective. Results show a non-linear growth pattern of lexical richness by increasing text length. To be specific, lexical richness increases rapidly with shorter texts. It soon reaches a boundary point from which it stabilizes despite the continuous expansion of text length. The boundary point of the lexical richness by the Shannon estimation is around 1000 tokens and that by the Zhang estimation is lower and more varied, including 500, 800, and 1000 tokens. Such stability may be explained by the stabilization of word probability in the text.
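To make the entropy-based view of lexical richness concrete, here is a minimal sketch that computes the plug-in Shannon entropy of word frequencies over growing prefixes of a text. It is only an assumption about how such a curve might be traced: the helper names and the toy sentence are hypothetical, and the Zhang estimator mentioned in the abstract is not implemented.

```python
import math
from collections import Counter

def shannon_entropy(tokens):
    """Plug-in Shannon entropy (in bits) of the word distribution in `tokens`."""
    counts = Counter(tokens)
    n = len(tokens)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def entropy_curve(tokens, step):
    """Entropy of the first k tokens for k = step, 2*step, ... up to len(tokens)."""
    return [(k, shannon_entropy(tokens[:k]))
            for k in range(step, len(tokens) + 1, step)]

# Toy text (hypothetical); a real study would use much longer texts.
tokens = "the cat sat on the mat and the dog sat on the rug".split()
for k, h in entropy_curve(tokens, step=4):
    print(k, round(h, 3))
```

On a long text, plotting the second column against the first would show where the curve flattens, which is the kind of boundary point the abstract reports.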
A Word Embedding Model for Analyzing Patterns and Their Distributional Semantics
Rui Feng, School of International Studies, Zhejiang University, Hangzhou, China
Congcong Yang, School of International Studies, Zhejiang University, Hangzhou, China
Yunhua Qu, School of International Studies, Zhejiang University, Hangzhou, China
Abstract
Recent advances in natural language processing have catalysed active research in designing algorithms to generate contextual vector representations of words, or word embedding, in the machine learning and computational linguistics community. Existing works pay little attention to patterns of words, which encode rich semantic information and impose semantic constraints on a word’s context. This paper explores the feasibility of incorporating word embedding with pattern grammar, a grammar model to describe the syntactic environment of lexical items. Specifically, this research develops a method to extract patterns with semantic information of word embedding and investigates the statistical regularities and distributional semantics of the extracted patterns. The major results of this paper are as follows. Experiments on the LCMC Chinese corpus reveal that the frequency of patterns follows Zipf’s hypothesis, and the frequency and pattern length are inversely related. Therefore, the proposed method enables the study of distributional properties of patterns in large-scale corpora. Furthermore, experiments illustrate that our extracted patterns impose semantic constraints on context, proving that patterns encode rich semantic and contextual information. This sheds light on the potential applications of pattern-based word embedding in a wide range of natural language processing tasks.
Dependency Distances and Their Frequencies in Indo-European Language
Xinying Chen, Department of Czech Language, University of Ostrava, Ostrava, Czech Republic; School of Foreign Studies, Xi’an Jiaotong University, China
Kim Gerdes, LPP (CNRS); Institute of General and Applied Linguistics and Phonetics, Sorbonne Nouvelle, France; Almanach (Inria)
Abstract
The present study investigates the relationship between two features of dependencies, namely, dependency distances and dependency frequencies. The study is based on the analysis of a parallel dependency treebank that includes 10 Indo-European languages. Two corresponding random dependency treebanks are generated as baselines for comparison. After computing the values of dependency distances and their frequencies in these treebanks, for each language, we fit four functions, namely quadratic, exponent, logarithm, and power-law functions, to its original and random datasets. The preliminary result shows that there is a relation between the two dependency features for all 10 Indo-European languages. The relation can be further formalized as a power-law function which can distinguish the observed data from randomly generated datasets.
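The first step described above, computing dependency distances and counting how often each distance occurs, can be sketched as follows. The sketch assumes the common definition of dependency distance as the absolute difference between the linear positions of a word and its head; the function name, the head-index encoding and the example sentence are hypothetical.

```python
from collections import Counter

def dependency_distances(heads):
    """Return |position - head position| for every non-root token.

    heads[i] is the 1-based position of the head of token i + 1;
    0 marks the root, which has no incoming dependency.
    """
    return [abs(pos - head)
            for pos, head in enumerate(heads, start=1)
            if head != 0]

# Hypothetical parse of "The dog chased the cat":
# The -> dog, dog -> chased, chased = root, the -> cat, cat -> chased
heads = [2, 3, 0, 5, 3]

distances = dependency_distances(heads)
frequencies = Counter(distances)
print(distances)
print(sorted(frequencies.items()))
```

Aggregating `frequencies` over a whole treebank gives the distance-frequency pairs to which candidate functions (such as a power law) could then be fitted.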
Quantifying Perceived Political Bias of Newspapers through a Document Classification Technique
Hyungsuc Kang, Department of Newmedia, Seoul Media Institute of Technology, Seoul, Korea
Janghoon Yang, Department of Newmedia, Seoul Media Institute of Technology, Seoul, Korea
Abstract
Even though a certain degree of political bias is unavoidable in the media, strong media bias is likely to have an impact on society, especially on the formation of public opinion. This research proposes a data-driven method for quantifying political bias of media contents. With a document classification technique called doc2vec and social data from Facebook posts, a model for analysing the bias is developed. By applying the model to contents of major South Korean newspapers, this paper demonstrates quantitatively that significant political bias exists in the newspapers in line with the perceived political bias.
The Effect of Translation on Text Coherence: A Quantitative Study
Elham Najafi, Department of Physics, Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan, Iran
Alireza Valizadeh, Department of Physics, Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan, Iran
Amir H. Darooneh, Department of Applied Mathematics, University of Waterloo, Waterloo, Ontario, Canada; Department of Physics, University of Zanjan, Zanjan, Iran
Abstract
Investigating the coherence of translated texts is an important issue in multilingual studies. In this paper, we aim to study text coherence in human translated texts and its relation to the text properties by a quantitative approach. For this purpose, we assigned a word importance value to each word-type of a text and constructed the text ‘importance time series’ from the original and translated texts. Then, we calculated text global coherence by applying Detrended Fluctuation Analysis (DFA) to these time series. By means of this procedure, we were able to compare the coherence of the original and translated texts. Our results show that a translation does not always decrease text coherence, as many people may suppose; there are many cases where text coherence is increased by translation. We also studied the relation of text coherence and the text properties such as text size or vocabulary size; we observed no relevance. Our findings suggest that the coherence of a text depends on the translator’s abilities rather than the state of being original or translated.
Optimal Coding and the Origins of Zipfian Laws
Ramon Ferrer-i-Cancho, Complexity & Quantitative Linguistics Lab, LARCA Research Group, Departament de Ciències de la Computació, Universitat Politècnica de Catalunya, Barcelona, Spain
Christian Bentz, URPP Language and Space, University of Zürich, Zürich, Switzerland; DFG Center for Advanced Studies “Words, Bones, Genes, Tools”, University of Tübingen, Tübingen, Germany
Caio Seguin, Melbourne Neuropsychiatry Centre, The University of Melbourne and Melbourne Health, Melbourne, Australia
Abstract
The problem of compression in standard information theory consists of assigning codes as short as possible to numbers. Here we consider the problem of optimal coding – under an arbitrary coding scheme – and show that it predicts Zipf’s law of abbreviation, namely a tendency in natural languages for more frequent words to be shorter. We apply this result to investigate optimal coding also under so-called non-singular coding, a scheme where unique segmentation is not warranted but codes stand for a distinct number. Optimal non-singular coding predicts that the length of a word should grow approximately as the logarithm of its frequency rank, which is again consistent with Zipf’s law of abbreviation. Optimal non-singular coding in combination with the maximum entropy principle also predicts Zipf’s rank-frequency distribution. Furthermore, our findings on optimal non-singular coding challenge common beliefs about random typing. It turns out that random typing is in fact an optimal coding process, in stark contrast with the common assumption that it is detached from cost cutting considerations. Finally, we discuss the implications of optimal coding for the construction of a compact theory of Zipfian laws more generally as well as other linguistic laws.
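The claim that word length grows roughly as the logarithm of frequency rank can be illustrated with a small sketch. It assumes a simple reading of optimal non-singular coding: with an alphabet of a given size, the most frequent types receive the shortest distinct strings, so all strings of length 1 are used up before any of length 2, and so on. The function name and the alphabet size are hypothetical, and this is not presented as the authors' derivation.

```python
def optimal_nonsingular_length(rank, alphabet_size):
    """Length of the string given to the type with this frequency rank (1 = most frequent).

    Assumption: strings are handed out shortest-first, and there are
    alphabet_size ** L distinct strings of length L.
    """
    length = 1
    capacity = alphabet_size  # number of strings of length <= `length`
    while rank > capacity:
        length += 1
        capacity += alphabet_size ** length
    return length

# With a binary alphabet: ranks 1-2 get length 1, ranks 3-6 length 2, ranks 7-14 length 3.
lengths = [optimal_nonsingular_length(r, 2) for r in range(1, 15)]
print(lengths)
```

Because the number of available strings multiplies by the alphabet size with each extra symbol, the rank at which the length increases grows geometrically, which is why length tracks the logarithm of rank.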
Frequency, Dispersion and Abstractness in the Lexical Sophistication Analysis of A Learner-Based Word Bank: Dimensionality Reduction and Identification
Haomin Zhang, Faculty of Education, East China Normal University, Shanghai, China
Yuting Han, Faculty of Education, East China Normal University, Shanghai, China
Xing Zhang, Faculty of Education, East China Normal University, Shanghai, China
Liuran Cui, Faculty of Education, East China Normal University, Shanghai, China
Abstract
The current study incorporated a number of lexical sophistication indices including frequency, dispersion and abstractness of words. A learner-based word bank (inclusive of a Chinese middle-school vocabulary list, a Chinese high-school vocabulary list and a Chinese college-English-test vocabulary list) was manually coded based on two existing corpora: Corpus of Contemporary American English (COCA) and British National Corpus (BNC). Indices of frequency, dispersion and abstractness of the word bank were analysed to shed light on the predetermined categorization of lexical sophistication among second language learners. Based on the principal component analysis, the results demonstrated that dispersion was a unique factor loaded on all entered eight variables while word frequency and abstractness were extracted by the same factor in the learner-based word bank. Moreover, a follow-up MANOVA analysis with post hoc comparisons showed that lexical sophistication indices in general produced pronounced differences among the three levels of word lists. More critically, dispersion was found to be the only significant indicator to differentiate the three levels of word lists. Discussion centred on the uniqueness of dispersion in lexical sophistication and the shared algorithm in frequency and abstractness.
Predictive Modelling of Type Valency in Word Formation Grammar
Kateryna Krykoniuk, The School of English, Communication and Philosophy, Cardiff University, Cardiff, UK
Abstract
This paper explores different regression models for predicting the type valency of Persian suffixes within a usage-based approach. Usage-based models treat the type frequency of a suffix as a key predictor for its type valency revealing that an increase in the type frequency leads to a greater combining power between a construction’s paradigmatic elements. However, this effect is limited to a certain degree by the potential productivity of a suffix, as inferred from the statistically distinguishable negative correlation between the type valency and the potential productivity, as well as from the statistical significance of the variable of the number of hapaxes and the potential productivity in the regression models of conditional inference trees. Moreover, polyvalency as a distinct feature of Persian derivation implies a number of other characteristics, namely greater morphological diversity of patterns, parsability, semantic transparency and larger conversion power of morphemes. This is contrasted with English whose morphemes are predominantly type-monovalent.
Linguistic Accommodation in Teenagers’ Social Media Writing: Convergence Patterns in Mixed-gender Conversations
Lisa Hilte, CLiPS Research Center, University of Antwerp, Antwerp, Belgium
Reinhild Vandekerckhove, CLiPS Research Center, University of Antwerp, Antwerp, Belgium
Walter Daelemans, CLiPS Research Center, University of Antwerp, Antwerp, Belgium
Abstract
The present study analyzes the phenomenon of linguistic accommodation, i.e. the adaptation of one’s language use to that of one’s conversation partner. In a large corpus of private social media messages, we compare Flemish teenagers’ writing in two conversational settings: same-gender (including only boys or only girls) and mixed-gender conversations (including at least one girl and one boy). We examine whether boys adopt a more ‘female’ and girls a more ‘male’ writing style in mixed-gender talks, i.e. whether teenagers converge towards their conversation partner with respect to gendered writing. The analyses focus on two sets of prototypical markers of informal online writing, for which a clear gender divide has been attested in previous research: expressive typographic markers (e.g., emoticons), which can be considered more ‘female’ features, and ‘oral’, speech-like markers (e.g., regional language features), which are generally more popular among boys. Using generalized linear-mixed models, we examine the frequency of these features in boys’ and girls’ writing in same- versus mixed-gender conversations. Patterns of convergence emerge from the data: they reveal that girls and boys adopt a more similar style in mixed-gender talks. Strikingly, the convergence is asymmetrical and only significant for a particular group of online language features.
About the Journal
The Journal of Quantitative Linguistics is interested in work which systematically applies or develops mathematical and/or statistical concepts and methods to theoretical understanding of language phenomena. This covers the range of synchronic and diachronic subdomains of linguistic theory, including contemporary and historical linguistics, sociolinguistics and dialectology, and cognitive, neural, and psycholinguistics as well as the various levels of analysis from phonetics through phonology, morphology, syntax, semantics, and pragmatics. The introduction of mathematical and statistical concepts and methods from the natural sciences, economics, and cognitive science is particularly encouraged, as is philosophical reflection on the relationship of quantitative linguistics as here understood to these other sciences.
Official website:
https://academic.oup.com/applij
Source: Journal of Quantitative Linguistics official website
Today's editor: 神厨小福贵
Reviewer: 心得君