
Journal News | SSCI Journal: Journal of Quantitative Linguistics, 2023, Issues 1-2

Followed by 60,000 scholars → 语言学心得 (Linguistics Insights) 2024-02-19

Journal of Quantitative Linguistics

Volume 30, Issue 1-2, 2023

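Journal of Quantitative Linguistics (SSCI Q2; 2022 IF: 1.4; rank: 87/194) published 11 articles in Issues 1-2 of 2023. Issue 1 contains 7 research articles, covering the informativeness of research article (RA) abstracts across disciplines, authorship attribution based on occupancy-problem-type indices, the functional properties of inflectional systems, the parametrization of phonetic features and numerical calculation of phonetic distances, quantitative stylistics of Polish, the measurement of word prevalence, and logistic regression modelling of diachronic change in Polish. Issue 2 contains 4 research articles, covering word dispersion measures, unified models of word length distributions based on types and tokens, the distribution of adnominals across registers and disciplines, and the synergetic properties of lexical structures in Chinese and English. Feel free to share!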

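Previous issues:

Journal News | SSCI Journal: Journal of Quantitative Linguistics, 2022, Issues 3-4

Journal News | SSCI Journal: Journal of Quantitative Linguistics, 2022, Issues 1-2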

Contents


ARTICLES

Issue1

■ To Move or Not to Move: An Entropy-based Approach to the Informativeness of Research Article Abstracts across Disciplines, by Wei Xiao, Li Li & Jin Liu, Pages 1–26.

■ Authorship Attribution via Occupancy-problem-type Indices, by Lukun Zheng, Huiqiang Zheng & Chandra Kundu, Pages 27–41.

■ The Entropy of Morphological Systems in Natural Languages Is Modulated by Functional and Semantic Properties, by Francesca Franzon & Chiara Zanini, Pages 42–66.

■ Unified Parametrization of Phonetic Features and Numerical Calculation of Phonetic Distances between Speech Sounds, by Maksym O. Vakulenko, Pages 67–85.

■ Stylistic Fingerprints, POS-tags, and Inflected Languages: A Case Study in Polish, by Maciej Eder & Rafał L. Górski, Pages 86–103.

■ Word Use Equivalence and Hierarchical Word Tiers, by Brent Burch & Jesse Egbert, Pages 104–124.

■ Modelling the Dynamics of Language Change: Logistic Regression, Piotrowski’s Law, and a Handful of Examples in Polish, by Rafał L. Górski & Maciej Eder, Pages 125–151.


Issue2

■ Too Noisy at the Bottom: Why Gries’ (2008, 2020) Dispersion Measures Cannot Identify Unbiased Distributions of Words, by Robert N. Nelson, Pages 153–166.

■ Unifying Models for Word Length Distributions Based on Types and Tokens, by Peter Zörnig & Thomas Berg, Pages 167–182.

■ A Corpus-Based Study of the Distributions of Adnominals Across Registers and Disciplines, by Yiyang Hu & Qingshun He, Pages 183–205.

■ Synergetic Properties of Lexical Structures in Chinese and English, by Jieqiang Zhu & Jingyang Jiang, Pages 204–230.

Abstracts

To Move or Not to Move: An Entropy-based Approach to the Informativeness of Research Article Abstracts across Disciplines

Wei Xiao, Research Center for Language, Cognition and Language Application, Chongqing University, Chongqing, China

Li Li, School of Foreign Languages and Cultures, Chongqing University, Chongqing, China

Jin Liu, School of Foreign Languages and Cultures, Chongqing University, Chongqing, China

Abstract Research article (RA) abstracts succinctly and skilfully epitomize the core information of the full text and have thus attracted the attention of a number of scholars. While previous studies mainly focused on the rhetorical structures, meta-discursive features and lexico-grammatical features, few have made explorations from the perspective of information theory. To bridge this gap, the present study conducted an entropy-based analysis to explore the distribution pattern of information content across moves and the variations across disciplines. 318 RA abstracts across the natural sciences, social sciences and humanities (106 abstracts per discipline) were selected and three indices, i.e. the 1-/2-/3-gram entropies, were used to examine whether different indices yielded different features. The results show that in an RA abstract, the information content is unevenly distributed across moves; different entropy indices may reflect different linguistic properties; and both similarities and variations exist in information content across disciplines. These phenomena can be attributed to the functions of moves, the linguistic meanings of indices and disciplinary features. This study has implications for RA abstract writing instruction and practice, as well as for broadening the applications of quantitative linguistic methods into less touched fields.
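To make the 1-/2-/3-gram entropy indices mentioned in the abstract more concrete, here is a minimal sketch of Shannon entropy computed over word n-grams. The abstract does not say how the authors tokenized or normalized their texts, so the whitespace tokenization, the function name `ngram_entropy` and the toy sentence are all assumptions made for illustration only.

```python
import math
from collections import Counter


def ngram_entropy(tokens, n):
    """Shannon entropy (in bits) of the n-gram distribution of a token list."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0  # text shorter than n tokens: no n-grams to measure
    counts = Counter(ngrams)
    total = len(ngrams)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical toy text standing in for one "move" of an abstract
# (not data from the study).
tokens = "this study explores the entropy of the abstract".split()
for n in (1, 2, 3):
    print(n, round(ngram_entropy(tokens, n), 3))
```

Computing the same index separately for each move of an abstract would give the kind of per-move comparison the abstract describes, although the study's exact procedure may differ.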


Authorship Attribution via Occupancy-problem-type Indices

Lukun Zheng, Department of Mathematics, Western Kentucky University, Bowling Green, KY, USA

Huiqiang Zheng, Department of Asian and Slavic Languages and Literatures, University of Iowa, Iowa City, IA, USA

Chandra Kundu, Department of Mathematics, University of Central Florida, Orlando, FL, USA

Abstract In this paper, we propose a new methodology for authorship attribution based on a profile of indices related to the occupancy problem, called occupancy-problem indices. The occupancy problem has a long history and is an important example in standard textbooks like Feller (1971). We base our methodology on function words. We establish a testing procedure by constructing a confidence band of the occupancy-problem indices using the sampling distribution of the number of distinct function words. We validate our proposed methodology using controlled and constructed writing samples whose authorship is known. We then apply this methodology to explore the question of who wrote the 15th Oz book, which has a disputing authorship between Lyman Frank Baum (1856–1919) and his successor Ruth Plumly Thompson (1891–1976) on the Oz series.
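The classical occupancy problem behind this abstract can be sketched as follows: if each token is an independent draw from a set of categories with probabilities p_i, the expected number of distinct categories seen after n draws is the sum of 1 - (1 - p_i)^n. The code below only illustrates that textbook quantity. It is not the paper's index profile or confidence band, and the function names, the tiny function-word list and the probabilities are hypothetical.

```python
def expected_distinct(probs, n):
    """Expected number of distinct categories seen in n independent draws,
    where category i is drawn with probability probs[i]."""
    return sum(1.0 - (1.0 - p) ** n for p in probs)


def observed_distinct(tokens, vocabulary):
    """Number of distinct vocabulary items that actually occur in tokens."""
    return len(set(tokens) & set(vocabulary))


# Hypothetical function-word list and usage probabilities for one author.
function_words = ["the", "of", "and", "to"]
probs = [0.4, 0.3, 0.2, 0.1]

sample = "the cat and the dog went to the park".split()
function_tokens = [w for w in sample if w in function_words]

print(observed_distinct(function_tokens, function_words))
print(round(expected_distinct(probs, len(function_tokens)), 3))
```

Comparing the observed count of distinct function words in a disputed text with what a candidate author's profile would lead one to expect is the general idea; the paper's actual test statistic may be defined differently.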


The Entropy of Morphological Systems in Natural Languages Is Modulated by Functional and Semantic Properties

Francesca Franzon, Neuroscience Area, International School for Advanced Studies (SISSA), Trieste, Italy

Chiara Zanini, Romanisches Seminar, (RoSe), Universität Zürich, Zürich, Switzerland

Abstract In most natural languages, grammatical gender and number features encode semantic attributes concerning animacy, sex, and numerosity. Despite the likely advantage of promptly communicating about such salient attributes, inflectional systems rarely display consistently bijective correspondences between the semantic attributes and the grammatical feature values. In a study on Italian, we explored how this apparently noisy encoding depends on a trade-off between the semantic and the functional aspects of grammatical features. Using entropy metrics, we assessed the primarily functional purpose of gender and number features in the lexicon, observing a distribution of nouns that can optimally serve agreement-based parsing and prediction of words in sentences. A novel context entropy measure, introduced in this study to assess meaning specificity, revealed a semantic underspecification in masculine and singular nouns denoting animate referents. We argue that underspecification is the hallmark of the particular type of information compression occurring in inflectional systems. In binary inflectional systems, one value specifically encodes a semantic attribute, while the other value does not encode any semantic information, and surfaces as a default for functional purposes. By providing an information-theoretical account of the role of grammatical features, we set the basis for a scientifically informed pursue of language inclusiveness.


Unified Parametrization of Phonetic Features and Numerical Calculation of Phonetic Distances between Speech Sounds

Maksym O. Vakulenko, Institute of Problems of Artificial Intelligence, Kiev, Ukraine

Abstract A metric method to numerically measure phonetic and phonemic distances or contrasts, between speech sounds, is put forward. The feature values of the compared phones taken from the standard IPA charts are treated as independent parameters that give rise to corresponding Euclidean distances. As an illustration, the general phone set is mapped to Ukrainian phonemes. The proposed model agrees well with the historical linguistic facts and experimental phonetic data. The described approach may find its due applications in various fields of linguistics and speech technologies, including historical and typological linguistics, language acquisition, phonetic studies, computational phonology, machine translation, information retrieval, and text-to-speech conversion.
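The idea of treating feature values as independent parameters that yield Euclidean distances can be sketched in a few lines. The numeric feature values below are invented for illustration and are not the paper's parametrization of the IPA charts; the feature names and the function `phonetic_distance` are likewise assumptions.

```python
import math

# Hypothetical numeric feature values (NOT taken from the paper or the IPA charts).
FEATURES = {
    "p": {"place": 0.0, "manner": 0.0, "voicing": 0.0},
    "b": {"place": 0.0, "manner": 0.0, "voicing": 1.0},
    "m": {"place": 0.0, "manner": 1.0, "voicing": 1.0},
}


def phonetic_distance(a, b, features=FEATURES):
    """Euclidean distance between two phones in the feature space."""
    fa, fb = features[a], features[b]
    return math.sqrt(sum((fa[k] - fb[k]) ** 2 for k in fa))


print(phonetic_distance("p", "b"))
print(phonetic_distance("p", "m"))
```

With a parametrization of this kind, phones that differ in one feature end up closer than phones that differ in several, which is the kind of graded contrast the abstract describes.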


Stylistic Fingerprints, POS-tags, and Inflected Languages: A Case Study in Polish

Maciej Eder, Polish Academy of Sciences: Instytut Jezyka Polskiego Polskiej Akademii Nauk, Krakow, Poland

Rafał L. Górski, Faculty of Philology, Jagiellonian University, Krakow, Poland

Abstract In stylometric investigations, frequencies of the most frequent words (MFWs) and character n-grams outperform other style-markers, even if their performance varies significantly across languages. In inflected languages, word endings play a prominent role, and hence different word forms cannot be recognized using generic text tokenization. Countless inflected word forms make frequencies sparse, making most statistical procedures complicated. Presumably, applying one of the NLP techniques, such as lemmatization and/or parsing, might increase the performance of classification. The aim of this paper is to examine the usefulness of grammatical features (as assessed via POS-tag n-grams) and lemmatized forms in recognizing authorial profiles, in order to address the underlying issue of the degree of freedom of choice within lexis and grammar. Using a corpus of Polish novels, we performed a series of supervised authorship attribution benchmarks, in order to compare the classification accuracy for different types of lexical and syntactic style-markers. Even if the performance of POS-tags as well as lemmatized forms was notoriously worse than that of lexical markers, the difference was not substantial and never exceeded ca. 15%.


Word Use Equivalence and Hierarchical Word Tiers

Brent Burch, Department of Mathematics and Statistics, Northern Arizona University, Flagstaff, AZ, USA

Jesse Egbert, Applied Linguistics Program, Department of English, Northern Arizona University, Flagstaff, AZ, USA

Abstract A ranked word list provides information about the position of each word in the list. However, retaining and employing the measure used to generate the ranked list can yield additional information about the words. If ω denotes the prevalence of a word in a corpus, then not only can the values of ω be ordered, their values can be compared to one another, and words having similar values can be grouped together into equivalence classes. Measures of word prevalence include mean text frequency, the dispersion of words across texts in a corpus, or a measure that combines frequency and dispersion. In this paper, we examine the concepts of word equivalence classes and hierarchical word tiers and apply these concepts to the words in the British National Corpus (BNC). Hierarchical word tiers can be constructed without the knowledge of all pairwise comparisons of the words under study. By grouping words that have similar values of prevalence, the ranked ordered list reduces to an informative set of hierarchical word tiers where each tier contains words that are similar to one another in terms of their use in the corpus.


Modelling the Dynamics of Language Change: Logistic Regression, Piotrowski’s Law, and a Handful of Examples in Polish

Rafał L. Górski, Polish Academy of Sciences, Institute of Polish Language, Krakow, Poland

Maciej Eder, Polish Academy of Sciences, Institute of Polish Language, Krakow, Poland

Abstract The study discusses modelling diachronic processes by logistic regression. The phenomenon of nonlinear changes in language was first observed by Raimund Piotrowski (hence labelled as Piotrowski’s law), even if actual linguistic evidence often speaks against using the notion of a ‘law’ in this context. In our study, we apply logistic regression models to changes which occurred between 15th and 18th century in the Polish language. The attested course of the majority of these changes closely follow the expected values, which proves that the language change might indeed resemble a nonlinear phase change scenario. We also extend the original Piotrowski’s approach by proposing polynomial logistic regression for these cases which can hardly be described by its standard version. Also, we propose to consider individual language change cases jointly, in order to inspect their possible collinearity or, more likely, their different dynamics in the function of time. Last but not least, we evaluate our results by testing the influence of the subcorpus size on the model’s goodness-of-fit.
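The S-shaped course of change that the abstract models can be illustrated with a logistic curve p(t) = 1 / (1 + exp(-(a + b·t))), where p is the share of the innovative form at time t. The sketch below fits a and b by ordinary least squares on logit-transformed proportions. This is a deliberately simple shortcut rather than maximum-likelihood logistic regression, and it is not the authors' code; the years and proportions are made up, and every proportion must lie strictly between 0 and 1.

```python
import math


def logit(p):
    """Log-odds of a proportion p, which must satisfy 0 < p < 1."""
    return math.log(p / (1.0 - p))


def fit_logistic_trend(years, proportions):
    """Least-squares line through logit-transformed proportions.
    Returns (a, b) such that p(t) is approximately 1 / (1 + exp(-(a + b*t)))."""
    ys = [logit(p) for p in proportions]
    n = len(years)
    mean_t = sum(years) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in years)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(years, ys))
    b = sxy / sxx
    a = mean_y - b * mean_t
    return a, b


def predict(a, b, t):
    """Share of the innovative form predicted for time t."""
    return 1.0 / (1.0 + math.exp(-(a + b * t)))


# Hypothetical shares of an innovative form per half-century (illustrative only).
years = [1500, 1550, 1600, 1650, 1700, 1750]
shares = [0.05, 0.12, 0.35, 0.66, 0.88, 0.96]
a, b = fit_logistic_trend(years, shares)
print([round(predict(a, b, t), 2) for t in years])
```

Comparing the predicted shares with the attested ones gives a rough sense of goodness-of-fit; the polynomial extension mentioned in the abstract would add higher-order terms in t to the linear predictor.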


Too Noisy at the Bottom: Why Gries’ (2008, 2020) Dispersion Measures Cannot Identify Unbiased Distributions of Words

Robert N. Nelson, Graduate College of Education, Temple University of Japan, Tokyo, Japan

Abstract Gries (2008, 2021) defined two dispersion measures able to alert corpus analysts to words that have a problematically limited distribution. Gries (2010, 2022) posited that these measures may additionally be relevant to language development research, as the learnability of a pattern may be predicted by the evenness of its distribution in corpora. However, both measures work by comparing vectors of observed and expected frequencies in partitioned corpora and this method cannot determine that a word is evenly distributed because it cannot distinguish the random noise inherent to an unbiased process from substantial non-random bias. An additional concern with the 2008 measure is raised: the 2008 measure is Manhattan distance scaled to the unit interval and, as such, it is extremely sensitive to the number of corpus parts because this choice sets the dimensionality of the measure space. In sum, this short analysis presents evidence that these measures should not be used to declare a pattern evenly distributed as neither can tell the difference between statistical noise and systematic bias.
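The 2008 measure discussed here is usually stated as DP (deviation of proportions): half the sum of absolute differences between the share of a word's occurrences found in each corpus part and that part's share of the whole corpus. The sketch below follows that common formulation, which matches the abstract's description of a Manhattan distance scaled to the unit interval. The function name and the toy counts are assumptions for illustration.

```python
def deviation_of_proportions(word_counts, part_sizes):
    """DP as commonly stated: 0.5 * sum(|observed share - expected share|).
    word_counts[i] is the word's frequency in corpus part i;
    part_sizes[i] is the size of corpus part i in tokens."""
    total_word = sum(word_counts)
    total_size = sum(part_sizes)
    if total_word == 0:
        raise ValueError("the word does not occur in the corpus")
    observed = [c / total_word for c in word_counts]
    expected = [s / total_size for s in part_sizes]
    return 0.5 * sum(abs(o - e) for o, e in zip(observed, expected))


# The same "all occurrences in one part" pattern scores differently
# depending on how many equal-sized parts the corpus is split into.
print(deviation_of_proportions([10, 0], [100, 100]))
print(deviation_of_proportions([10, 0, 0, 0], [100, 100, 100, 100]))
```

The two printed values differ even though the word is maximally clumped in both cases, which illustrates the sensitivity to the number of corpus parts that the abstract raises.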


Unifying Models for Word Length Distributions Based on Types and Tokens

Peter Zörnig, Department of Statistics, University of Brasília, Brasília, Brazil

Thomas Berg, Department of English, University of Hamburg, Hamburg, Germany

Abstract Word length studies have been one of the central issues in Quantitative Linguistics for a long time. Most models were constructed for very specific purposes, i.e. the individual models apply only to a specific language, only to token counts or only to type counts. The present paper takes up the challenge of developing unifying models which account for both type and token frequencies of a moderately large sample of languages (eight Indo-European and two non-Indo-European languages). We introduce three models which can be well fitted to all our data: the exponentiated Hyper-Poisson distribution, the generalized gamma and the Sichel distribution. We also discuss the possibility of interpreting the model parameters linguistically.


A Corpus-Based Study of the Distributions of Adnominals Across Registers and Disciplines

Yiyang Hu, School of Foreign Languages, Sun Yat-sen University, Guangzhou, Guangdong, China

Qingshun He, School of Foreign Languages, Sun Yat-sen University, Guangzhou, Guangdong, China

Abstract Adnominals are an important resource of noun modification in written registers, especially in academic writing. This study compares the frequencies of adjectival adnominals and nominal adnominals across two registers (Fiction and Academic writing) by calculating T-values and conducting Welch’s t-tests on the adnominal subtypes. It is found that the preference for nominal adnominals exists in both the two registers and the mean frequencies of adjectival adnominals, premodifying nouns and postmodifying nouns increase as the register moves from Fiction to Academic writing. We further investigate the frequencies of adnominals in the research article abstracts across three disciplinary groups by conducting Welch’s ANOVA test. No significant difference is revealed in T-values in the research article abstracts across disciplines. The difference of adjectival adnominals, nouns as postmodifiers and appositive nouns lacks practical applications, while the effects of disciplines on the frequency of premodifying nouns cannot be rejected. It is the mean frequencies of premodifying nouns that show the significant difference in the research article abstracts across disciplines. Premodifying nouns are more prevalent in hard science texts than in soft science texts.
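Welch's t-test, which the abstract uses to compare registers, differs from Student's t-test in that it does not assume equal variances in the two groups. A minimal sketch of the test statistic and the Welch-Satterthwaite degrees of freedom follows; the per-text frequencies are invented, the function name `welch_t` is an assumption, and the p-value step (which needs the t distribution) is left out.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two independent samples with possibly unequal variances."""
    na, nb = len(sample_a), len(sample_b)
    # Squared standard errors of the two sample means.
    va = variance(sample_a) / na
    vb = variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical premodifying-noun frequencies per text (illustrative only).
fiction = [12.0, 15.0, 11.0, 14.0, 13.0]
academic = [30.0, 42.0, 35.0, 51.0, 38.0]
t, df = welch_t(fiction, academic)
print(round(t, 3), round(df, 3))
```

Because the academic sample here is far more variable than the fiction sample, the degrees of freedom come out well below the pooled value of 8, which is the adjustment that makes Welch's test suitable for unequal variances.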


Synergetic Properties of Lexical Structures in Chinese and English

Jieqiang Zhu, Department of Linguistics, School of International Studies, Zhejiang University, Hangzhou, China

Jingyang Jiang, Department of Linguistics, School of International Studies, Zhejiang University, Hangzhou, China

Abstract The synergetic lexical model provides a unique framework for exploration of the interrelationships between the lexical properties of languages. Previous studies concerning several properties of this lexical model have yielded many successful fittings results, but very few studies have investigated synonymy, a major property of this model. The present study uses 825 Chinese and 848 English tokens retrieved from Chinese and English corpora, dictionaries, and thesaurus to conduct a contrastive study on the interrelations between four major properties of this lexical model: word length, word frequency, polysemy, and synonymy. The successful fittings of both languages demonstrate the cross-linguistic validity of the synergetic lexical model, though English belongs to the Germanic language family, while Chinese, a highly analytical language, is of the Sino-Tibetan language family. Moreover, our analysis of the parameters of the fitting results shows that, compared to English, Chinese possesses a greater resistance to shortening word length and a quicker response to semantic change.


About the Journal


The Journal of Quantitative Linguistics is interested in work which systematically applies or develops mathematical and/or statistical concepts and methods to theoretical understanding of language phenomena. This covers the range of synchronic and diachronic subdomains of linguistic theory, including contemporary and historical linguistics, sociolinguistics and dialectology, and cognitive, neural, and psycholinguistics as well as the various levels of analysis from phonetics through phonology, morphology, syntax, semantics, and pragmatics. The introduction of mathematical and statistical concepts and methods from the natural sciences, economics, and cognitive science is particularly encouraged, as is philosophical reflection on the relationship of quantitative linguistics as here understood to these other sciences.



Official website:

https://www.tandfonline.com/toc/njql20/30/2

Source: Journal of Quantitative Linguistics official website





Editor: leaf

Reviewer: 心得小蔓
