
Journal News | SSCI Journal Language Testing, 2022 Issue 1

语言学心得 2022-06-09

LANGUAGE TESTING

Volume 39, Issue 1, January 2022

LANGUAGE TESTING (SSCI Q1; 2020 impact factor: 3.551) published eight papers in its first issue of 2022: one editorial, six research articles, and one test review. Topics covered include conversation analysis, second language acquisition, interpreting, testing and assessment, cognitive load theory, standard setting, and Bayesian confirmatory factor analysis.

Table of Contents


EDITORIAL

■ Innovation and expansion in Language Testing for changing times, by Luke Harding, Paula Winke, Pages 3–6.


ARTICLES

■ What scores from monologic speaking tests can(not) tell us about interactional competence, by Carsten Roever, Naoki Ikeda, Pages 7–29.

■ Interpreting testing and assessment: A state-of-the-art review, by Chao Han, Pages 30–55.

■ What can gaze behaviors, neuroimaging data, and test scores tell us about test method effects and cognitive load in listening assessments?, by Vahid Aryadoust, Stacy Foo, Li Ying Ng, Pages 56–89.

■ Developing individualized feedback for listening assessment: Combining standard setting and cognitive diagnostic assessment approaches, by Shangchao Min, Lianzhen He, Pages 90–116.

■ The domain expert perspective: A qualitative study into the views expressed in a standard-setting exercise on a language for specific purposes (LSP) test for health professionals, by Simon Davidson, Pages 117–141.

■ Examining the factor structure and its replicability across multiple listening test forms: Validity evidence for the Michigan English Test, by Tingting Liu, Vahid Aryadoust, Stacy Foo, Pages 142–171.


TEST REVIEW

■ Aptis test review, by Ji-young Shin, Rodrigo A. Rodríguez-Fuentes, Aleksandra M. Swatek, April Ginther, Pages 172–187.

Abstracts

What scores from monologic speaking tests can(not) tell us about interactional competence

Carsten Roever, University of Melbourne, Australia

Naoki Ikeda, University of Melbourne, Australia

Abstract The overarching aim of the study is to explore the extent to which test takers’ performances on monologic speaking tasks provide information about their interactional competence. This is an important concern from a test use perspective, as stakeholders tend to consider test scores as providing comprehensive information about all aspects of L2 competence. One hundred and fifty test takers completed a TOEFL iBT speaking section consisting of six monologic tasks, measuring speaking proficiency, followed by a test of interactional competence with three monologues and three dialogues, measuring pragmalinguistic skills, the ability to recipient design extended discourse, and interactional management skills. Quantitative analyses showed a medium to high correlation between TOEFL iBT speaking scores and interactional scores of r = .76, though with a much lower correlation of r = .57 for the subsample most similar to a typical TOEFL population. There was a large amount of variation in interactional scores for test takers at the same TOEFL iBT speaking score level, and qualitative analyses demonstrated that test takers’ ability to recipient design their talk and format social actions appropriate to social roles and relationships was not well captured by speaking scores. We suggest potential improvements.

Key words Conversation analysis, English as a second language, interactional competence, speaking test, TOEFL
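A minimal sketch (in Python, with invented score values; not the authors' analysis code) of the kind of score correlation reported in this abstract, relating monologic speaking scores to interactional-competence scores:

```python
# Illustrative only: hypothetical paired scores for eight test takers.
# The study's actual data and analyses are not reproduced here.
import numpy as np
from scipy import stats

toefl_speaking = np.array([18, 22, 25, 27, 30, 21, 24, 28])          # monologic speaking scores
interactional = np.array([2.0, 2.5, 3.0, 3.5, 4.0, 2.0, 3.5, 3.0])   # interactional-competence ratings

r, p = stats.pearsonr(toefl_speaking, interactional)
print(f"Pearson r = {r:.2f} (p = {p:.3f})")

# The study's key observation is the spread of interactional scores among
# test takers at the same speaking-score level, which a single r value hides.
```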


Interpreting testing and assessment: A state-of-the-art review

Chao Han

Abstract Over the past decade, testing and assessing spoken-language interpreting has garnered an increasing amount of attention from stakeholders in interpreter education, professional certification, and interpreting research. This is because in these fields assessment results provide a critical evidential basis for high-stakes decisions, such as the selection of prospective students, the certification of interpreters, and the confirmation/refutation of research hypotheses. However, few reviews exist providing a comprehensive mapping of relevant practice and research. The present article therefore aims to offer a state-of-the-art review, summarizing the existing literature and discovering potential lacunae. In particular, the article first provides an overview of interpreting ability/competence and relevant research, followed by main testing and assessment practice (e.g., assessment tasks, assessment criteria, scoring methods, specificities of scoring operationalization), with a focus on operational diversity and psychometric properties. Second, the review describes a limited yet steadily growing body of empirical research that examines rater-mediated interpreting assessment, and casts light on automatic assessment as an emerging research topic. Third, the review discusses epistemological, psychometric, and practical challenges facing interpreting testers. Finally, it identifies future directions that could address the challenges arising from fast-changing pedagogical, educational, and professional landscapes.

Key words Interpreting studies, reliability, spoken-language interpreting, testing and assessment, validity


What can gaze behaviors, neuroimaging data, and test scores tell us about test method effects and cognitive load in listening assessments?

Vahid Aryadoust, National Institute of Education, Nanyang Technological University, Singapore

Stacy Foo, National Institute of Education, Nanyang Technological University, Singapore

Li Ying Ng, National Institute of Education, Nanyang Technological University, Singapore

Abstract The aim of this study was to investigate how test methods affect listening test takers’ performance and cognitive load. Test methods were defined and operationalized as while-listening performance (WLP) and post-listening performance (PLP) formats. To achieve the goal of the study, we examined test takers’ (N = 80) brain activity patterns (measured by functional near-infrared spectroscopy (fNIRS)), gaze behaviors (measured by eye-tracking), and listening performance (measured by test scores) across the two test methods. We found that the test takers displayed lower activity levels across brain regions supporting comprehension during the WLP tests relative to the PLP tests. Additionally, the gaze behavioral patterns exhibited during the WLP tests suggested that the test takers adopted keyword matching and “shallow listening.” Together, the neuroimaging and gaze behavioral data indicated that the WLP tests imposed a lower cognitive load on the test takers than the PLP tests. However, the test takers performed better with higher test scores for one of two WLP tests compared with the PLP tests. By incorporating eye-tracking and neuroimaging in this exploration, this study has advanced the current knowledge on cognitive load and the impact imposed by different listening test methods. To advance our knowledge of test validity, other researchers could adopt our research protocol and focus on extending the test method framework used in this study.

Key words Cognitive load theory, construct-irrelevant variance, functional near-infrared spectroscopy, gaze behavior, keyword matching, listening test


Developing individualized feedback for listening assessment: Combining standard setting and cognitive diagnostic assessment approaches

Shangchao Min, Zhejiang University, China

Lianzhen He, Zhejiang University, China

Abstract In this study, we present the development of individualized feedback for a large-scale listening assessment by combining standard setting and cognitive diagnostic assessment (CDA) approaches. We used the performance data from 3,358 students’ item-level responses to a field test of a national EFL test primarily intended for tertiary-level EFL learners. The results showed that proficiency classifications and subskill mastery classifications were generally of acceptable reliability, and the two kinds of classifications were in alignment with each other at individual and group levels. The outcome of the study is a set of descriptors that describe each test taker’s ability to understand oral texts at a certain level and his or her cognitive performance. The current study, by illustrating the feasibility of combining standard setting and CDA approaches to produce individualized feedback, contributes to the enhancement of score reporting and addresses the long-standing criticism that large-scale language assessments fail to provide individualized feedback to link assessment with instruction.

Key words Cognitive diagnosis, individualized feedback, listening assessment, score report, standard setting
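As a rough illustration of the cognitive diagnostic side of this abstract, the Python sketch below classifies a test taker's subskill mastery profile under a simple DINA-style model with a hypothetical Q-matrix and assumed slip/guess rates; none of these values or modelling choices come from the study itself.

```python
# A minimal DINA-style mastery classification sketch (illustrative only).
import itertools
import numpy as np

Q = np.array([            # Q-matrix: which subskills each item requires
    [1, 0, 0],            # item 1 -> subskill 1
    [0, 1, 0],            # item 2 -> subskill 2
    [1, 1, 0],            # item 3 -> subskills 1 and 2
    [0, 0, 1],            # item 4 -> subskill 3
    [1, 0, 1],            # item 5 -> subskills 1 and 3
])
slip, guess = 0.1, 0.2    # assumed item slip/guess rates (not from the study)

def response_prob(profile, q_row):
    """P(correct) under DINA: mastery of all required subskills -> 1 - slip, else guess."""
    has_all = all(profile[k] >= q_row[k] for k in range(len(q_row)))
    return 1 - slip if has_all else guess

def classify(responses):
    """Return the mastery profile with the highest likelihood for one response vector."""
    best_profile, best_loglik = None, -np.inf
    for profile in itertools.product([0, 1], repeat=Q.shape[1]):
        loglik = 0.0
        for item, x in enumerate(responses):
            p = response_prob(profile, Q[item])
            loglik += np.log(p if x == 1 else 1 - p)
        if loglik > best_loglik:
            best_profile, best_loglik = profile, loglik
    return best_profile

# Example: correct on items 1 and 3, incorrect on items 2, 4, and 5.
print(classify([1, 0, 1, 0, 0]))   # -> (1, 0, 0): mastery of subskill 1 only
```

Individualized feedback of the kind described in the abstract would pair such a mastery profile with level descriptors derived from standard setting.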


The domain expert perspective: A qualitative study into the views expressed in a standard-setting exercise on a language for specific purposes (LSP) test for health professionals

Simon Davidson

Abstract This paper investigates what matters to medical domain experts when setting standards on a language for specific purposes (LSP) English proficiency test: the Occupational English Test’s (OET) writing sub-test. The study explores what standard-setting participants value when making performance judgements about test candidates’ writing responses, and the extent to which their decisions are language-based and align with the OET writing sub-test criteria. Qualitative data is a relatively under-utilized component of standard setting and this type of commentary was garnered to gain a better understanding of the basis for performance decisions. Eighteen doctors were recruited for standard-setting workshops. To gain further insight, verbal reports in the form of a think-aloud protocol (TAP) were employed with five of the 18 participants. The doctors’ comments were thematically coded and the analysis showed that participants’ standard-setting judgements often aligned with the OET writing sub-test criteria. An overarching theme, ‘Audience Recognition,’ was also identified as valuable to participants. A minority of decisions were swayed by features outside the OET’s communicative construct (e.g., clinical competency). Yet, overall, findings indicated that domain experts were undeniably focused on textual features associated with what the test is designed to assess and their views were vitally important in the standard-setting process.

Key words Analytic Judgement Method, domain expert, health communication, language for specific purpose (LSP) test, Occupational English Test, standard setting


Examining the factor structure and its replicability across multiple listening test forms: Validity evidence for the Michigan English Test

Tingting Liu, Sichuan International Studies University, China; National Institute of Education, Nanyang Technological University, Singapore

Vahid Aryadoust, National Institute of Education, Nanyang Technological University, Singapore

Stacy Foo, National Institute of Education, Nanyang Technological University, Singapore

Abstract This study evaluated the validity of the Michigan English Test (MET) Listening Section by investigating its underlying factor structure and the replicability of its factor structure across multiple test forms. Data from 3255 test takers across four forms of the MET Listening Section were used. To investigate the factor structure, each form was fitted with four Bayesian confirmatory factor analysis (CFA) models: (1) a three correlated-factor model, (2) a bi-factor model, (3) a higher-order factor model, and (4) a single general-factor model. In addition, a four-pronged heuristic comprising construct delineation, construct operationalization, factor structure analysis, and congruence coefficient was developed to examine the replicability of factor structures across the test forms. Results from the CFA models showed that the test forms were unidimensional and the four-pronged heuristic indicated that the test construct was consistently operationalized across forms. Furthermore, the congruence coefficient indicated that the factor structure representing listening was highly similar and replicable across test forms. In sum, the construct of the MET Listening Section did not comprise divisible subskills. Yet, the unidimensional factor structure of the test was replicable across the test forms.

Key words Bayesian confirmatory factor analysis, factor structure, four-pronged heuristic, listening comprehension, psychological dimensionality, psychometric dimensionality, replicability, structural validity, subskill
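The replicability check in this abstract relies on a congruence coefficient. Below is a minimal Python sketch of Tucker's congruence coefficient, one standard formulation; the loadings are invented, and the article's full four-pronged heuristic involves more than this single statistic.

```python
# Tucker's congruence coefficient between factor loadings from two test forms.
import numpy as np

def congruence(loadings_a, loadings_b):
    """Tucker's phi between two loading vectors for the same items."""
    a = np.asarray(loadings_a, dtype=float)
    b = np.asarray(loadings_b, dtype=float)
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

# Hypothetical single-factor loadings for the same six items on two forms.
form_1 = [0.62, 0.55, 0.70, 0.48, 0.66, 0.59]
form_2 = [0.60, 0.58, 0.68, 0.50, 0.63, 0.61]

print(f"phi = {congruence(form_1, form_2):.3f}")   # values near 1 suggest a replicable structure
```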


About the Journal

Language Testing is an international peer reviewed journal that publishes original research on foreign, second, additional, and bi-/multi-/trans-lingual (henceforth collectively called L2) language testing, assessment, and evaluation. Since 1984 it has featured high impact L2 testing papers covering theoretical issues, empirical studies, and reviews. The journal's scope encompasses the testing, assessment, and evaluation of spoken and signed languages being learned as L2s by children and adults, and the use of tests as research and evaluation tools that are used to provide information on the language knowledge and language performance abilities of L2 learners. Many articles also contribute to methodological innovation and the practical improvement of L2 testing internationally. In addition, the journal publishes submissions that deal with L2 testing policy issues, including the use of tests for making high-stakes decisions about L2 learners in fields as diverse as education, employment, and international mobility.




The journal welcomes the submission of papers that deal with ethical and philosophical issues in L2 testing, as well as issues centering on L2 test design, validation, and technical matters. Also of concern is research into the washback and impact of L2 language test use, the consequences of testing on L2 learner groups, and ground-breaking uses of assessments for L2 learning. Additionally, the journal wishes to publish replication studies that help to embed and extend knowledge of generalisable findings in the field. Language Testing is committed to encouraging interdisciplinary research, and is keen to receive submissions which draw on current theory and methodology from different areas within second language acquisition, applied linguistics, educational measurement, psycholinguistics, general education, psychology, cognitive science, language policy, and other relevant subdisciplines that interface with language testing and assessment. Authors are encouraged to adhere to Open Science Initiatives.




Official website:

https://journals.sagepub.com/home/ltj


Source: the LANGUAGE TESTING official website



Previous Posts

News | Call for Papers: 《第二语言学习研究》


News | Call for Papers: 《中文教学与研究》


Journal News | SSCI Journal 《语言学习与技术》 (Language Learning & Technology), 2021 Issue 3


Journal News | SSCI Journal TESOL Quarterly, 2021 Issue 4


Welcome to join the WeChat groups
"语言学心得交流分享群" and "语言学考博/考研/保研交流群"


Please add "心得君" on WeChat to be invited into the groups

Today's editor: 木子

Reviewer: 心得小蔓

For reposting & collaboration, please contact

"心得君"

WeChat: xindejun_yyxxd

