Early Read | Common Misconceptions about Language Assessment (Part 1)

Foreign Language Academic Research Network (外语学术科研网), 2021-03-17

SOME COMMON MISCONCEPTIONS AND UNREALISTIC EXPECTATIONS ABOUT LANGUAGE ASSESSMENT

We’ve found that many people who need to use language assessments in the real world have misconceptions and unrealistic expectations about what language assessments can do and what they should be like. These often prevent people from becoming competent in language assessment. Furthermore, there is often a belief that “language testers” have some almost magical procedures and formulae for creating the “best” test. These misconceptions and unrealistic expectations, and the mystique associated with language testing, constitute strong affective barriers for many people who want and need to be able to use language assessments in their professional work. Breaking down these affective barriers by dispelling and clarifying misconceptions, helping readers develop a sense of what can reasonably be expected of language assessments, and demystifying language testing is thus an important goal of this book.

An Illustrative Example

Perhaps the best way to illustrate these misconceptions and unrealistic expectations is with an example from our own experience with language testing. We first started working together in language testing nearly forty years ago, when we were in situations in which we needed to develop language tests for a particular use. We were both involved in developing tests for use in placing students into an appropriate level or group in English as a foreign language (EFL) courses in tertiary-level institutions in a country where, at that time, English was not the medium of instruction for education, and was not widely used in the society at large.
One of us was working in an English department at a major university in which all students were required to satisfy a foreign language requirement, and most students did so with English. English was taught non-intensively for three hours per week. It was taught primarily to prepare students to develop an ability to read material in English that would help them further their educational goals. The teachers consisted of both native and second language (L2) speakers of English who had extensive training in teaching EFL. The program was well funded and had the resources to hire a testing specialist to help them with assessment development.
The other was working at a national language center, where junior university faculty members were being trained in English, in preparation for studying for advanced degrees at institutions in countries where English was the medium of education. The English program was an intensive one, with students attending classes for eight hours per day, five days per week, for ten weeks. It was also partly English for specific purposes, with the classes in reading and writing focusing on differing disciplines, such as agriculture, economics, and education. The teachers were all native English speakers who had had extensive training, and were using what was considered to be the current “best” methodology. The center was well funded by an international educational foundation, so all kinds of learning materials—books, magazines, audio tapes—were readily available for the students to use.
We had come to the task with different backgrounds—one in theoretical linguistics and the other in English language and literature. One of us had had six years’ practical experience working in a large-scale testing center at a university under the mentorship of one of the top language testers in the field at the time, while the other had had a brief encounter with assessment while helping to develop an achievement test for a large lecture course at a university. Neither of us, however, had had any formal training in either language testing or psychometrics. We had both had practical experience in teaching English as a second/foreign language, and considerable understanding of what was then known, in terms of theory and research, of second/foreign language learning and teaching. In addition, we shared a common concern: to develop the “best” test for our situations. We believed that there was a model language test and a set of straightforward procedures—a recipe, if you will—that we could follow to create a test that would be the best one for our intended uses and situations.
What we did, essentially, was to model our tests on the large-scale EFL tests that were widely used at that time, which included sections testing English grammar, vocabulary, reading, and listening comprehension. Following this model, we employed test development procedures that had been developed for psychological and educational tests to produce, rather mechanically, tests that both we and our colleagues believed were “state-of-the-art” EFL tests, and hence the “best” for our needs. We had started with the “best” models and had used sophisticated statistical techniques in test development, so that our tests were definitely state-of-the-art at that time, but now, in retrospect, we wonder whether they were the best for those situations. Indeed, we wonder if there is a single “best” test for any language testing situation.
In developing those tests, we believed that if we followed the model of a test that was widely recognized and used, it would automatically be useful for our own particular needs. These tests had been developed by the “experts” in the field, who were assumed to know more than we did. There were, however, several questions we did not ask. Were our situations similar enough to the ones for which these large-scale tests were developed to make them appropriate? Were our test takers like the ones who took those large-scale tests, and would the results of our tests be used to make the same kinds of decisions? We did not even ask whether the abilities tested in those tests were the ones we needed to test. Nor did we have any comprehensive, systematic way to think about the nature of language use.
Given what was known (and not known) about the nature of language use, of language learning, and of language testing at that time, these were questions that simply never occurred to us. Language ability was viewed as a set of finite components—grammar, vocabulary, pronunciation, spelling—that were realized as four skills—listening, speaking, reading, and writing. If we taught or tested these, we were teaching or testing everything that was needed. Language learners were viewed as organisms who all learned language by essentially the same processes—stimulus and response—as described by behaviorist psychology.
Finally, it was assumed that the processes involved in language learning were more or less the same for all learners, for all situations, and for all uses. It is not surprising, then, that we believed that a single model would provide the best test for our particular test takers, for our particular uses, and for the areas of language ability that were of interest in our particular situation.

As it turned out, the two groups of test takers for whom we developed essentially the same kind of language test were quite different. The university group consisted of first-year students entering a university in which very little of their academic course work would involve the use of English. Most of them would be required to take at least one English course as part of their degree requirements. Though all of the students had had some exposure to English in their secondary school education, most had very little control of the language, and almost none of them had had any exposure to English outside of the EFL classroom. Few had ever spoken English with a native speaker or had had the opportunity to use English for any non-instructional purpose.

The other group consisted of university teachers from many different universities, representing a wide range of academic disciplines, who had been selected as recipients of scholarships to continue work on advanced degrees in countries where English was the medium of instruction. They were much more highly specialized in their knowledge of their disciplines than were the first-year university students, were considerably older, on average, and were more experienced. They were also highly motivated to improve their English.

The programs into which these test takers would be placed by means of the tests were also quite different. The program into which the university students would be placed consisted of four levels of non-intensive (three hours per week) English instruction during their first and second years of university work. The program focused primarily on enabling the students to read academic reference works written in English. Students were placed in courses at one of the four levels by general ability level and not according to their area of academic specialization. Most of the English classes were taught by teachers who had learned English as a foreign language, and much of the classroom instruction was carried out in the students’ native language.

The university teachers, on the other hand, would be placed into a ten-week intensive (forty hours per week) course at a national English language institute where they would be required to speak nothing but English between the hours of about eight and five every working day. They would take classes in all four skills, but would be divided into groups according to broad classifications of their academic disciplines, such as agriculture, engineering and sciences, medical sciences, and economics. Unlike in the university English program, the teachers in this program were all native speakers of English, and all classroom instruction was carried out in English. This program was thus much more intensive than that of the university students: the curriculum was focused on English for specific purposes and involved a great deal more actual use of English.

To be continued

This article is excerpted from the book Language Assessment in Practice: Developing Language Assessments and Justifying Their Use in the Real World (Chinese edition title: 《语言测评实践：现实世界中的测试开发与使用论证》). Bachman, L. & Palmer, A. (2016). Language Assessment in Practice: Developing Language Assessments and Justifying Their Use in the Real World. Beijing: Foreign Language Teaching and Research Press.
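Summary: This book is one of the representative works of Professor Bachman, a leading authority in language assessment. It systematically sets out a framework to guide the development and use of language tests. The book has three parts. Part I presents the theoretical foundations of language test development and use. Part II describes in detail how to build an Assessment Use Argument framework at the initial stage of language test development. Part III examines in depth how to develop and use language tests in the real world. A rare contribution to language testing in recent years, the book reflects the latest developments in the field and is expected to have a far-reaching influence on the design, development, and use of language tests, and on research into them. It is a valuable reference for graduate students, teachers, teacher trainers, researchers, and testing-organization staff working in language assessment.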

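[Notice] We thank the authors of this book for authorizing FLTRP to publish this excerpt. Any other academic platform that wishes to repost it may call 010-88819585 or email research@fltrp.com, and we will help contact the original authors to arrange authorization. Please do not repost without permission. (*Cover image sourced from the internet.)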