Sneak Preview | Common Misconceptions About Language Assessment (Part 1)
One of us was working in an English department at a major university in which all students were required to satisfy a foreign language requirement, and most students did so with English. English was taught non-intensively for three hours per week. It was taught primarily to prepare students to develop an ability to read material in English that would help them further their educational goals. The teachers consisted of both native and second language (L2) speakers of English who had extensive training in teaching EFL. The program was well funded and had the resources to hire a testing specialist to help them with assessment development.
The other was working at a national language center, where junior university faculty members were being trained in English, in preparation for studying for advanced degrees at institutions in countries where English was the medium of education. The English program was an intensive one, with students attending classes for eight hours per day, five days per week, for ten weeks. It was also partly English for specific purposes, with the classes in reading and writing focusing on differing disciplines, such as agriculture, economics, and education. The teachers were all native English speakers who had had extensive training, and were using what was considered to be the current “best” methodology. The center was well funded by an international educational foundation, so all kinds of learning materials—books, magazines, audio tapes—were readily available for the students to use.
We had come to the task with different backgrounds—one in theoretical linguistics and the other in English language and literature. One of us had had six years’ practical experience working in a large-scale testing center at a university under the mentorship of one of the top language testers in the field at the time, while the other had had a brief encounter with assessment while helping to develop an achievement test for a large lecture course at a university. Neither of us, however, had had any formal training in either language testing or psychometrics. We had both had practical experience in teaching English as a second/foreign language, and considerable understanding of what was then known, in terms of theory and research, of second/foreign language learning and teaching. In addition, we shared a common concern: to develop the “best” test for our situations. We believed that there was a model language test and a set of straightforward procedures—a recipe, if you will—that we could follow to create a test that would be the best one for our intended uses and situations.
What we did, essentially, was to model our tests on the large-scale EFL tests that were widely used at that time, which included sections testing English grammar, vocabulary, reading, and listening comprehension. Following this model, we employed test development procedures that had been developed for psychological and educational tests to produce, rather mechanically, tests that both we and our colleagues believed were “state-of-the-art” EFL tests, and hence the “best” for our needs. We had started with the “best” models and had used sophisticated statistical techniques in test development, so that our tests were definitely state-of-the-art at that time, but now, in retrospect, we wonder whether they were the best for those situations. Indeed, we wonder if there is a single “best” test for any language testing situation.
In developing those tests, we believed that if we followed the model of a test that was widely recognized and used, it would automatically be useful for our own particular needs. These tests had been developed by the “experts” in the field, who were assumed to know more than we did. There were, however, several questions we did not ask. Were our situations similar enough to the ones for which these large-scale tests were developed to make them appropriate?
Were our test takers like the ones who took those large-scale tests, and would the results of our tests be used to make the same kinds of decisions? We did not even ask whether the abilities tested in those tests were the ones we needed to test. Nor did we have any comprehensive, systematic way to think about the nature of language use.
Given what was known (and not known) about the nature of language use, of language learning, and of language testing at that time, these were questions that simply never occurred to us. Language ability was viewed as a set of finite components—grammar, vocabulary, pronunciation, spelling—that were realized as four skills—listening, speaking, reading, and writing. If we taught or tested these, we were teaching or testing everything that was needed. Language learners were viewed as organisms who all learned language by essentially the same processes—stimulus and response—as described by behaviorist psychology.
Finally, it was assumed that the processes involved in language learning were more or less the same for all learners, for all situations, and for all uses. It is not surprising, then, that we believed that a single model would provide the best test for our particular test takers, for our particular uses, and for the areas of language ability that were of interest in our particular situation.
As it turned out, the two groups of test takers for whom we developed essentially the same kind of language test were quite different. The university group consisted of first-year students entering a university in which very little of their academic course work would involve the use of English. Most of them would be required to take at least one English course as part of their degree requirements. Though all of the students had had some exposure to English in their secondary school education, most had very little control of the language, and almost none of them had had any exposure to English outside of the EFL classroom. Few had ever spoken English with a native speaker or had had the opportunity to use English for any non-instructional purpose.
The other group consisted of university teachers from many different universities, and representing a wide range of academic disciplines, who had been selected as recipients of scholarships to continue work on advanced degrees in countries where English was the medium of instruction. They were much more highly specialized in their knowledge of their disciplines than were the first-year university students, were considerably older, on average, and were more experienced. They were also highly motivated to improve their English.
The university teachers would be placed into a ten-week intensive (forty hours per week) course at a national English language institute where they would be required to speak nothing but English between the hours of about eight and five every working day. They would take classes in all four skills, but would be divided into groups according to broad classifications of their academic disciplines, such as agriculture, engineering and sciences, medical sciences, and economics. Unlike in the university English program, the teachers in this program were all native speakers of English, and all classroom instruction was carried out in English. This program was thus much more intensive than that of the university students: the curriculum was focused on English for specific purposes and involved a great deal more actual use of English.
To be continued
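Summary: This book is one of the representative works of Professor Bachman, a leading scholar in the field of language assessment, and systematically sets out a framework to guide the development and use of language tests. It has three parts: Part I presents the theoretical foundations of language test development and use; Part II describes in detail how to construct an "assessment use argument" framework at the initial stage of language test development; Part III examines in depth how to develop and use language tests in the real world. The book is a rare contribution to language testing in recent years, represents the latest developments in the field, and will have a far-reaching influence on the design, development, use, and study of language tests. It is a valuable reference for graduate students, teachers, teacher trainers, researchers, and testing agency staff in the field of language assessment.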
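[Statement] We thank the author of this book for authorizing Foreign Language Teaching and Research Press (外研社) to publish this article. Any other academic platform wishing to reprint it may call 010-88819585 or send an email to research@fltrp.com, and we will help you contact the original author to arrange authorization. Please do not reprint without permission.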