
Sneak Preview | Common Misconceptions about Language Assessment (Part 2)


SOME COMMON MISCONCEPTIONS AND UNREALISTIC EXPECTATIONS ABOUT LANGUAGE ASSESSMENT
We’ve found that many people who need to use language assessments in the real world have misconceptions and unrealistic expectations about what language assessments can do and what they should be like. These often prevent people from becoming competent in language assessment. Furthermore, there is often a belief that “language testers” have some almost magical procedures and formulae for creating the “best” test. These misconceptions and unrealistic expectations, and the mystique associated with language testing, constitute strong affective barriers to many people who want and need to be able to use language assessments in their professional work. Breaking down this affective barrier by dispelling and clarifying misconceptions, helping readers develop a sense of what can reasonably be expected of language assessments, and demystifying language testing is thus an important goal of this book.
Three Common Misconceptions (continued from the previous installment)

This example illustrates the most common misconception that we find among those who ask for advice about their specific testing needs. Many people believe, as we did, that there is an ideal of what a “good” language test is, and they want to know how to create tests on this ideal model for their own testing needs.
Our answer is that there is no such thing as the one “best” test, even for a specific situation, and that the terms “good” and “bad” are not very useful for describing a language test. In any situation, there will be a number of alternatives, each with advantages and disadvantages. 
To understand why this is so, we must consider some of the problems that result from this misconception. If we assume that a single “best” test exists, and we attempt either to use this test itself, or to use it as a model for developing a test of our own, we are likely to end up with a test that will be inappropriate for at least some of our test takers. In the example above, the test we developed might have been appropriate for the university students, in terms of the areas of language ability measured (grammar, vocabulary, and reading comprehension) and topical content, since this was quite general and not specific to any particular discipline. 
However, the test was probably not particularly appropriate for the university teachers, since it did not include material related to the teachers’ different disciplines or to the areas of English for Specific Purposes that were covered in the intensive course. This test was also of limited appropriateness for this group because it did not include an assessment of students’ ability to perform listening and speaking tasks, which was heavily emphasized in the intensive program.
Because of these limitations, the test for the teachers did not meet all of the needs of the test users (the Director of and teachers in the intensive program). Specifically, teachers in the intensive course reported that students who were placed into levels on the basis of the test were quite homogeneous in terms of their reading, but that there were considerable differences among students within a given level in terms of their listening and speaking. These differences made it quite difficult for teachers to find and use listening and speaking activities that were appropriate for a given group. Teachers felt that the test should be able to accurately predict students’ placement into the listening and speaking classes, as well as into the reading classes, and they urged the test developer to remedy this situation.
In an attempt to address this problem, a dictation task was added to the test. In this task, the test takers listened to a passage presented using a tape recorder, and were required to write down exactly what they heard. This particular task was added largely because it had been used previously, and was considered to be a “good” way to test listening. At the same time, the director of the intensive program agreed to group students homogeneously into listening and speaking groups on the basis of their scores on the dictation. This seemed to work well as a program modification, and teachers felt that it facilitated both their teaching and their students’ learning.
It is not clear whether it was the dictation test or the program change that solved the problem with those classes. What is clear, however, is that adding a dictation task created another problem. Most of the listening tasks in the intensive course were interactive, conversational tasks, in which responses were generally oral, and quite different from the dictation test task, which involved no interaction and required only written responses. Thus, although the addition of the dictation did, perhaps, provide some general information about the students’ ability to listen and understand spoken language, the test task itself was quite different from the kinds of listening tasks with which the students would be engaged in the intensive course, and both the test takers and the test users frequently complained about this. The final result was frustration on the part both of the teachers, whose expectations of the test tasks in terms of their use for placement and their match to the teaching activities were not met, and of the test developer, who felt he had done everything to make this the best test possible. 
(Note: A number of approaches (e.g., s/he, he/she) can be used to deal with the fact that modern English no longer has non-gender specific forms in its singular personal pronouns. The approach we will use in this book is to alternate “he” and “she”, more or less at random, but maintaining a given gender or combination of genders throughout a particular section or example.)
Table 1.1 summarizes some of the misconceptions and resulting problems that we have found to be very common among individuals who want to be able to use language tests in the real world but feel that they do not have the knowledge or competence to do so. The table also presents what we believe are some useful alternatives to these misconceptions.

Misconception 1

Believing that there is a single best test for any particular situation, no matter how narrowly specified, can lead to testing practice that is indefensible and to frustration and loss of confidence on the part of the test developer.
First, this misconception may lead people who need to use language tests either to stay with their favorite, safe “tried and true” tests, or to blindly use testing methods simply because they are widely used or are popular. In either case, the tests that get used may not be appropriate for the test takers, or they may not meet the needs of the test user. This is illustrated in the example above, where we unquestioningly modeled our tests on the kinds of tests that had been developed for large-scale EFL assessments, which were intended for very different uses and for very different populations of test takers from ours. Second, unrealistically expecting to be able to find or develop a perfect test for any situation will inevitably lead to frustration on the part of the test developer, as she discovers that whatever test she develops or uses will have some strengths and some weaknesses.
Misconception 2

The practice of using the same test year in and year out, simply because “it works,” or of mimicking whatever test method is currently in widespread use, provides no basis for justifying test use if and when the developer is held accountable by stakeholders, including students, teachers, and administrators.

When the test developer is placed in a situation of having to justify the indefensible, he is likely to lose confidence, as he realizes the shortcomings and inadequacies of the test he continues to use or has developed. Ultimately, the test developer may arrive at what he believes to be the answer to his problem: abandon any attempts to develop a test, and hope that a “language tester” can be found who will be able to develop the perfect test. This misconception, that language test design and development are too technical, and should be left to the experts, doesn’t really address the problem of unjustifiable testing practice, since it merely places test development on hold. 


This misconception often leads to the practice of bringing in an outside “expert,” who is likely to be unfamiliar with the situation, and expecting this person to develop a new test on her own, with little or no input from the test users. For example, it is very common for a language program or a publisher to develop a new set of course materials or textbook without giving any consideration to assessment. Then, after the course or textbook development team has been disbanded, the course director or publisher may realize that it would be useful to have some assessments that teachers can use for making decisions about achievement and progress. At that point, they typically ask a “language tester” to write, essentially from scratch, a set of classroom quizzes or achievement tests based on the content of the course materials or textbook.


Misconception 3

The third misconception, believing that a test is either good or bad, depending on whether it satisfies a single quality, can lead test developers to focus on a single quality of the test, and put all their efforts into maximizing this. In the previous example, the person who writes the quizzes and tests based on the course materials or textbook is most likely to focus on the match between the content of the materials and that of the tests. While content relevance is certainly an important consideration in this case, an equally important consideration is how test takers actually perform on the tests. If, for example, students studying in a course with the new materials or textbook actually perform very poorly on the tests, it wouldn’t be clear whether this poor performance reflected inadequate learning, possibly due to ineffective teaching and materials, or whether the tests were simply too difficult for this particular group of students. In another situation, a classroom teacher might look at the technical qualities of many large-scale assessments and conclude that in order to be “good,” his classroom quiz must be highly reliable. In order to achieve this, he may believe that he must use the same kinds of test tasks that are used in large-scale assessments, ignoring the fact that these tasks might be entirely unrelated to the kinds of teaching and learning tasks he uses with his students.
In both of these examples, by focusing on a single quality, whether this be content relevance or reliability, the test developer ends up with a test that is out of balance, since it ignores other important qualities. In the first example, by ignoring how students perform on the test, the test developer risks not being able to justify her interpretations about students’ learning and achievement in the course. In the second example, the test developer risks his students’ perceiving the test tasks as not related in any way to their classroom language use.


To be continued


Previous installments:

Sneak Preview | Common Misconceptions about Language Assessment (Part 1)

This article is excerpted from Language Assessment in Practice: Developing Language Assessments and Justifying Their Use in the Real World. Bachman, L. & Palmer, A. (2016). Language Assessment in Practice: Developing Language Assessments and Justifying Their Use in the Real World. Beijing: Foreign Language Teaching and Research Press.
Synopsis: This book is one of the representative works of Professor Bachman, a leading scholar in the field of language assessment. It systematically presents a framework for guiding the development and use of language tests. The book is in three parts: Part I presents the theoretical foundations of language test development and use; Part II describes in detail how to construct an “Assessment Use Argument” at the initial stage of test development; Part III examines in depth how to develop and use language tests in the real world. A rare recent contribution to the field, the book represents the latest developments in language testing and will have a far-reaching influence on the design, development, use, and study of language tests. It is a valuable reference for graduate students, teachers, teacher trainers, researchers, and testing-agency professionals in the field of language assessment.


[Statement] We thank the book’s authors for authorizing FLTRP to publish this excerpt. Any other academic platform wishing to reprint it may call 010-88819585 or email research@fltrp.com, and we will help you contact the original authors to arrange authorization. Please do not reprint without permission.

