ChatGPT在写作教学与测评中的创新应用

Original iResearch 外研社外语学术科研

2024-09-09

自动评分（automated scoring）是语言测试领域备受关注的焦点之一。与翻译自动评分相比，作文自动评分技术更为成熟（江进林 2022）。随着ChatGPT的“出圈”，也出现了将ChatGPT应用于外语写作与测评的实践与研究。Barrot（2023）指出，ChatGPT有可能帮助解决二语写作者的迫切需求，包括提供及时和适应性反馈、提供写作练习的平台、充当写作辅助工具。ChatGPT可以用作自动评分工具，根据预先设定的标准对学生的作文进行评分，并提供具体的评价建议。基于与ChatGPT的互动实践，Barrot（2023）发现该工具似乎能够考量一系列写作构念，如语用、语义、连贯、衔接、语言风格、格式、语法和句法，但在捕捉情感深度、写作声音、身份认同和修辞灵活性等人类写作特点方面似乎有点困难。

Su et al.（2023）展示了如何将ChatGPT应用于议论文写作过程教学中：在准备阶段，它可以帮助学生进行写作构思，并为学生的写作大纲提供反馈。在编辑阶段，它可以为学生的写作草稿提供反馈，以及不同的视角。该阶段聚焦内容质量，具体包括如何界定观点、组织子观点的逻辑流、从可靠来源选择证据并详细阐述证据、说明证据如何支持观点，并驳斥反对观点等内容。在校对阶段，它可以进行语言层面的改错。在反思阶段，与它的对话历史有助于促进写作者的反思。

参考文献

Barrot, J. S., 2023, Using ChatGPT for second language writing: Pitfalls and potentials, Assessing Writing 57, https://doi.org/10.1016/j.asw.2023.100745.

Su, Y., Y. Lin, C. Lai, 2023, Collaborating with ChatGPT in argumentative writing classrooms, Assessing Writing 57, https://doi.org/10.1016/j.asw.2023.100752.

江进林，2022，自动评分技术，载罗凯洲（主编），《语言测试重点问题研究》。北京：外语教学与研究出版社。132-150。

除了基本的语言校对外，ChatGPT真的强大到能有效评价文本的内容和结构吗？今天小编带大家来一探究竟！

小编选取了一篇由英语专业大二学生写的议论文作文，题目为：The downsides of smartphones。该议论文话题是：Nowadays smartphones are widely used in our society. Is this a positive or negative development?

接下来，小编采用Su et al.（2023）所用的评价标准，让ChatGPT从观点（claim）、证据（evidence）和推理与反驳（reasoning and counterargument）三个方面评价这一学生作文的内容与结构。下面展示了整个问答过程：

问：

输入学生写作文本：

The Downside of Smartphones

A recent survey suggested that there existed more than one billion cell-phones utilized by people in China, the majority being smartphones. There is no denying that smartphones lend fresh impetus to our everyday life, through which we can pay our bills anytime and everywhere to our convenience. Besides, smartphones can assist us to keep in touch with our friends regardless of time and distance. However, for my part, overindulgence in smartphones can lead to undesirable consequences, reducing our attention span and encroaching on face-to face communication.

First and foremost, convenience of smartphones has a paradoxical effect of taking away much of our attention. A case in point is that, intoxicated with many entertaining APPs, students are prone to reduced attention span, lagging behind their peers who rely less on smartphones. What is worse, more often than not, they cannot help scrolling the screen and checking messages while doing homework once the phone rings, failing to immerse themselves in their books wholeheartedly.

Secondly, it is the very smartphones that are posing threat to face-to-face communication, which is inextricably linked to social interaction. There exists a prevalent phenomenon that when a couple go out dating in a restaurant, they are more likely to stare at their phones respectively than to have a heart-to-heart talk with eye contact. It seems that instead of narrowing down the distance among people, smartphones are inclined to hinder one from interacting with others effectively and efficiently, which turns out to be nothing but detrimental to our socializing.

In all, there exists a world of difference between lives with and without smartphones. By no means can we be captivated by the comfortable and cozy life they bring us. We are supposed to heed the alarms and harness the power of smartphones in a more rational way so as not to be affected both in attention and in communication.

（上下滑动，查看学生写作文本）

OpenAI：

Claim:

(1) The writing effectively addresses the downside of smartphones, discussing the impact on attention span and face-to-face communication.

(2) The main claim is that overindulgence in smartphones can lead to undesirable consequences, specifically reducing attention span and encroaching on face-to-face communication. The writer provides examples of students being distracted while doing homework and couples focusing on their phones rather than engaging in conversation.

(3)The subclaims about reduced attention span and hindrance to face-to-face communication are logically connected and support the main claim that overindulgence in smartphones has negative consequences.

Evidence:

(4) The writing does not provide specific evidence sourced from credible and trustworthy sources. It relies more on general observations and personal experiences.

(5) The writing does not provide detailed descriptions of specific evidence since it lacks cited sources or examples.

(6) While the writing mentions observations and experiences, it does not explicitly link them to supporting the main claim.

Reasoning and counterargument:

(7) The writing presents logical inferences regarding the negative impact of smartphone use on attention span and face-to-face communication. However, it does not delve deeply into the implications or potential consequences.

(8) The writing does not include any counterarguments to address opposing viewpoints or strengthen the argument.

(9)The writing effectively uses linking words to introduce points and connect ideas in a logical sequence, aiding the reader in understanding the writer's perspective.

In summary, the student's writing addresses the downsides of smartphones, specifically focusing on attention span and face-to-face communication. The writing lacks specific evidence from credible sources and fails to link the evidence provided to support its claims. It also does not incorporate counterarguments or delve deeply into the implications. However, it effectively uses linking words to maintain a logical flow in presenting ideas. To improve, the student should consider incorporating more concrete evidence and addressing counterarguments to strengthen the overall argument.

可以看到，ChatGPT能够指出该作文的优点：作者的立场是过度使用手机有害，并提出过度使用手机会“减少注意力”“阻碍面对面交流”两个分论点，并使用具体的例子作为支撑；两个分论点能够有逻辑性地支撑作者的立场。

同时，ChatGPT也指出了作文的不足之处（见上文加粗部分）：缺少带有可靠来源的具体证据，而只是基于一般观察和个人经历，且这些并未直接和作者的主要立场相联系；缺少细节描写；在讨论分论点时，未深入探讨其影响或潜在后果；未提及反驳论点。

如果分别给这三个评价维度赋30分的话，ChatGPT还能给学生的作文进行打分：

但是，我们也能看到，ChatGPT给出的一些建议会有些模糊，它的一些评价语比较抽象，如“specific evidence”“explicitly link them to supporting the main claim”。这可能是由于ChatGPT的生成能力有限，这时候就需要学生继续追问，请ChatGPT做出进一步的解释说明（Su et al. 2023）。比如，在本例中，小编针对“it does not explicitly link them to supporting the main claim”这一评语继续追问，请它给出修改建议：

此外，Su et al.（2023）指出，除了产出信息不准确问题，在写作中使用ChatGPT也会涉及伦理道德问题：文本可能涉及作者身份和抄袭问题，且ChatGPT这一大语言模型可能会产出带有偏见、不恰当的内容。所以，写作教师应引导学生批判性地审视ChatGPT产出的内容，并在必要时提供支持（scaffolding）和补充性反馈。

那么，除此之外，还有哪些作文自动评分技术呢？小编认真阅读了《语言测试重点问题研究》的第七章，该章对此进行了系统全面的梳理。快来和小编一起学习一下吧~

（点击图片，了解图书详情）

下表列出了几个主要作文自动评分系统的特点。

第一，除了IEA和IntelliMetric，其他两个评分系统都通过变量提取、多元回归、分数计算三大步骤来完成评分。

第二，结合早期的PEG可以发现，作文自动评分系统的测量对象从语言形式发展到语义内容，再过渡到语言、内容和结构三个方面。

第三，每个评分系统都采用多种技术来提取变量。

第四，各个系统使用的变量与其测量对象对应。例如，梁茂成的iWrite系统采用流利度、地道性、复杂度方面的变量考察语言形式质量；采用语义相似度衡量语义质量；采用连接词等测量作文结构质量。

第五，与PEG相比，表7.1中的系统不仅使用相关度指标，还采用一致性指标来检验机器评分与人工评分的相似性。一致性包括绝对一致和相邻一致百分比（Chung & Baker，2003：28）。前者指机器与人工所评等级相同的文本数量占所评文本总数的比例，后者指机器与人工所评等级相差不大于1的文本数量占所评文本总数的比例。当评分量表为离散数据且等级较少时，往往使用绝对一致百分比；当评分等级较多时，也可使用相邻一致百分比（Yang et al.，2002）。

总之，现有作文自动评分系统已经比较成熟。上述系统的评分与人工评分的相关度都在0.7—0.9之间，可以在实际评分中适当代替人工评分员。

（节选自《语言测试重点问题研究》第七章“自动评分技术”，作者：江进林。）

★

想了解这些自动评分技术的运作原理等更多信息，以及更多语言测试重点问题，欢迎关注《语言测试重点问题研究》一书，并参加本书的“我来读文献”活动！

温馨提示

第111期“我来读文献”活动正在火热进行中！

第111期我来读文献 | 《语言测试重点问题研究》（领读专家：高淼副教授）

扫描下方二维码，即可加入读书活动群！

相关资讯

这个暑期，外研社精心策划了三期测评相关主题的研修班，分别涉及大学英语教学评价、教学评用一体化、以学习为导向的评价等主题。以学习为导向的评价在高校英语教学设计中的应用研修班仍在报名中，诚邀各位老师报名参加，一起交流探讨教学和测评方法，共同进步！

点击下面图片，开启研修之旅~

继续滑动看下一个

外研社外语学术科研

向上滑动看下一个

二湘：朱令去世一周年，清华学子控诉清华在朱令案中的冷血和无耻

李宜雪的良知卖了2万元，真正需要声援的是罗灿宏啊

故意按摩让女生“产生欲望”后发生关系，算性侵吗？

洗牌电商圈！阿哲放话全网：挑战抖音所有机制！爆全品类大牌！

阿哲现身评论区，@一修！肉肉痛哭，无限期停播！回应舆论黑料，关闭私信评论区！

ChatGPT在写作教学与测评中的创新应用

您可能也对以下帖子感兴趣

二湘：朱令去世一周年，清华学子控诉清华在朱令案中的冷血和无耻

李宜雪的良知卖了2万元，真正需要声援的是罗灿宏啊

故意按摩让女生“产生欲望”后发生关系，算性侵吗？

洗牌电商圈！阿哲放话全网：挑战抖音所有机制！爆全品类大牌！

阿哲现身评论区，@一修！肉肉痛哭，无限期停播！回应舆论黑料，关闭私信评论区！

生成图片，分享到微信朋友圈

ChatGPT在写作教学与测评中的创新应用

您可能也对以下帖子感兴趣