
Are Altmetrics a Reliable Indicator of Research Quality?

2018-03-08 科研圈

At least in the biological sciences, altmetrics do appear to carry rather limited evaluative weight…


By Lutz Bornmann & Robin Haunschild

Compiled by 阿金

Reviewed by 猫鹰 and 谭坤

 


Policymakers have long valued the power of science, but they have recently begun asking scientists and research institutions to produce evidence of the quality of their research, which has stirred no small controversy. Peer review has long served as an effective way to validate the quality and impact of scientific papers, but it is slow, labor-intensive, and cumbersome. Citation impact therefore emerged as an alternative gauge of research performance. Citation-based methods have their own limitations, however: should negative citations count? Are all citations equally valuable? Moreover, citations take time to accumulate, which makes citation-based evaluation unfair to early-career researchers and newly established research groups.


Given these problems, is there an effective alternative? There is.


Alternative metrics (altmetrics) have been put forward as a practical complement to traditional metrics. They cover indicators beyond those conventionally used in academia, such as social media, blogs, news coverage, and online reference managers. Many major publishers and journals, including Wiley, Nature, and F1000, now display an altmetrics badge on their article pages, and researchers have begun listing these scores on CVs and grant applications. Yet the meaning and value of altmetrics for impact assessment remain unclear. Does mentioning a paper on Twitter produce any real impact? And what should we do when a flawed study goes viral? In fact, several studies of altmetrics have found that the correlation between citations and tweets is close to zero, while other work suggests that papers bookmarked in reference managers such as Mendeley do reflect a degree of scientific impact. Below we introduce two studies, posted as preprints on arXiv, that probe the potential value of altmetrics further.


A contest between metrics

 

For the same paper, do these two approaches to assessing quality reach similar conclusions, or very different ones?

In the first study, the researchers took the same set of papers and compared both traditional citation metrics and alternative metrics (tweets and reference-manager bookmarks) against expert assessments. The expert assessments came from F1000Prime, a platform that provides post-publication peer review, rating and scoring papers after they appear. The analysis showed that tweets correlate more weakly with expert assessments than traditional citation metrics do, whereas bookmark counts from online reference managers agree with traditional metrics rather well.
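The kind of comparison described above can be illustrated with a toy rank-correlation computation. Everything below is invented for illustration (the actual study uses F1000Prime scores and regression analysis, not this data):

```python
# Toy sketch: rank-correlate each metric with peer ratings, as the first
# study does against F1000Prime assessments. Standard library only.

def ranks(xs):
    """1-based average ranks; tied values share the mean rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i..j
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rho = Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Invented per-paper data: peer rating (1-3), citation count, tweet count
peer   = [1, 1, 2, 2, 3, 3, 3, 2]
cites  = [3, 1, 9, 7, 24, 30, 18, 8]
tweets = [5, 0, 2, 9, 1, 4, 0, 7]
print(spearman(peer, cites))   # strong positive correlation
print(spearman(peer, tweets))  # weak correlation
```

With data like this, the citation column tracks the peer ratings closely while the tweet column does not, mirroring the study's qualitative finding.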


In the second study, the researchers examined alternative metrics other than Twitter. The results confirmed the first study's conclusion: the correlation between citation metrics and expert assessments is roughly two to three times stronger than the correlation between alternative metrics and expert assessments.



Taken together, at least in the biological sciences, altmetrics do appear to carry rather limited evaluative weight.


Is there still hope for altmetrics?


In recent years, science policy has tended to evaluate papers and research quality in a broader context, for example across society as a whole or among non-specialist communities. As an inexpensive and convenient indicator of societal impact, altmetrics still have a role to play. We hope for more feedback from all sides, so that more effective indicators of research and paper quality can be found.


Paper information (1)


[Title] Do bibliometrics and altmetrics correlate with the quality of papers? A large-scale empirical study based on F1000Prime, altmetrics, and citation data

[Authors] Lutz Bornmann, Robin Haunschild

[Venue] arXiv.org

[Date] January 18, 2018

[论文链接] https://arxiv.org/abs/1711.07291

[论文编号] arXiv:1711.07291

[Abstract] In this study, we address the question whether (and to what extent, respectively) altmetrics are related to the scientific quality of papers (as measured by peer assessments). Design: In the first step, we analyse the underlying dimensions of measurement for traditional metrics (citation counts) and altmetrics - by using principal component analysis (PCA) and factor analysis (FA). In the second step, we test the relationship between the dimensions and quality of papers (as measured by the post-publication peer-review system of F1000Prime assessments) - using regression analysis. Results: The results of the PCA and FA show that altmetrics operate along different dimensions, whereas Mendeley counts are related to citation counts, and tweets form a separate dimension. The results of the regression analysis indicate that citation-based metrics and readership counts are significantly more related to quality, than tweets. This result on the one hand questions the use of Twitter counts for research evaluation purposes and on the other hand indicates potential use of Mendeley reader counts. Originality: Only a few studies have previously investigated the relationship between altmetrics and assessments by peers. The relationship is important to study: if altmetrics data are used in research evaluation, they should be related to quality.
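The PCA step described in the abstract can be sketched on an invented correlation matrix. The correlation values below are illustrative assumptions, not the study's data; they merely encode the reported pattern (citations and Mendeley readers correlated, tweets largely independent):

```python
# Hypothetical sketch: PCA on a correlation matrix of three metrics,
# showing how citations and Mendeley readers can load on one component
# while tweets form their own dimension. All values are invented.
import numpy as np

# Assumed pairwise correlations, in the order:
# citations, Mendeley readers, tweets
R = np.array([[1.0, 0.8, 0.1],
              [0.8, 1.0, 0.1],
              [0.1, 0.1, 1.0]])

vals, vecs = np.linalg.eigh(R)        # eigh returns ascending eigenvalues
order = np.argsort(vals)[::-1]        # reorder: largest component first
vals, vecs = vals[order], vecs[:, order]

print(vals)        # leading eigenvalue driven by the cites/readers pair
print(vecs[:, 0])  # loadings: large for citations & readers, small for tweets
```

The first principal component is dominated by the citation/readership pair, while tweets contribute little to it, which is the "separate dimension" structure the PCA/FA results describe.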

 

Paper information (2)


[Title] Normalization of zero-inflated data: An empirical analysis of a new indicator family and its use with altmetrics data

[Authors] Lutz Bornmann, Robin Haunschild

[Venue] arXiv.org

[Date] January 26, 2018

[论文链接] https://arxiv.org/abs/1712.02228

[论文编号] arXiv:1712.02228

[Abstract] Recently, two new indicators (Equalized Mean-based Normalized Proportion Cited, EMNPC; Mean-based Normalized Proportion Cited, MNPC) were proposed which are intended for sparse scientometrics data. The indicators compare the proportion of mentioned papers (e.g. on Facebook) of a unit (e.g., a researcher or institution) with the proportion of mentioned papers in the corresponding fields and publication years (the expected values). In this study, we propose a third indicator (Mantel-Haenszel quotient, MHq) belonging to the same indicator family. The MHq is based on the MH analysis - an established method in statistics for the comparison of proportions. We test (using citations and assessments by peers, i.e. F1000Prime recommendations) if the three indicators can distinguish between different quality levels as defined on the basis of the assessments by peers. Thus, we test their convergent validity. We find that the indicator MHq is able to distinguish between the quality levels in most cases while MNPC and EMNPC are not. Since the MHq is shown in this study to be a valid indicator, we apply it to six types of zero-inflated altmetrics data and test whether different altmetrics sources are related to quality. The results for the various altmetrics demonstrate that the relationship between altmetrics (Wikipedia, Facebook, blogs, and news data) and assessments by peers is not as strong as the relationship between citations and assessments by peers. Actually, the relationship between citations and peer assessments is about two to three times stronger than the association between altmetrics and assessments by peers.
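The classical Mantel-Haenszel estimator that the MHq builds on can be sketched as follows. The 2×2 layout and all counts here are illustrative assumptions; the exact MHq definition is given in the paper itself:

```python
# Minimal sketch of a Mantel-Haenszel pooled ratio across strata
# (e.g. field x publication-year cells). Counts are invented.

def mh_quotient(strata):
    """strata: list of 2x2 counts (a, b, c, d) per stratum, where
       a = unit's papers mentioned,        b = unit's papers not mentioned,
       c = reference-set papers mentioned, d = reference-set papers not mentioned.
       Returns the Mantel-Haenszel estimate of the pooled odds ratio."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

# Two invented strata (say, two publication years) for one unit:
strata = [(8, 2, 40, 60), (5, 5, 30, 70)]
print(mh_quotient(strata))  # > 1: mentioned more often than expected
```

Pooling per-stratum products rather than raw proportions is what lets the estimator cope with sparse, zero-inflated counts, since each stratum contributes in proportion to its size.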

 

Original article:

https://blog.f1000.com/2018/01/11/evaluating-research-different-metrics-tell-us-different-things/

