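This is a collection of important references on psychological measurement compiled by Eiko Fried and Jessica Flake. It is intended to help researchers with practical measurement work and to encourage further discussion of measurement. The references were originally compiled as supplementary material for our (that is, the original authors') article "Measurement Matters" in the APS Observer (Association for Psychological Science Observer). The list is currently only a first draft; we hope it will be an active, evolving document, and we will continue to update and improve it regularly.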

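The references in this list were gathered from several sources, such as the SIPS (Society for the Improvement of Psychological Science) preconference at the 2018 SPSP (Society for Personality and Social Psychology) meeting. If you have an important measurement reference that you would like to add to the list, please contact us by email (eikofried@gmail.com & kayflake@gmail.com).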

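This list is not a sweeping collection of everything related to psychological measurement; it is a selection of key, practical papers. We do not fully agree with every view expressed in these works, but we believe these articles and books offer readers a broader perspective. References marked with * are especially suitable for beginners. The list is available on the Open Science Framework: https://osf.io/zrkd4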


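1 Validity theory and measurement principles in psychological measurement

   1.1 Foundations of psychological measurement

   1.2 Overviews and summaries

   1.3 Key foundational discussions of measurement and validity theory

2 Important issues in psychological measurement

   2.1 Theory and item development

   2.2 Formative vs. reflective measurement models

   2.3 Reliability

   2.4 Challenges in measuring individual differences

3 Quantitative methods and models

   3.1 Factor analysis

   3.2 Measurement invariance

   3.3 Item response theory (IRT) and differential item functioning (DIF)

   3.4 Mixture models (latent class/profile analysis)

   3.5 Network models

4 Analyses of measurement practices in psychology

5 Examples of construct validation in empirical research

6 Books

7 Exploring new areas: research directions and suggestions

8 Related Chinese-language references (added by the translator)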

1 Validity theory and measurement principles in psychological measurement


1.1 Foundations of psychological measurement

Borsboom, D., Mellenbergh, G. J., & van Heerden, J. (2004). The Concept of Validity. Psychological Review, 111(4), 1061–1071.


Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56(2), 81.

* Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52(4), 281-302.

Embretson, S. E. (1998). A cognitive design system approach to generating valid tests: Application to abstract reasoning. Psychological Methods, 3(3), 380.

Kane, M. T. (2013). Validating the Interpretations and Uses of Test Scores. Journal of Educational Measurement, 50(1), 1–73.


Loevinger, J. (1957). Objective tests as instruments of psychological theory. Psychological Reports, 3, 635–694.

1.2 Overviews and summaries

* Benson, J. (1998). Developing a Strong Program of Construct Validation: A Test Anxiety Example. Educational Measurement: Issues and Practice, 17(1), 10–17.

Bollen, K. A., & Lennox, R. (1991). Conventional wisdom on measurement: A structural equation perspective. Psychological Bulletin, 110(2), 305–314.

Borsboom, D. (2006). The attack of the psychometricians. Psychometrika, 71(3), 425–440. http://link.springer.com/article/10.1007/s11336-006-1447-6

Clark, L. A., & Watson, D. (1995). Constructing validity: Basic issues in objective scale development. Psychological Assessment, 7(3), 309–319. http://doi.org/10.1037/1040-3590.7.3.309

* Edwards, J. R., & Bagozzi, R. P. (2000). On the Nature and Direction of Relationships Between Constructs and Measures. Psychological Methods, 5(2). http://doi.org/10.1037//1082-989X.5.2

McGrath, R. E. (2005). Conceptual complexity and construct validity. Journal of Personality Assessment, 85(2), 37–41. http://doi.org/10.1207/s15327752jpa8502

Slaney, K. (2017). Validating psychological constructs: Historical, philosophical, and practical dimensions. (J. Martin, Ed.). London: Palgrave Macmillan.


Strauss, M. E., & Smith, G. T. (2009). Construct validity: advances in theory and methodology. Annual Review of Clinical Psychology, 5, 1–25. http://doi.org/10.1146/annurev.clinpsy.032408.153639

1.3 Key foundational discussions of measurement and validity theory

Borsboom, D., Rhemtulla, M., Cramer, A. O. J., van der Maas, H. L. J., Scheffer, M., & Dolan, C. V. (2016). Kinds versus continua: a review of psychometric approaches to uncover the structure of psychiatric constructs. Psychological Medicine, 1–13.


Fried, E. I. (2017). What are psychological constructs? On the nature and statistical modeling of emotions, intelligence, personality traits and mental disorders. Health Psychology Review, 11(2), 130–134. http://doi.org/10.1080/17437199.2017.1306718

Kendler, K. S. (2016). The nature of psychiatric disorders. World Psychiatry, 15(1), 5–12. Retrieved from https://www.ncbi.nlm.nih.gov/pubmed/26833596

Maul, A. (2017). Rethinking Traditional Methods of Survey Validation. Measurement, 15(2), 51–69. http://doi.org/10.1080/15366367.2017.1348108

* Meehl, P. E. (1978). Theoretical risks and tabular asterisks: The slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46(4), 806–834.


Michell, J. (1997). Quantitative science and the definition of measurement in psychology. British Journal of Psychology, 88(3), 355–383. 


2 Important issues in psychological measurement


2.1 Theory and item development

Dawis, R. V. (1987). Scale construction. Journal of Counseling Psychology, 34(4), 481.

* Gehlbach, H., & Brinkworth, M. E. (2011). Measure twice, cut down error: A process for enhancing the validity of survey scales. Review of General Psychology, 15(4), 380–387. http://doi.org/10.1037/a0025704

Simms, L. J. (2008). Classical and Modern Methods of Psychological Scale Construction. Social and Personality Psychology Compass, 2(1), 414–433.


Smith, P. C., & Kendall, L. M. (1963). Retranslation of expectations: An approach to the construction of unambiguous anchors for rating scales. Journal of Applied Psychology, 47(2), 149.

2.2 Formative vs. reflective measurement models

Bollen, K. A., & Diamantopoulos, A. (2015). In Defense of Causal-Formative Indicators: A Minority Report. Psychological Methods.

Edwards, J. R. (2011). The Fallacy of Formative Measurement. Organizational Research Methods, 14(2), 370–388. http://doi.org/10.1177/1094428110378369

MacKenzie, S. B., Podsakoff, P. M., & Jarvis, C. B. (2005). The Problem of Measurement Model Misspecification in Behavioral and Organizational Research and Some Recommended Solutions. The Journal of Applied Psychology, 90(4), 710–730. http://doi.org/10.1037/0021-9010.90.4.710

Rhemtulla, M., van Bork, R., & Borsboom, D. (2015). Calling Models With Causal Indicators “Measurement Models” Implies More Than They Can Deliver. Measurement: Interdisciplinary Research and Perspectives, 13(1), 59–62. http://doi.org/10.1080/15366367.2015.1016343

2.3 Reliability

Cortina, J. M. (1993). What is coefficient alpha? An examination of theory and applications. Journal of Applied Psychology, 78(1), 98–104.

McNeish, D. (2017). Thanks Coefficient Alpha, We'll Take It From Here. Psychological Methods.

Revelle, W., & Zinbarg, R. E. (2009). Coefficients alpha, beta, omega, and the glb: Comments on Sijtsma. Psychometrika, 74(1), 145.

Schmitt, N. (1996). Uses and abuses of coefficient Alpha. Psychological Assessment, 8(4), 350–353.

Sijtsma, K. (2009). On the use, the misuse, and the very limited usefulness of Cronbach’s alpha. Psychometrika, 74(1), 107.

2.4 Challenges in measuring individual differences

Cooper, S. R., Gonthier, C., Barch, D. M., & Braver, T. S. (2017). The role of psychometrics in individual differences research in cognition: A case study of the AX-CPT. Frontiers in Psychology, 8(SEP), 1–16. 


Fröhner, J., Teckentrup, V., Smolka, M., & Kroemer, N. (2018). Addressing the reliability fallacy: Similar group effects may arise from unreliable individual effects. Preprint, 1–29. http://doi.org/10.1101/215053

Hedge, C., Powell, G., & Sumner, P. (2017). The reliability paradox: Why robust cognitive tasks do not produce reliable individual differences. Behavior Research Methods, 1–21. http://doi.org/10.3758/s13428-017-0935-1

3 Quantitative methods and models


3.1 Factor analysis

Fabrigar, L. R., Wegener, D. T., MacCallum, R. C., & Strahan, E. J. (1999). Evaluating the use of exploratory factor analysis in psychological research. Psychological Methods, 4(3), 272. doi:10.1037/1082-989X.4.3.272

Flora, D. B., & Flake, J. K. (2017). The purpose and practice of exploratory and confirmatory factor analysis in psychological research: Decisions for scale development and validation. Canadian Journal of Behavioural Science/Revue canadienne des sciences du comportement, 49(2), 78. http://dx.doi.org/10.1037/cbs0000069

Meehl, P. E. (1993). Four Queries About Factor Reality. History and Philosophy of Psychology Bulletin, 5(2), 4–5.

3.2 Measurement invariance

Borsboom, D. (2006). When does measurement invariance matter? Medical Care, 44(11).

Byrne, B. M., Shavelson, R. J., & Muthén, B. O. (1989). Testing for the equivalence of factor covariance and mean structures: The issue of partial measurement invariance. Psychological Bulletin, 105(3), 456–466.

Cheung, G. W., & Rensvold, R. B. (2002). Evaluating goodness-of-fit indexes for testing measurement invariance. Structural Equation Modeling, 9(2), 233-255.

Meade, A. W., & Lautenschlager, G. J. (2004). A comparison of item response theory and confirmatory factor analytic methodologies for establishing measurement equivalence/invariance. Organizational Research Methods, 7(4), 361-388.

Meredith, W. (1993). Measurement invariance, factor analysis and factorial invariance. Psychometrika, 58(4), 525–543.

Van de Schoot, R., Lugtig, P., & Hox, J. (2012). A checklist for testing measurement invariance. European Journal of Developmental Psychology, 9(4), 486-492.

3.3 Item response theory (IRT) and differential item functioning (DIF)

* Hambleton, R. K., & Jones, R. W. (1993). Comparison of classical test theory and item response theory and their applications to test development. Educational Measurement: Issues and Practice [Module 16: https://www.ncme.org/ncme/NCME/NCME/Publication/ITEMS.aspx]

Harris, D. (1989). Comparison of 1-, 2-, and 3-Parameter IRT models. Educational Measurement: Issues and Practice. 


Penfield, R. D. (2014). An NCME instructional module on polytomous item response theory models. Educational Measurement: Issues and Practice, 33(1), 36-48.


Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27(4), 361-370.

Thissen, D., Steinberg, L., & Wainer, H. (1993). Detection of differential item functioning using the parameters of item response models.

3.4 Mixture models (latent class/profile analysis)

Finch, W.H. & Bronk, K.C. (2011). Conducting confirmatory latent class analysis using Mplus. Structural Equation Modeling, 18, 132-151.

Masyn, K. E. (2013). Latent Class Analysis and Finite Mixture Modeling. In P. Nathan and T. Little (Eds.), The Oxford Handbook of Quantitative Methods (pp. 551-611). New York, NY. Oxford University Press.

3.5 Network models

Epskamp, S., & Fried, E. I. (2018). A Tutorial on Regularized Partial Correlation Networks. Psychological Methods. http://doi.org/10.1037/met0000167

Foygel, R., & Drton, M. (2010). Extended Bayesian Information Criteria for Gaussian Graphical Models. In Advances in Neural Information Processing Systems 23 (pp. 604–612).

van der Maas, H. L. J., Dolan, C. V., Grasman, R. P. P. P., Wicherts, J. M., Huizenga, H. M., & Raijmakers, M. E. J. (2006). A dynamical model of general intelligence: the positive manifold of intelligence by mutualism. Psychological Review, 113(4), 842–861. http://doi.org/10.1037/0033-295X.113.4.842

4 Analyses of measurement practices in psychology


Barry, A. E., Chaney, B., Piazza-Gardner, A. K., & Chavarria, E. A. (2014). Validity and Reliability Reporting Practices in the Field of Health Education and Behavior. Health Education & Behavior, 41(1), 12–18. http://doi.org/10.1177/1090198113483139

Flake, J. K., Pek, J., & Hehman, E. (2017). Construct validation in social and personality research: Current practice and recommendations. Social Psychological and Personality Science, 1–9. http://doi.org/10.1177/1948550617693063

Fried, E. I. (2017). The 52 symptoms of major depression. Journal of Affective Disorders, 208, 191–197. http://doi.org/10.1016/j.jad.2016.10.019

Reise, S. P., & Waller, N. G. (2009). Item response theory and clinical measurement. Annual Review of Clinical Psychology, 5, 27-48.

Rodebaugh, T. L., Scullin, R. B., Langer, J. K., Dixon, D. J., Huppert, J., Bernstein, A., … Lenze, E. (2016). Unreliability as a Threat to Understanding Psychopathology: The Cautionary Tale of Attentional Bias. Journal of Abnormal Psychology, 125(6), 840–851. http://doi.org/10.1037/abn0000184

Santor, D. A., Gregus, M., & Welch, A. (2006). Eight Decades of Measurement in Depression. Measurement, 4(3), 135–155.

Weidman, A. C., Steckler, C. M., & Tracy, J. L. (2017). The jingle and jangle of emotion assessment: Imprecise measurement, casual scale usage, and conceptual fuzziness in emotion research. Emotion, 17(2), 267.

5 Examples of construct validation in empirical research

Flake, J. K., Barron, K. E., Hulleman, C., McCoach, D. B., & Welsh, M. E. (2015). Measuring cost: The forgotten component of expectancy-value theory. Contemporary Educational Psychology, 41, 232–244. http://doi.org/10.1016/j.cedpsych.2015.03.002

Hulleman, C. S., Schrager, S. M., Bodmann, S. M., & Harackiewicz, J. M. (2010). A meta-analytic review of achievement goal measures: Different labels for the same constructs or different constructs with similar labels? Psychological Bulletin, 136(3), 422.

McCoach, D. B., & Siegle, D. (2003). The school attitude assessment survey-revised: A new instrument to identify academically able students who underachieve. Educational and Psychological Measurement, 63(3), 414-429.

Miller, F. G., Johnson, A. H., Yu, H., Chafouleas, S. M., McCoach, D. B., Riley-Tillman, T. C., ... & Welsh, M. E. (2018). Methods matter: A multi-trait multi-method analysis of student behavior. Journal of School Psychology, 68, 53-72.

Pastor, D. A., Barron, K. E., Miller, B. J., & Davis, S. L. (2007). A latent profile analysis of college students’ achievement goal orientation. Contemporary Educational Psychology, 32(1), 8-47.

6 Books

Borsboom, D. (2005). Measuring the Mind: Conceptual Issues in Contemporary Psychometrics. Cambridge University Press. https://doi.org/10.1017/CBO9780511490026

Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. Holt, Rinehart and Winston, 6277 Sea Harbor Drive, Orlando, FL 32887.

Embretson, S., & Reise, S. (2013). Item Response Theory. Psychology Press.

Markus, K. A., & Borsboom, D. (2013). Frontiers of test validity theory: Measurement, causation, and meaning. Routledge.

McCoach, D. B., Gable, R. K., & Madura, J. P. (2013). Instrument development in the affective domain. New York, NY: Springer.

Raykov, T., & Marcoulides, G. A. (2011). Introduction to psychometric theory. Routledge.

Slaney, K. (2017). Validating psychological constructs: Historical, philosophical, and practical dimensions. Springer.

Zumbo, B. D., & Hubley, A. M. (Eds.). (2017). Understanding and investigating response processes in validation research (Vol. 26). New York, NY: Springer.

7 Exploring new areas: research directions and suggestions

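-Self-report vs. clinician report

-Schwarz, N. (1999). Self-reports: how the questions shape the answers. American Psychologist, 54, 93–105.

-Interviews vs. structured interviews

-Retrospective reports & recall

-Measurement issues in intensive diary studies & experience sampling research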


8 Related Chinese-language references (added by the translator)


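Zhou, H. (周浩), & Long, L. (龙立荣). (2004). 共同方法偏差的统计检验与控制方法 [Statistical tests and control methods for common method bias]. 心理科学进展 (Advances in Psychological Science), 12, 942-950.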



