
CityReads│How To Improve Your Data Literacy?

Spiegelhalter CityReads 2022-07-13


252



 

David Spiegelhalter teaches you not only how to use statistical analysis to solve real-world problems, but also how to understand and critique any conclusions that others draw from statistics.

Spiegelhalter, D. (2019). The Art of Statistics: Learning from Data. Penguin.
Lleo, S. (2019). The Art of Statistics: Learning from Data. Quantitative Finance, 19(8), 1267–1268. doi:10.1080/14697688.2019.1626475
 
Sources: https://www.penguin.co.uk/authors/126755/david-spiegelhalter.html
https://onlinelibrary.wiley.com/doi/full/10.1111/test.12206

Literacy, the ability to read and write, is without question an essential skill for understanding the world. Data literacy is equally crucial, argues David Spiegelhalter in his latest book, The Art of Statistics: Learning from Data.

Sir David Spiegelhalter is a former president of the UK's Royal Statistical Society and current Winton Professor of the Public Understanding of Risk in the Statistical Laboratory at the University of Cambridge. He is probably the most prominent statistician in the UK; in particular, he is the go-to voice the media turn to whenever some new extreme claim is made about health risks. And with good reason: he can get to the heart of the matter and make things intelligible to the lay person without dumbing down or veering into technicalities that obscure rather than elucidate. The Financial Times considers Spiegelhalter the closest living equivalent of the late Hans Rosling, a Swedish professor of public health who made statistics lively and easily accessible to the public (see CityReads│The Joy of Stats; CityReads│Remembering Edutainer Hans Rosling, Who Made Data Dance; CityReads│Ten Rules of Factful Thinking to Learn about the World). Spiegelhalter defines data literacy as the 'ability to carry out statistical analysis on real-world problems and understand and critique any conclusion drawn by others based on statistics.'
 
Statistics can be a double-edged sword. On the one hand, they carry all the appearance of hard, objective fact, and they can often clinch an argument by providing evidence to substantiate a claim. On the other hand, we are all too aware of how they can be weaponized to manipulate people. And, more benignly, they can simply be incredibly confusing. With all this in mind, developing our statistical literacy seems more urgent than ever.
 
The Art of Statistics: Learning from Data is intended both for students of statistics seeking a non-technical introduction to the basic issues and for general readers who want to be better informed about the statistics they encounter at work and in everyday life. It guides the reader through the essential principles we need in order to derive knowledge from data. Preferring concrete real-life examples to dense mathematical formulae, Spiegelhalter introduces the key skills of data analysis, including how to spot questionable methodologies, in an accessible and enjoyable way. The book focuses on the art of statistical problem-solving rather than on a technical discussion of the statistical 'bag of tools.'
 
Throughout the book, Spiegelhalter emphasizes a data-driven, problem-oriented structure known as "PPDAC": Problem, Plan, Data, Analysis, Conclusion. He describes how statisticians approach each stage of an investigation and the tools that come into play. PPDAC starts with defining a problem or question and developing a plan for what to measure, how to measure it, and which analyses will serve best. Researchers then collect data, analyze them according to the plan, and decide what conclusions reasonably follow. Spiegelhalter also insists on the importance of replicability; to encourage this practice, he makes available the data and R code for all the examples discussed in the book.
  

 
The book is organized into an introduction (arguing why we need statistics and how we should teach it) and 14 chapters, with notes, references, and further reading for each chapter collected at the end of the book. A useful summary of the key points closes each chapter, and 24 pages are devoted to an excellent glossary of terms. The last chapter is short and lists ten simple rules for effective statistical practice.
 
In the introductory section, Spiegelhalter discusses how data and statistics shed light on the crimes of the UK's most prolific serial killer, Harold Shipman.
 
Harold Shipman was Britain’s most prolific convicted murderer, though he does not fit the archetypal profile of a serial killer. A mild-mannered family doctor working in a suburb of Manchester, between 1975 and 1998 he injected at least 215 of his mostly elderly patients with a massive opiate overdose. He finally made the mistake of forging the will of one of his victims so as to leave himself some money: her daughter was a solicitor, suspicions were aroused, and forensic analysis of his computer showed he had been retrospectively changing patient records to make his victims appear sicker than they really were.
 
Spiegelhalter was one of a number of statisticians called to give evidence at the public inquiry, which concluded that Shipman had definitely murdered 215 of his patients, and possibly 45 more.
 
 

Figure 0.1 is a fairly sophisticated visualization of these data: a scatter-plot of each victim's age against their date of death, with the shading of the points indicating whether the victim was male or female.
 
Shipman’s victims were mainly women. The bar-chart on the right of the picture shows that most of his victims were in their 70s and 80s, but looking at the scatter of points reveals that although initially they were all elderly, some younger cases crept in as the years went by. The bar-chart at the top clearly shows a gap around 1992 when there were no murders. It turned out that before that time Shipman had been working in a joint practice with other doctors but then, possibly as he felt under suspicion, he left to form a single-handed general practice.
 


Figure 0.2 is a line graph comparing the times of day at which Shipman’s patients died with the times at which a sample of patients of other local family doctors died. The pattern does not require subtle analysis: the conclusion is sometimes known as 'inter-ocular', since it hits you between the eyes. Shipman’s patients tended overwhelmingly to die in the early afternoon.
 
The Shipman story amply demonstrates the great potential of using data to help us understand the world and make better judgements. This is what statistical science is all about.
 
It also gives a foretaste of the real‐world, interesting and practical problems and examples that permeate the book. Nearly 50 real‐world problems/questions are positioned at key places throughout the chapters. Some are important scientific hypotheses, such as whether the Higgs boson exists, or if there really is convincing evidence for extra-sensory perception (ESP). Others are questions about health care, such as whether busier hospitals have higher survival rates, and if screening for ovarian cancer is beneficial. Sometimes we just want to estimate quantities, such as the cancer risk from bacon sandwiches, the number of sexual partners people in Britain have in their lifetime, and the benefit of taking a daily statin. And some questions are just interesting, such as identifying the luckiest survivor from the Titanic; whether Harold Shipman could have been caught earlier; and assessing the probability that a skeleton found in a Leicester car park really was that of Richard III.
 
The beauty of the book lies in the way the author answers these questions and solves the problems they raise. He does this with clarity and honesty, in easy-to-understand language. Here are some examples of the questions he addresses:
 
  • What kind of people did Harold Shipman murder, and when did they die?
  • How many trees are there on the planet?
  • What happened to children having heart surgery in Bristol between 1984 and 1995?
  • What's the cancer risk from bacon sandwiches?
  • Can we trust the wisdom of crowds?
  • How many sexual partners have people in Britain really had?
  • Does going to university increase the risk of getting a brain tumour?
  • Using their parents’ heights, how can we predict an adult offspring’s height?
  • How many people are unemployed in the UK?
  • Are more boys born than girls?
  • Does extra-sensory perception (ESP) exist?
  • Do busier hospitals have higher survival rates?
  • What is the benefit of screening for ovarian cancer?
 
The first six chapters take the reader on a grand tour of statistical analysis and modeling. Chapters 1–5 respectively introduce the nature and type of data, define the summary measures, discuss the use of visualization to communicate data, explore causality, and explain linear regression. Chapter 6 explores algorithms, analytics, and prediction. This chapter simultaneously showcases the current frontier of statistical problem solving, and points to the limitations of a purely algorithmic approach.
 
The second half of the book encourages readers to dive into deeper and more intricate questions. Chapters 7–10 respectively introduce estimates and intervals, explore probability, explain hypothesis testing, and examine the use and misuse of P-values and confidence intervals. Chapter 10 concludes with the difficulties created by an overreliance on P-values and an examination of the American Statistical Association’s six principles regarding P-values. Chapter 11 introduces the Bayesian method: Bayes' theorem, the likelihood ratio, Bayesian hypothesis testing, and the hierarchical models used in election polling. Chapter 12 discusses pitfalls in statistical work, questionable research practices, and outright fraud. Finally, Chapters 13 and 14 suggest ways to perform and communicate statistical analyses better, and offer checklists for assessing statistical claims.
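The link between Bayes' theorem and the likelihood ratio mentioned above can be illustrated with a short Python sketch. This is not code from the book; it assumes a hypothetical screening test (the 1% prevalence, 90% sensitivity and 95% specificity are invented for illustration) and works out the chance that someone who tests positive actually has the condition in two ways: by counting expected frequencies in a notional 1,000 people, and by multiplying prior odds by the likelihood ratio. Both routes should give the same answer.

```python
"""Bayes' theorem two ways, for a hypothetical screening test.

All numbers used here are illustrative assumptions, not figures from the book.
"""


def posterior_from_frequencies(prior, sensitivity, specificity, population=1000):
    """Expected-frequency approach: count true and false positives in a
    notional population, then take the share of positives that are true."""
    with_condition = population * prior
    without_condition = population - with_condition
    true_positives = with_condition * sensitivity
    false_positives = without_condition * (1 - specificity)
    return true_positives / (true_positives + false_positives)


def posterior_from_likelihood_ratio(prior, sensitivity, specificity):
    """Odds form of Bayes' theorem: posterior odds = prior odds x likelihood ratio."""
    # Likelihood ratio = P(positive | condition) / P(positive | no condition)
    likelihood_ratio = sensitivity / (1 - specificity)
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)  # convert odds back to a probability


if __name__ == "__main__":
    # Hypothetical test: 1% prevalence, 90% sensitivity, 95% specificity.
    print(posterior_from_frequencies(0.01, 0.90, 0.95))
    print(posterior_from_likelihood_ratio(0.01, 0.90, 0.95))
```

With these assumed numbers, 10 of 1,000 people have the condition and 9 of them test positive, while 49.5 of the 990 without it also test positive on average. So only 9 of the 58.5 expected positives, roughly 15%, are true positives, even though the test sounds accurate.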
 
The Art of Statistics is an engaging and enjoyable read. Technical ideas are explained using a mix of simple tools and concrete examples. For example, coin tosses and expected frequency trees introduce the notion of probability, while simulation and resampling build intuition for the Law of Large Numbers, the Central Limit Theorem, and statistical inference.
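The simulation and resampling ideas described above can be sketched in a few lines of Python. This is not the book's own code (the book supplies R), and the function names and small data set are hypothetical. The first function tosses a simulated fair coin, so the proportion of heads can be watched settling towards 0.5 as the number of tosses grows, which is the Law of Large Numbers at work. The second resamples a data set with replacement (the bootstrap) to show how much a sample mean might vary from sample to sample, and the third reads off a rough percentile interval from those resampled means.

```python
import random
import statistics


def proportion_of_heads(n_tosses, seed=0):
    """Simulate n_tosses of a fair coin and return the proportion of heads."""
    rng = random.Random(seed)  # seeded so results are reproducible
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses


def bootstrap_means(data, n_resamples=1000, seed=0):
    """Resample data with replacement (same size as the original)
    and return the mean of each resample."""
    rng = random.Random(seed)
    return [statistics.mean(rng.choices(data, k=len(data))) for _ in range(n_resamples)]


def percentile_interval(values, level=0.95):
    """Rough percentile interval: trim (1 - level) / 2 from each tail."""
    ordered = sorted(values)
    tail = (1 - level) / 2
    lower = ordered[int(tail * (len(ordered) - 1))]
    upper = ordered[int((1 - tail) * (len(ordered) - 1))]
    return lower, upper


if __name__ == "__main__":
    for n in (10, 100, 1_000, 100_000):
        print(n, proportion_of_heads(n))  # tends towards 0.5 as n grows

    sample = [3.1, 4.7, 2.8, 5.6, 4.0, 3.9, 6.2, 4.4, 3.3, 5.1]  # hypothetical data
    means = bootstrap_means(sample)
    print(statistics.mean(sample), percentile_interval(means))
```

A seeded random generator is used so that each run gives the same numbers; changing `seed` shows how much the small-sample results wobble while the large-sample proportion stays close to 0.5.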
 
The Art of Statistics is written in a conversational style that retains clarity and precision. The discussion and definition of the P-value, a common stumbling block in statistics textbooks, is a particular treat. The author not only provides a straightforward and intuitive definition, 'a P-value is the probability of getting a result at least as extreme as we did, if the null hypothesis (and all other modeling assumptions) were really true', but also explains plainly the risks and misuse of P-values, and gives a historical account of the P-value vs. confidence interval controversy. The Art of Statistics represents statistics communication at its very best.
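To make the quoted definition concrete, here is a minimal Python sketch of an exact one-sided P-value for a coin-tossing experiment. The scenario (60 heads in 100 tosses of a coin that the null hypothesis says is fair) is hypothetical and chosen only for illustration, and the function name is likewise an assumption. The function adds up the binomial probabilities of every outcome at least as extreme as the one observed, assuming the null hypothesis and independent tosses really hold.

```python
from math import comb


def one_sided_binomial_p_value(successes, trials, p_null=0.5):
    """Probability of observing `successes` or more out of `trials`,
    if each trial independently succeeds with probability p_null
    (the null hypothesis)."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


if __name__ == "__main__":
    # Hypothetical data: 60 heads in 100 tosses of a supposedly fair coin.
    print(one_sided_binomial_p_value(60, 100))
```

For this made-up example the P-value comes out at roughly 0.03: data this lopsided would be fairly surprising if the coin were fair. It is not the probability that the coin is fair. A two-sided test would also count outcomes of 40 or fewer heads as equally extreme.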
 
The message of The Art of Statistics is encouraging: statistics is a difficult subject, but also a fascinating one that has become increasingly necessary for understanding the world around us.

 

Wishing you all a happy Mid-Autumn Festival!



Related CityReads

6. CityReads│Life in the City Is Essentially One Giant Math Problem

23. CityReads│How to Lie With Maps

35. CityReads│The Joy of Stats

117. CityReads│Remembering Edutainer Hans Rosling, Who Made Data Dance

127. CityReads│Everybody Lies: How the Internet Reveals Who We Are

144. CityReads│Everyone Can Excel at Math & Science

148. CityReads│A New Way of Learning Economics to Understand World

165. CityReads│Scale: Simple Law of Organisms, Cities and Companies

169. CityReads│Dollar Street shows how people live by photos

170. CityReads│Why GDP Is Not Enough to Measure Development

175. CityReads│What Is the Best Way to Learn Statistics?

204. CityReads│All You Need to Know About the Global Inequality

211. CityReads│Learning Statistical Thinking for the 21st Century

213. CityReads│When Words Meet Numbers: What It Reveals about Writing

235. CityReads│How to Spot Chart Lies?

236. CityReads│Using Big Data to Solve Economic and Social Problems

237. CityReads│Ten Rules of Factful Thinking to Learn about the World

(Click a title, or open our WeChat menu and reply with its number.)

CityReads Notes On Cities

"CityReads", a subscription account on WeChat, 

posts our notes on city reads weekly. 

Please follow us by searching "CityReads"
