
CityReads│What Is the Best Way to Learn Statistics?

Dan Kopf 城读 2020-09-12

175

What Is the Best Way to Learn Statistics?


Start learning statistics with this book.

Dan Kopf, 2018. These are the best books for learning modern statistics—and they’re all free


Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani, 2017. An Introduction to Statistical Learning with Applications in R, corrected 7th printing. Springer.

Trevor Hastie, Robert Tibshirani, Jerome Friedman, 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Second Edition, Springer.

 

Sources: https://qz.com/1206229/this-is-the-best-book-for-learning-modern-statistics-its-free/

http://www-bcf.usc.edu/~gareth/ISL/ 

https://web.stanford.edu/~hastie/ElemStatLearn/ 


Statistics came well before computers. It would be very different if it were the other way around.

 

The stats most people learn in high school or college come from the time when computations were done with pen and paper. “Statistics were constrained by the computational technology available at the time,” says Stanford statistics professor Robert Tibshirani. “People use certain methods because that is how it all started and that’s what they are used to. It’s hard to change it.”

 

People who have taken intro statistics courses might recognize terms like “normal distribution,” “t-distribution,” and “least squares regression.” We learn about them, in large part, because these were convenient things to calculate with the tools available in the early 20th century. We shouldn’t be learning this stuff anymore—or, at least, it shouldn’t be the first thing we learn. There are better options.

 

As a former data scientist, there is no question I get asked more than, “What is the best way to learn statistics?” I always give the same answer: Read An Introduction to Statistical Learning. Then, if you finish that and want more, read The Elements of Statistical Learning. These two books, written by statistics professors at Stanford University, the University of Washington, and the University of Southern California, are the most intuitive and relevant books I’ve found on how to do statistics with modern technology. You can download them for free.

 


An Introduction to Statistical Learning (http://www-bcf.usc.edu/~gareth/ISL/) provides an accessible overview of the field of statistical learning, an essential toolset for making sense of the vast and complex data sets that have emerged in fields ranging from biology to finance to marketing to astrophysics in the past twenty years. This book presents some of the most important modeling and prediction techniques, along with relevant applications. Topics include linear regression, classification, resampling methods, shrinkage approaches, tree-based methods, support vector machines, clustering, and more. Color graphics and real-world examples are used to illustrate the methods presented.

 

Since the goal of this textbook is to facilitate the use of these statistical learning techniques by practitioners in science, industry, and other fields, each chapter contains a tutorial on implementing the analyses and methods presented in R, an extremely popular open source statistical software platform. This book provides an introduction to statistical learning methods. It is aimed at upper level undergraduate students, master's students and Ph.D. students in the non-mathematical sciences. The book also contains a number of R labs with detailed explanations on how to implement the various methods in real life settings, and should be a valuable resource for a practicing data scientist.


 

Two of the authors co-wrote The Elements of Statistical Learning (Hastie, Tibshirani and Friedman, 2nd edition 2009) (https://web.stanford.edu/~hastie/ElemStatLearn/), a popular reference book for statistics and machine learning researchers. An Introduction to Statistical Learning covers many of the same topics, but at a level accessible to a much broader audience. This book is targeted at statisticians and non-statisticians alike who wish to use cutting-edge statistical learning techniques to analyze their data. The text assumes only a previous course in linear regression and no knowledge of matrix algebra.

 

Number crunchers

 

The books are based on the concept of “statistical learning,” a mashup of stats and machine learning. The field of machine learning is all about feeding huge amounts of data into algorithms to make accurate predictions. Statistics is concerned with predictions as well, but also with determining how confident we can be about the importance of certain inputs.

 

Statistical learning is meant to take the best ideas from machine learning and computer science, and explain how they can be used and interpreted through a statistician’s lens.

 

The beauty of these books is that they make seemingly impenetrable concepts, such as “cross-validation,” “logistic regression,” and “support vector machines,” easily understandable. This is because the authors focus on intuition rather than mathematics.

 

Unlike many statisticians, Tibshirani and his coauthors don’t come from a math background. He believes this helps them think conceptually. “We try to explain [concepts] intuitively by explaining the underlying idea first,” he says. “Then we give examples of a situation you would expect it [to] work. And also, a situation where it might not work. I think people really appreciate that.” I certainly did.

 

For example, a section of An Introduction to Statistical Learning is dedicated to explaining the use of “bootstrapping,” a statistical technique only available in the age of computers. Bootstrapping is a way to assess the accuracy of an estimate by generating multiple datasets from the same data. For example, let’s say you collected the weights of 1,000 randomly selected adult women in the US, and found that the average was 130 pounds. How confident can you be in this number? In conventional statistics, to answer this question you would use a formula developed more than a century ago, which relies on many assumptions. Today, rather than make those assumptions, you can use a computer to take thousands of samples of 500 people from your original 1,000 (this is the bootstrapping) and see how many of these results are close to 130. If most of them are, you can be more confident in the estimate.
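The resampling idea can be sketched in a few lines of code. The example below is a minimal illustration in Python rather than the R used in the books, and it is not taken from either book. The weights are simulated (hypothetical) data, and the function names `bootstrap_means` and `percentile_interval` are made up for this sketch. It follows the textbook form of the bootstrap, which draws each resample with replacement and, by default, at the same size as the original sample; a smaller size such as the 500 mentioned above can be passed through `resample_size`.

```python
import random
import statistics


def bootstrap_means(data, n_resamples=2000, resample_size=None, seed=0):
    """Return the mean of each resample drawn with replacement from data."""
    rng = random.Random(seed)
    size = len(data) if resample_size is None else resample_size
    return [
        statistics.fmean(rng.choices(data, k=size))
        for _ in range(n_resamples)
    ]


def percentile_interval(values, lower=0.025, upper=0.975):
    """Return an approximate percentile interval of the resampled means."""
    ordered = sorted(values)
    lo_index = int(lower * (len(ordered) - 1))
    hi_index = int(upper * (len(ordered) - 1))
    return ordered[lo_index], ordered[hi_index]


# Hypothetical data: 1,000 simulated weights in pounds (not real measurements).
data_rng = random.Random(42)
weights = [data_rng.gauss(130, 25) for _ in range(1000)]

sample_mean = statistics.fmean(weights)
means = bootstrap_means(weights)
low, high = percentile_interval(means)

print(f"sample mean: {sample_mean:.1f} lb")
print(f"approximate 95% bootstrap interval: {low:.1f} to {high:.1f} lb")
```

A narrow interval around the sample mean suggests the estimate is stable; a wide one suggests it would move a lot if the data were collected again.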

 

Theory and application

 

These books, mercifully, don’t require high-level math, like multivariate calculus or linear algebra. “While knowledge of those topics is very valuable, we believe that they are not required in order to develop a solid conceptual understanding of how statistical learning methods work, and how they should be applied,” says Daniela Witten, a coauthor of An Introduction to Statistical Learning.

 

Helpfully, the books also provide code you can use to apply the tools with the statistical programming language R. I recommend putting their examples to work on a dataset you are excited about. If you are into novels, use it to analyze Goodreads ratings. If you like basketball, apply their examples to numbers at Basketball Reference. The statistical learning tools are wonderful in themselves, but I’ve found they work best for people who are motivated by a personal or professional project.

 

Data and statistics are an increasingly important part of modern life, and nearly everyone would be better off with a deeper understanding of the tools that help explain our world. Even if you don’t want to become a data analyst—which happens to be one of the fastest-growing jobs out there, just so you know—these books are invaluable guides to help explain what’s going on.


Related CityReads

06.CityReads│Life in the City Is Essentially One Giant Math Problem

11.CityReads│Why So Many Emerging Megacities Remain So Poor?

12.CityReads│How economists study cities?

23.CityReads│How to Tell Lies with Maps?

35.CityReads│The Joy of Stats

44.CityReads│How Could Humanity Escape Poverty?

49.CityReads│1800: A Year of Significance

82.CityReads│The End of Growth in the Standard of Living?

91.CityReads│Income inequality in Latin America in the 2010s

105.CityReads│Winners and Losers of Globalization

117.CityReads│Remembering Edutainer Hans Rosling, Who Made Data Dance and Taught us Fact-based Worldview

127.CityReads│Everybody Lies: How the Internet Reveals Who We Are

144.CityReads│Everyone Can Excel at Math & Science, If You Learn How to Learn

145.CityReads│Can the Food Production Keep Pace with the Population Growth?

148.CityReads│A New Way of Learning Economics to Understand Real World

159.CityReads│Children in China: Evidence from 2015 Mini-Census

165.CityReads│Scale: Simple Law of organisms, Cities and Companies

170.CityReads│Why GDP Is Not Enough to Measure Development

171.CityReads│Free Online Course CitiesX Teaches Everything about Urban Life


CityReads Notes On Cities

"CityReads", a subscription account on WeChat, 

posts our notes on city reads weekly. 



