By Cathy O'Neil, Rachel Schutt
Publisher: O'Reilly Media
Release Date: October 2013
Pages: 408
Now that people are aware that data can make the difference in an election or a business model, data science as an occupation is gaining ground. But how can you get started working in a wide-ranging, interdisciplinary field that’s so clouded in hype? This insightful book, based on Columbia University’s Introduction to Data Science class, tells you what you need to know.
In many of these chapter-long lectures, data scientists from companies such as Google, Microsoft, and eBay share new algorithms, methods, and models by presenting case studies and the code they use. If you’re familiar with linear algebra, probability, and statistics, and have programming experience, this book is an ideal introduction to data science.
Topics include:
Statistical inference, exploratory data analysis, and the data science process
Algorithms
Spam filters, Naive Bayes, and data wrangling
Logistic regression
Financial modeling
Recommendation engines and causality
Data visualization
Social networks and data journalism
Data engineering, MapReduce, Pregel, and Hadoop
Doing Data Science is collaboration between course instructor Rachel Schutt, Senior VP of Data Science at News Corp, and data science consultant Cathy O’Neil, a senior data scientist at Johnson Research Labs, who attended and blogged about the course.
by Rob J Hyndman and George Athanasopoulos
Publisher: OTexts
Release Date: 13 October 2013
Forecasting is required in many situations. Deciding whether to build another power generation plant in the next five years requires forecasts of future demand. Scheduling staff in a call centre next week requires forecasts of call volumes. Stocking an inventory requires forecasts of stock requirements. Telecommunication routing requires traffic forecasts a few minutes ahead. Whatever the circumstances or time horizons involved, forecasting is an important aid in effective and efficient planning. This textbook provides a comprehensive introduction to forecasting methods and presents enough information about each method for readers to use them sensibly. Examples use R with many data sets taken from the authors' own consulting experience.
by Terence Parr and Jeremy Howard (2018-2019 )
This book is a primer on machine learning for programmers trying to get up to speed quickly. You'll learn how machine learning works and how to apply it in practice. We focus on just a few powerful models (algorithms) that are extremely effective on real problems, rather than presenting a broad survey of machine learning algorithms as many books do. Co-author Jeremy used these few models to become the #1 competitor for two consecutive years at Kaggle.com. This narrow approach leaves lots of room to cover the models, training, and testing in detail, with intuitive descriptions and full code implementations.
This book is available online.
Machine learning has great potential for improving products, processes and research. But computers usually do not explain their predictions which is a barrier to the adoption of machine learning. This book is about making machine learning models and their decisions interpretable.
After exploring the concepts of interpretability, you will learn about simple, interpretable models such as decision trees, decision rules and linear regression. Later chapters focus on general model-agnostic methods for interpreting black box models like feature importance and accumulated local effects and explaining individual predictions with Shapley values and LIME.
All interpretation methods are explained in depth and discussed critically. How do they work under the hood? What are their strengths and weaknesses? How can their outputs be interpreted? This book will enable you to select and correctly apply the interpretation method that is most suitable for your machine learning project.
https://jakevdp.github.io/PythonDataScienceHandbook/index.html
Includes functional programming
Introduction: http://www.r-tutor.com/r-introduction
Elementary statistics: http://www.r-tutor.com/elementary-statistics
https://bookdown.org/ndphillips/YaRrr/
The purpose of this book is to help you learn R from the ground-up.
http://www.r-tutor.com/content/r-tutorial-ebook
The purpose of this book is to provide practical data science skills to managers and business analysts.
https://bookdown.org/hbsabafaculty/ids_book/
A complete course on bayesian statistics.
https://xcelab.net/rm/statistical-rethinking/
https://bookdown.org/content/4857/