GitHub - wanghao1991217/ElementsOfDataScience: An introduction to data science in Python, for people with no programming experience.

forked from AllenDowney/ElementsOfDataScience

An introduction to data science in Python, for people with no programming experience.

allendowney.github.io/ElementsOfDataScience/

Files in this repository:

figs
.gitignore
01_variables.ipynb
02_times.ipynb
03_arrays.ipynb
04_loops.ipynb
05_dictionaries.ipynb
06_plotting.ipynb
07_dataframes.ipynb
08_distributions.ipynb
09_relationships.ipynb
10_regression.ipynb
11_inference.ipynb
LICENSE
README.md
_config.yml
brfss.hdf5
brfss_clean.ipynb
correlation.ipynb
environment.yml
geo_example.ipynb
gss.hdf5
machine_bias_table.png
mexico.ipynb
nsfg.hdf5
nsfg_clean.ipynb
pew_religion_data.ipynb
pew_religion_table1.csv
pew_religion_tables_2019.pdf
run_on_colab_small.png
testing_means.ipynb
utils.py

Elements of Data Science is an introduction to data science in Python for people with no programming experience. My goal is to present a small, powerful subset of Python that allows you to do real work in data science as quickly as possible.

At the same time, I want to make sure the material is presented clearly. I don't assume that the reader knows anything about programming, statistics, or data science. When I use a term, I try to define it immediately, and when I use a programming feature, I try to explain it.

There are a few places where I use a programming feature before it is fully explained, but I keep them to a minimum, and I'll let you know what you don't need to know.

This "book" is in the form of Jupyter notebooks. Jupyter is a software development tool you can run in a web browser, so you don't have to install any software. A Jupyter notebook is a document that contains text, Python code, and results. So you can read it like a book, but you can also modify the code, run it, develop new programs, and test them.

The notebooks contain exercises where you can practice what you learn. Most of the exercises are meant to be quick, but a few are more substantial.

This material is a work in progress, so suggestions are welcome. The best way to provide feedback is to create an issue in this GitHub repository.

The notebooks

For each of the notebooks below, you have two options. If you view the notebook on NBViewer, you can read it, but you can't run the code. If you run the notebook on Colab, you can run the code, do the exercises, and save your modified version of the notebook to Google Drive (if you have an account).

Notebook 1

Variables and values: The first notebook explains how to use Jupyter and introduces the most basic programming features in Python, variables and values.

Press this button to run this notebook on Colab:

or click here to read it on NBViewer

Notebook 2

Times and places: This notebook shows how to represent times, dates, and locations in Python, and uses the GeoPandas library to plot points on a map.

Press this button to run this notebook on Colab:

or click here to read it on NBViewer

Notebook 3

Lists and Arrays: This notebook presents lists and NumPy arrays. It discusses absolute, relative, and percent errors, and ways to summarize them.

Press this button to run this notebook on Colab:

or click here to read it on NBViewer
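To make the error measures named in Notebook 3 concrete, here is a minimal sketch in plain Python. The numbers are made up for illustration, and the definitions used (absolute error as the magnitude of the miss, relative error as that magnitude divided by the actual value) are the common ones; the notebook itself may use a different sign convention or NumPy arrays instead of lists.

```python
from statistics import mean

# Hypothetical data: one true value and several estimates of it.
actual = 50.0
estimates = [48.0, 55.0, 50.0, 46.0]

# Signed error: positive means the estimate was too high.
errors = [est - actual for est in estimates]

# Absolute error ignores the direction of the miss.
abs_errors = [abs(e) for e in errors]

# Relative error scales the miss by the size of the true value.
rel_errors = [e / actual for e in abs_errors]

# Percent error is relative error expressed out of 100.
percent_errors = [r * 100 for r in rel_errors]

# Two ways to summarize a whole set of errors with one number.
mean_abs_error = mean(abs_errors)
mean_percent_error = mean(percent_errors)

print("mean absolute error:", mean_abs_error)
print("mean percent error:", mean_percent_error)
```

Averaging the signed errors instead would let overestimates and underestimates cancel out, which is why summaries are usually computed from the absolute values.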

Notebook 4

Loops and Files: This notebook presents the for loop and the if statement; then it uses them to speed-read War and Peace and count the words.

Press this button to run this notebook on Colab:

or click here to read it on NBViewer

Notebook 5

Dictionaries: This notebook presents one of the most powerful features of Python, dictionaries, and uses them to count the unique words in War and Peace.

Press this button to run this notebook on Colab:

or click here to read it on NBViewer
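The word-counting idea in Notebooks 4 and 5 can be sketched with a for loop, an if statement, and a dictionary. This is an illustration only: the sample sentence stands in for the text of War and Peace, and the function name `count_words` and the punctuation-stripping rule are assumptions, not the notebook's code.

```python
def count_words(text):
    """Map each word in text to the number of times it appears."""
    counts = {}
    for word in text.split():
        # Remove surrounding punctuation and ignore capitalization.
        word = word.strip(".,;:!?\"'()").lower()
        if word:
            # get() returns 0 the first time a word is seen.
            counts[word] = counts.get(word, 0) + 1
    return counts


# Hypothetical sample text standing in for a whole book.
sample = "The cat saw the dog. The dog saw the cat, and the cat ran."

counts = count_words(sample)
total_words = sum(counts.values())
unique_words = len(counts)

print("total words:", total_words)
print("unique words:", unique_words)
print("count of 'the':", counts["the"])
```

The dictionary keys are the unique words, so `len(counts)` answers the "how many unique words" question directly.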

Notebook 6

Plotting: This notebook introduces Matplotlib, a plotting library for Python, and uses it to generate a few common data visualizations and one less common one, a Zipf plot.

Press this button to run this notebook on Colab:

or click here to read it on NBViewer

Notebook 7

DataFrames: This notebook presents DataFrames, which are used to represent tables of data. It uses data from the National Survey of Family Growth to find the average weight of babies in the U.S.

Press this button to run this notebook on Colab:

or click here to read it on NBViewer

Notebook 8

Distributions: This notebook explains what a distribution is and presents three ways to represent one: a probability mass function (PMF), a cumulative distribution function (CDF), or a probability density function (PDF). It also shows how to compare a distribution to another distribution or to a mathematical model.

Press this button to run this notebook on Colab:

or click here to read it on NBViewer
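As a rough illustration of two of the representations named in Notebook 8, the sketch below builds a PMF and a CDF from a small sample using only the standard library. The data and the helper names `make_pmf` and `make_cdf` are hypothetical; the notebook may rely on a dedicated library instead.

```python
from collections import Counter


def make_pmf(values):
    """Map each distinct value to the fraction of the sample equal to it."""
    n = len(values)
    counts = Counter(values)
    return {x: counts[x] / n for x in sorted(counts)}


def make_cdf(pmf):
    """Map each value x to the fraction of the sample less than or equal to x."""
    cdf = {}
    total = 0.0
    for x in sorted(pmf):
        total += pmf[x]
        cdf[x] = total
    return cdf


# Hypothetical sample of eight observations.
values = [1, 2, 2, 3, 3, 3, 4, 4]

pmf = make_pmf(values)
cdf = make_cdf(pmf)

for x in pmf:
    print(x, pmf[x], cdf[x])
```

The PMF answers "how often does exactly this value occur", while the CDF answers "what fraction of values are at or below this one", which makes it easier to compare two distributions at a glance.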

Notebook 9

Relationships: This notebook explores relationships between variables using scatter plots, violin plots, and box plots. It quantifies the strength of a relationship using the correlation coefficient and uses simple regression to estimate the slope of a line.

Press this button to run this notebook on Colab:

or click here to read it on NBViewer
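The two quantities mentioned in Notebook 9, the correlation coefficient and the slope from simple regression, can be computed by hand for a small dataset. The sketch below uses Pearson's correlation and the least-squares slope; the data points and function names are invented for illustration, and the notebook itself may use library functions rather than these formulas.

```python
from math import sqrt


def deviations(xs):
    """Return each value's distance from the mean of the list."""
    m = sum(xs) / len(xs)
    return [x - m for x in xs]


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    dx, dy = deviations(xs), deviations(ys)
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def least_squares(xs, ys):
    """Slope and intercept of the least-squares line through the points."""
    dx, dy = deviations(xs), deviations(ys)
    slope = sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)
    intercept = sum(ys) / len(ys) - slope * sum(xs) / len(xs)
    return slope, intercept


# Hypothetical paired measurements.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)
slope, intercept = least_squares(xs, ys)

print("correlation:", round(r, 3))
print("slope:", round(slope, 3), "intercept:", round(intercept, 3))
```

Correlation measures how tightly the points follow a line, on a scale from -1 to 1, while the slope says how much `y` changes per unit of `x`; a relationship can have a high correlation and a small slope, or the reverse.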

Notebook 10

Regression: This notebook presents multiple regression and uses it to explore the relationship between age, education, and income. It uses visualization to interpret multivariate models. It also presents binary variables and logistic regression.

Press this button to run this notebook on Colab:

or click here to read it on NBViewer

Notebook 11

Inference: This notebook presents computational inference, a process for computing p-values, standard errors, and confidence intervals using randomization methods rather than analysis.

Press this button to run this notebook on Colab:

or click here to read it on NBViewer
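To give a feel for the randomization methods described in Notebook 11, here is a minimal sketch of two of them: bootstrap resampling for a standard error and confidence interval, and a permutation test for a p-value. The data, the function names, the 90% interval, and the choice of 1000 iterations are all assumptions made for this example rather than details from the notebook.

```python
import random
from statistics import mean, stdev


def bootstrap_means(sample, iters=1000, rng=None):
    """Means of `iters` resamples drawn with replacement from sample."""
    rng = rng or random.Random()
    n = len(sample)
    return [mean(rng.choices(sample, k=n)) for _ in range(iters)]


def percentile_interval(values, low=5, high=95):
    """Approximate percentile interval read off the sorted values."""
    ordered = sorted(values)
    n = len(ordered)
    lo_idx = int(n * low / 100)
    hi_idx = min(int(n * high / 100), n - 1)
    return ordered[lo_idx], ordered[hi_idx]


def permutation_p_value(group_a, group_b, iters=1000, rng=None):
    """Fraction of shuffled splits whose difference in means is at least
    as large as the observed difference (two-sided)."""
    rng = rng or random.Random()
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    count = 0
    for _ in range(iters):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            count += 1
    return count / iters


# Hypothetical measurements for two groups.
group_a = [7.1, 6.8, 7.4, 7.9, 6.5, 7.2, 7.7, 6.9]
group_b = [7.6, 8.1, 7.3, 8.4, 7.8, 8.0, 7.5, 8.2]

rng = random.Random(17)  # fixed seed so repeated runs give the same result

resampled = bootstrap_means(group_a, rng=rng)
std_err = stdev(resampled)            # standard error of the mean
ci_low, ci_high = percentile_interval(resampled)  # 90% confidence interval
p_value = permutation_p_value(group_a, group_b, rng=rng)

print("standard error:", round(std_err, 3))
print("90% CI:", round(ci_low, 3), "to", round(ci_high, 3))
print("p-value:", p_value)
```

Both methods replace a formula with simulation: the bootstrap asks how much the mean would vary across samples like this one, and the permutation test asks how often a difference this large would appear if group labels were meaningless.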
