A semi-serious collection of notes on Data Science, from Statistics to Machine Learning, passing through all sorts of other related things. Not necessarily in the order you'd expect.
What exactly is this? Why did I even bother doing this? How did I choose to do this?
- The about gives a (sorta) detailed explanation of the project, from conception to realisation, including its structure and the choices behind it;
- The adminy documents the folders which contain supporting material;
- The resources file lists some data science material which I find brilliant and which cuts across topics. Note that references for single topics are listed in the respective notebooks instead.
Below is the table of contents, with links pointing to the chapters; from each chapter you can then proceed to the single topics.
Statistics is at the core of data science, and thorough data analysis is typically the starting point of any modelling effort on data.
- General Intro
- Foundational concepts on distributions and measures
- Hypothesis testing
- Methods, theorems and laws
- Notable brain teasers and paradoxes, and how to be careful with data
What Machine Learning is, how you do it, and its building blocks. This chapter is closely linked to the following one on algorithms.
- Overview of the field
- Learning algorithms
- Modelling techniques
- Dimensionality reduction and matrix factorisation
Discussing algorithms in Machine Learning, one by one.
- Supervised Learning
- Unsupervised Learning
How do we measure how good a model we built is?
- Generic problems models can have
- Performance Metrics and validation techniques
- Diagnostics
Artificial neural networks, whether "shallow" or "deep", deserve their own chapter.
- What they are and how they work, in general
- Types of artificial neurons and networks
NLP is the field (a part of Machine Learning) which deals with text, an unstructured data source. What NLP tries to do is put text into numerical representations and extract information from it, in an attempt to understand (?) it.
- General concepts and tasks in NLP
- Manipulating text and extracting information
- Topic Modelling
- Word Embeddings
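
To give a flavour of what "putting text into numerical representations" can mean, here is a minimal sketch of a bag-of-words encoding, one of the simplest options. The two sample sentences and the helper names (`tokenize`, `bag_of_words`) are made up for illustration, and the whitespace tokenizer is a deliberate simplification; the notebooks cover richer approaches.

```python
from collections import Counter

# Two toy documents (illustrative only).
docs = ["the cat sat on the mat", "the dog sat"]

def tokenize(text):
    """Naive tokenizer: lowercase and split on whitespace."""
    return text.lower().split()

# The vocabulary is the sorted set of all distinct tokens in the corpus.
vocabulary = sorted({tok for doc in docs for tok in tokenize(doc)})

def bag_of_words(text, vocab):
    """Represent a text as a vector of token counts, one entry per vocabulary word."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocab]

vectors = [bag_of_words(doc, vocabulary) for doc in docs]

print(vocabulary)  # ['cat', 'dog', 'mat', 'on', 'sat', 'the']
print(vectors)     # [[1, 0, 1, 1, 1, 2], [0, 1, 0, 0, 1, 1]]
```

Each document becomes a fixed-length vector, so texts can be compared with ordinary numerical tools; word order is lost, which is the main limitation of this representation.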
Can a machine see? Well, an image is a matrix, so it's all algebraic operations!
- Intro and quantifying images
- Processing an image
- What's in an image
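
As a small illustration of the "an image is a matrix" idea, the sketch below treats a tiny grayscale image as a nested list of pixel intensities and applies a few elementwise and matrix operations to it. The pixel values and the function names (`invert`, `transpose`, `brighten`) are invented for this example; in practice one would likely use an array library rather than plain lists.

```python
# A tiny 2x3 grayscale "image": each entry is a pixel intensity in 0..255.
image = [
    [0, 128, 255],
    [64, 192, 32],
]

def invert(img):
    """Negative of the image: subtract every pixel from the maximum intensity."""
    return [[255 - p for p in row] for row in img]

def transpose(img):
    """Swap rows and columns, i.e. reflect the image along its main diagonal."""
    return [list(col) for col in zip(*img)]

def brighten(img, factor):
    """Scale every pixel by a factor, clipping the result at 255."""
    return [[min(255, round(p * factor)) for p in row] for row in img]

print(invert(image))       # [[255, 127, 0], [191, 63, 223]]
print(transpose(image))    # [[0, 64], [128, 192], [255, 32]]
print(brighten(image, 2))  # [[0, 255, 255], [128, 255, 64]]
```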
Stuff about what a machine is and how to talk to it in the first place.
- Data structures and foundational algorithms
- Foundations of programming
Collecting some mathematical results here, for reference.
A quick overview of some of the usual suspects used in data science, from programming languages to frameworks and various tools.
- The Python data stack
- Databases and distributed frameworks
(C) 2017-2018 Martina Pugliese
This work is released under the MIT licence, full info here.