A personally elaborated collection of data science notes written entirely in Jupyter notebooks.

Veereshdammur/tales-science-data

Tales of Science & Data

A semi-serious collection of notes on Data Science, from Statistics to Machine Learning, passing through all sorts of other related things. Not necessarily in the order you'd expect.

Some meta

What exactly is this? Why did I even bother doing this? How did I choose to do this?

  • The about gives a (sorta) detailed explanation of the project, from conception to realisation, along with its structure and the choices made;
  • The adminy documents the folders which contain supporting material;
  • The resources file lists some data science material which I find brilliant and which cuts across topics; references for the single topics are listed in the respective notebooks instead.

Below is the table of contents, with links pointing to the chapters; from each chapter you can then proceed to the single topics.

Statistics is at the core of data science, and thorough data analysis is typically the starting point of any modelling effort on data.

  • General Intro
  • Foundational concepts on distributions and measures
  • Hypothesis testing
  • Methods, theorems and laws
  • Notable brain teasers and paradoxes, and how to be careful with data

What Machine Learning is, how you do it, and its building blocks. This chapter is closely linked to the following one on algorithms.

  • Overview of the field
  • Learning algorithms
  • Modelling techniques
  • Dimensionality reduction and matrix factorisation

Discussing algorithms in Machine Learning, one by one.

  • Supervised Learning
  • Unsupervised Learning

How do we measure how good a model is, once we have built it?

  • Generic problems models can have
  • Performance Metrics and validation techniques
  • Diagnostics

Artificial neural networks, whether "shallow" or "deep", deserve their own chapter.

  • What they are and how they work, in general
  • Types of artificial neurons and networks

NLP is the field (a part of Machine Learning) which deals with text, an unstructured data source. What NLP tries to do is put text into numerical representations and extract information from it, in an attempt to understand (?) it.

  • General concepts and tasks in NLP
  • Manipulating text and extracting information
  • Topic Modelling
  • Word Embeddings

Can a machine see? Well, an image is a matrix, so it's all algebraic operations!

  • Intro and quantifying images
  • Processing an image
  • What's in an image

Stuff about what a machine is and how to talk to it in the first place.

  • Data structures and foundational algorithms
  • Foundations of programming

Collecting some mathematical results here, for reference.

A quick overview of some of the usual suspects used in data science, from programming languages to frameworks and various tools.

  • The Python data stack
  • Databases and distributed frameworks

License

(C) 2017-2018 Martina Pugliese

This work is released under the MIT licence; full info here.
