LioCharity/Algorithmic-Machine-Learning

 
 

Algorithmic Machine Learning

This repository contains Jupyter Notebooks for the Algorithmic Machine Learning Course at Eurecom.

Objectives of the course

The main goal of this course is to give students hands-on experience with Data Science projects. It combines the theoretical concepts students learn in our courses on machine learning and statistical inference with the systems concepts we teach in distributed systems.

The Notebooks require students to address several challenges, which can be roughly classified as:

  • Data preparation and cleaning
  • Building descriptive statistics of the data
  • Working on a selected algorithm, e.g., building a statistical model or running parallel Monte Carlo simulations
  • Working on experimental validation

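As an illustration of the kind of challenge listed above, here is a minimal sketch of a Monte Carlo simulation split into independent tasks, followed by simple descriptive statistics and a validation step. It is not taken from the course Notebooks: the function names (`count_hits`, `estimate_pi`), the choice of estimating pi, and the task and sample counts are all assumptions made for this example, and it uses only the Python standard library rather than Spark.

```python
import math
import random
import statistics


def count_hits(seed, n_samples):
    """Count random points in the unit square that fall inside the quarter circle."""
    rng = random.Random(seed)  # one independent, reproducible stream per task
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(n_tasks=8, n_samples=20_000):
    """Run n_tasks independent simulations, then combine them (map, then reduce)."""
    seeds = range(n_tasks)
    # "Map" step: each task is independent, so it could run on a separate worker.
    hits_per_task = [count_hits(seed, n_samples) for seed in seeds]
    # Per-task estimates, kept for descriptive statistics.
    per_task = [4.0 * h / n_samples for h in hits_per_task]
    # "Reduce" step: pool all hits into a single estimate.
    overall = 4.0 * sum(hits_per_task) / (n_tasks * n_samples)
    return overall, per_task


overall, per_task = estimate_pi()
spread = statistics.stdev(per_task)  # descriptive statistic of the task estimates
error = abs(overall - math.pi)       # validation against the known value
print(f"estimate={overall:.4f} error={error:.4f} stdev={spread:.4f}")
```

Because each task depends only on its own seed, the list comprehension in `estimate_pi` is the part that a parallel engine would distribute. With Spark's Python API this would typically become a `map` over a parallelized collection of seeds followed by a sum, though the exact setup depends on the cluster configuration.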
Technical notes

This repository contains what we could call "solved exercises". Students will work on a different version of each Notebook, which requires answering questions, writing code, and commenting on results.

Students will use the EURECOM cloud computing platform to work on Notebooks. Our cluster is managed by Zoe, a container-based analytics-as-a-service system we have built. Notebook front-ends run in a user-facing container, whereas Notebook kernels run in clusters of containers. For this course, we focus on the Apache Spark kernel.

Our Notebooks are configured to use the Spark Python API. This choice is motivated by the heterogeneity we expect in our students' backgrounds.

Sources and acknowledgments

The majority of the Notebooks we use in our lectures are based on use cases illustrated in the book Advanced Analytics with Spark, by Sandy Ryza, Uri Laserson, Sean Owen & Josh Wills.

Some Notebooks are instead based on publicly available data, for which we defined the tasks to complete.

Finally, some Notebooks are private, and cannot be pushed to this repository. This is the case for industrial Notebooks, taking the form of use cases by Data Scientists from companies we are in contact with.

None of this could have been achieved without the skills of Duc-Trung Nguyen, PhD student in my group at Eurecom.

About

Public course material

Languages

  • Jupyter Notebook 62.1%
  • HTML 36.0%
  • TeX 1.9%