University of Idaho - Department of Computer Science
Instructor: Alex Vakanski
Semester: Fall 2022 (August 22 – December 16)
- Lecture 2 - Data Types in Python
- Lecture 3 - Statements, Files
- Lecture 4 - Functions, Iterators, Generators
- Lecture 5 - Object-Oriented Programming
- Tutorial 1 - Python IDE, Jupyter Notebook
- Lecture 6 - Exceptions, Modules, Packages
- Lecture 7 - Decorators
- Lecture 8 - Functional Programming, Callbacks, Closures
- Lecture 9 - NumPy for Array Operations
- Lecture 10 - Data Manipulation with Pandas
- Lecture 11 - Data Visualization with Matplotlib
- Lecture 13 - Data Exploration and Preprocessing
- Lecture 14 - Feature Engineering
- Lecture 24 - Databases and SQL
- Tutorial 9 - Web Scraping
- Lecture 12 - Scikit-Learn Library for Data Science
- Lecture 15 - Ensemble Methods
- Tutorial 4 - TensorFlow
- Lecture 16 - Convolutional Neural Networks with Keras and TensorFlow
- Tutorial 5 - PyTorch
- Lecture 17 - Convolutional Neural Networks with PyTorch
- Tutorial 7 - TensorFlow Datasets
- Lecture 18 - Natural Language Processing
- Lecture 19 - Transformer Networks
- Lecture 20 - Language Models with Hugging Face
- Lecture 21 - Model Selection, Hyperparameter Tuning
- Lecture 22 - Diffusion Models for Text-to-Image Generation
- Lecture 25 - Deploying Projects as Web Applications
- Lecture 26 - Deploying Projects to the Cloud
- Lecture 27 - Introduction to Data Science Operations (DSOps)
- Tutorial 10 - Virtual Environments
- Lecture 28 - Reproducible Data Science Projects
- Tutorial 11 - CometML
- Lecture 29 - Monitoring Performance
- Lecture 30 - Continuous Deployment
As organizations increasingly rely on data science projects to improve their functions and operations, the tools for managing such projects have matured as well. This course introduces students to Python tools and libraries that organizations commonly use to manage the different phases in the life cycle of data science projects. The content is divided into four main themes. The first theme reviews the basics of Python programming and extends them with advanced concepts. The second theme focuses on data engineering and covers Python tools for data collection and exploration. The third theme provides an overview of model engineering, including model design, training, testing, optimization, and packaging. The last theme introduces Data Science Operations (DSOps) and covers techniques for model serving, performance monitoring, diagnosis, and reproducibility of data science projects deployed in production.
- Joel Grus, “Data Science from Scratch: First Principles with Python,” 2nd Edition, O'Reilly Media, 2019, ISBN: 9781492041139.
- Chip Huyen, “Designing Machine Learning Systems,” O'Reilly Media, 2022, ISBN: 9781098107963.
Upon completion of the course, students should be able to:
- Understand and describe commonly used Python frameworks for life cycle management of data science projects.
- Apply advanced Python tools for data collection, analysis, and visualization.
- Design, validate, and justify the selection of data science models using statistical approaches, data mining, and machine learning methods.
- Implement algorithms for image and natural language processing using Python-based frameworks.
- Understand the main characteristics of existing Python libraries for deployment, continuous integration, and monitoring of data science projects.
- Deploy data science models on cloud servers and edge devices.