Machine Learning Practice

Practice projects that apply statistical machine learning techniques to a variety of datasets.

For more details and examples on deep learning, check out my Deep Learning repository.

Because GitHub does not support LaTeX for now, you can use the Google Chrome extension TeX All the Things (GitHub) to read the notes.

Environment

  • Using Python 3

(Most relative path links are relative to the repository root.)

Dependencies

  • numpy: For low-level math operations
  • pandas: For data manipulation
  • sklearn (scikit-learn): For evaluation metrics and some data preprocessing

For comparison purposes

For visualization

  • Mlxtend
  • matplotlib
    • matplotlib.pyplot
    • mpl_toolkits.mplot3d

For evaluation

  • surprise: A Python scikit for building and analyzing recommender systems

NLP related

  • gensim: Topic Modelling
  • hmmlearn: Hidden Markov Models in Python, with scikit-learn like API
  • jieba: Chinese text segmentation library
  • pyHanLP: Chinese NLP library (Python API)
  • nltk: Natural Language Toolkit
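
Before running the notebooks, it can help to see which of the libraries listed above are installed. The following is a minimal sketch, not part of the repository: the `DEPENDENCIES` list and the `check_dependencies` helper are hypothetical names, and the entries are the import names assumed for each library (for example `sklearn` for scikit-learn and `pyhanlp` for pyHanLP), which can differ from the pip package names.

```python
from importlib.util import find_spec

# Import names assumed for the libraries listed above (hypothetical list;
# the pip package name may differ, e.g. scikit-learn is imported as sklearn).
DEPENDENCIES = [
    "numpy", "pandas", "sklearn",
    "mlxtend", "matplotlib",
    "surprise",
    "gensim", "hmmlearn", "jieba", "pyhanlp", "nltk",
]


def check_dependencies(modules=DEPENDENCIES):
    """Return {import_name: bool} telling whether each module can be located."""
    return {name: find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    # Print a simple availability report without importing the packages.
    for name, available in check_dependencies().items():
        print(f"{name:<12} {'ok' if available else 'missing'}")
```

`find_spec` only locates a module without importing it, so the check stays fast even for heavy packages.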

Projects

| Subject | Technique / Task | Dataset | Solution | Notes |
| --- | --- | --- | --- | --- |
| Letter Recognition | kNN / Classification | Letter Recognition Datasets (File) | kNN From Scratch, kNN Scikit Learn | Notes |
| Page Blocks Classification | Decision Tree / Classification | Page Blocks Classification Data Set (File) | Decision Tree (CART) From Scratch, Decision Tree Scikit Learn | Notes |
| CSM | Linear Regression / Regression | CSM Dataset (2014 and 2015) (File) | Linear Regression From Scratch, Linear Regression Scikit Learn, Linear Regression PyTorch NN | Notes |
| Nursery | Naive Bayes / Classification | Nursery Data Set (File) | Gaussian Naive Bayes From Scratch, Gaussian Naive Bayes Scikit Learn | Notes |
| Post-Operative Patient | SVM (cvxopt) / Binary Classification | Post-Operative Patient Data Set (File, Simplified) | SVM From Scratch (using cvxopt and simplified dataset), SVM Scikit Learn | Notes |
| Student Performance | AdaBoost / Classification | Student Performance Data Set (File) | AdaBoost From Scratch, AdaBoost Scikit Learn | Notes |
| Sales Transactions | k-Means / Clustering | Sales Transactions Dataset Weekly (File) | k-Means From Scratch, k-Means Scikit Learn | Notes |
| Frequent Itemset Mining | FP-Growth / Frequent Itemsets Mining | Retail Market Basket Data Set (File) | FP-Growth From Scratch | Notes |
| Automobile | PCA / Dimensionality Reduction | Automobile Data Set (File) | PCA From Scratch, PCA Scikit Learn | Notes |
| Anonymous Microsoft Web Data | SVD / Recommendation System | Anonymous Microsoft Web Data Data Set (File, Ratings Matrix (by R)) | SVD From Scratch, R Notebook - IBCF Recommender System | Notes |
| Handwriting Digit | SVM (SMO) / Binary & Multi-class Classification | MNIST (File) | Binary SVM From Scratch, Multi-class (OVR) SVM From Scratch | Notes |
| Chinese Text Segmentation | HMM (EM) / Text Segmentation & POS Tagging | File | HMM From Scratch, HMM hmmlearn, Compare with Jieba and HanLP | - |
| Document Similarity and LSI | VSM, SVD / LSI | Corpus of the People's Daily (File) | VSM From Scratch, VSM Gensim, SVD/LSI Gensim | Notes |
| Click and Conversion Prediction | Logistic Regression / Recommendation System | Ali-CCP (File too large, about 20 GB) | | Notes |
| LightGBM & XGBoost & CatBoost Practice | Boosting Tree / Classification | Social Network Ads (File) | LightGBM, XGBoost | Notes |
| Kaggle Elo | LightGBM / Feature Engineering | Elo Merchant Category Recommendation | LightGBM | Project |
| DCIC 2019 | XGBoost / Feature Engineering | Failure Prediction of Concrete Piston for Concrete Pump Vehicles | XGBoost | Project |
| Epinions CLiMF | Collaborative Filtering / Recommendation System | Epinions | CLiMF From Scratch, CLiMF TensorFlow | Notes, PaperResearch |
| Iris EM | EM Algorithm / Clustering | Iris Data Set | EM From Scratch | Notes |
| Iris Logistic | Logistic Regression / Classification | Iris Data Set | Logistic Regression From Scratch, Logistic Regression Scikit Learn, SVM (used for comparison) | Notes |

Machine Learning Categories

Consider the learning task

Consider the learning model

  • Discriminative Model
    • Discriminative Function
    • Probabilistic Discriminative Model
  • Generative Model

Consider the desired output of a ML system

Ensemble Method (Meta-algorithm)

NLP Related

Backbone

Others

Heuristic Algorithm (Optimization Method)

  • SMO --> SVM
  • EM --> HMM, etc.
  • GIS == improved ==> IIS --> MEM

General Case

Categorized

Specific Field

Machine Learning Mathematics

Topic

Categories

  • Linear Algebra
    • Orthogonality
    • Eigenvalues
    • Hessian Matrix
    • Quadratic Form
    • Markov Chain - HMM
  • Calculus
    • Multivariable Derivatives
      • Quadratic Approximations
      • Lagrange Multipliers and Constrained Optimization - SVM SMO
      • Lagrange Duality
  • Probability and Statistics
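
As an illustration of how the Lagrange multiplier and Lagrange duality items above connect to SVM, here is the standard hard-margin derivation in outline. It is a sketch under the usual textbook assumptions (linearly separable data, labels $y_i \in \{-1, +1\}$); the symbols $w$, $b$, and $\alpha_i$ are chosen here for illustration and are not taken from the repository notes.

```latex
% Primal problem: maximize the margin
\min_{w,\,b} \ \frac{1}{2}\lVert w \rVert^2
\quad \text{s.t.} \quad y_i \,(w^\top x_i + b) \ge 1, \quad i = 1, \dots, n

% Lagrangian with multipliers \alpha_i \ge 0
L(w, b, \alpha) = \frac{1}{2}\lVert w \rVert^2
  - \sum_{i=1}^{n} \alpha_i \left[ y_i \,(w^\top x_i + b) - 1 \right]

% Stationarity conditions
\frac{\partial L}{\partial w} = 0 \ \Rightarrow\ w = \sum_{i=1}^{n} \alpha_i y_i x_i,
\qquad
\frac{\partial L}{\partial b} = 0 \ \Rightarrow\ \sum_{i=1}^{n} \alpha_i y_i = 0

% Dual problem (the quadratic program that SMO solves two multipliers at a time)
\max_{\alpha} \ \sum_{i=1}^{n} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \, x_i^\top x_j
\quad \text{s.t.} \quad \alpha_i \ge 0, \quad \sum_{i=1}^{n} \alpha_i y_i = 0
```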

Basics

  • Algebra
  • Trigonometry

Application

(from A to Z)

  • Decision Tree
    • Entropy
  • HMM
    • Markov Chain
  • Naive Bayes
    • Bayes' Theorem
  • PCA
    • Orthogonal Transformations
    • Eigenvalues
  • SVD
    • Eigenvalues
  • SVM
    • Convex Optimization
    • Constrained Optimization
    • Lagrange Multipliers
    • Kernel

Book Recommendations

Machine Learning

Mathematics

  • Linear Algebra with Applications (Steven Leon)
  • Convex Optimization (Stephen Boyd & Lieven Vandenberghe)
  • Numerical Linear Algebra (L. Trefethen & D. Bau III)

Resources

Tutorial

Videos

Documentations

Interactive Learning

MOOC

Github

Textbook Implementation

Datasets

Competition

Global

Taiwan

China

Machine Learning Platform

Machine Learning Tool

(Online) Development Environment

Jupyter Notebook

  • Extension plugin - pip install jupyter_contrib_nbextensions
    • VIM binding
    • Codefolding
    • ExecuteTime
    • Notify
  • Jupyter Theme - pip install --upgrade jupyterthemes