michael1-0/epl-match-prediction

EPL Match Prediction

A machine learning project that predicts English Premier League match outcomes using Random Forest and Gradient Boosting classifiers. The models predict whether a given team will win a match, based on historical match statistics and rolling performance metrics.

Overview

This project uses historical EPL match data to build predictive models that forecast match outcomes. The approach combines:

  • Feature Engineering: Converting categorical data and extracting time-based features
  • Rolling Averages: Capturing team form using 3-match rolling statistics
  • Ensemble Methods: Random Forest and Gradient Boosting classifiers
  • Dual Perspective Analysis: Merging predictions from both teams to identify high-confidence predictions

Project Structure

epl-match-prediction/
├── prediction.ipynb    # Main Jupyter notebook with ML pipeline
├── matches.csv         # EPL match data (not included)
├── requirements.txt    # Python dependencies list
└── README.md

Installation

  1. Clone the repository

    git clone https://github.com/michael1-0/epl-match-prediction.git
    cd epl-match-prediction
  2. Create a virtual environment

    python -m venv .venv
    source .venv/bin/activate  
  3. Install dependencies

    pip install -r requirements.txt
  4. Add the dataset

    Place a matches.csv file in the project root directory. The data can be obtained from FBRef (Premier League match logs). The dataset should contain EPL match data with the following columns:

    • date: Match date
    • time: Match start time
    • venue: Home or Away
    • team: Team name
    • opponent: Opponent team name
    • result: Match result (W/D/L)
    • gf, ga: Goals for/against
    • sh, sot: Shots, shots on target
    • dist: Average shot distance
    • fk: Free kicks
    • pk, pkatt: Penalties scored, penalty attempts
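Before running the notebook, it can help to confirm that the file has the columns listed above. The following is a minimal sketch; `REQUIRED_COLUMNS` and `missing_columns` are hypothetical helpers that are not part of the notebook, and the small in-memory frame stands in for `pd.read_csv("matches.csv")`.

```python
import pandas as pd

# Columns listed in this README; the helper below is illustrative only.
REQUIRED_COLUMNS = [
    "date", "time", "venue", "team", "opponent", "result",
    "gf", "ga", "sh", "sot", "dist", "fk", "pk", "pkatt",
]

def missing_columns(df: pd.DataFrame) -> list:
    """Return the required columns absent from df, in the order listed above."""
    return [col for col in REQUIRED_COLUMNS if col not in df.columns]

# Tiny stand-in for pd.read_csv("matches.csv"); values are made up.
sample = pd.DataFrame({"date": ["2022-01-01"], "team": ["Arsenal"], "result": ["W"]})
missing = missing_columns(sample)
print("Missing columns:", missing)
```

An empty list means the file has every column the pipeline expects.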

Data Source

Match data is sourced from FBRef, which publishes football statistics and match logs.

Features

Input Features

| Feature | Description |
| --- | --- |
| venue_code | Home (0) vs Away (1) |
| opponent_code | Numeric code for opponent team |
| hour | Match start hour |
| day_code | Day of week (0=Monday, 6=Sunday) |
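One way to derive these columns with pandas is sketched below. The rows are made-up sample data, and the exact encoding used in the notebook may differ. In particular, `astype("category").cat.codes` assigns codes alphabetically (Away=0, Home=1), so this sketch uses an explicit mapping to match the table above. The `target` column name is an assumption.

```python
import pandas as pd

# Toy rows standing in for matches.csv; values are made up for illustration.
matches = pd.DataFrame({
    "date": ["2022-01-01", "2022-01-02", "2022-01-08"],
    "time": ["12:30", "14:00", "17:30"],
    "venue": ["Home", "Away", "Home"],
    "opponent": ["Chelsea", "Arsenal", "Everton"],
    "result": ["W", "D", "L"],
})

matches["date"] = pd.to_datetime(matches["date"])
# Explicit mapping so the codes match the table (Home=0, Away=1).
matches["venue_code"] = matches["venue"].map({"Home": 0, "Away": 1})
# Category codes follow the alphabetical order of the opponent names.
matches["opponent_code"] = matches["opponent"].astype("category").cat.codes
# Keep only the hour part of the kick-off time, e.g. "12:30" becomes 12.
matches["hour"] = matches["time"].str.split(":").str[0].astype(int)
# pandas numbers weekdays Monday=0 through Sunday=6.
matches["day_code"] = matches["date"].dt.dayofweek
# Binary target: 1 for a win, 0 for a draw or loss.
matches["target"] = (matches["result"] == "W").astype(int)

print(matches[["venue_code", "opponent_code", "hour", "day_code", "target"]])
```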

Rolling Statistics (3-match window)

| Feature | Description |
| --- | --- |
| gf_rolling | Goals scored |
| ga_rolling | Goals conceded |
| sh_rolling | Total shots |
| sot_rolling | Shots on target |
| dist_rolling | Average shot distance |
| fk_rolling | Free kicks |
| pk_rolling | Penalties scored |
| pkatt_rolling | Penalty attempts |
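A minimal sketch of how such rolling features can be computed per team is shown below. The helper name `rolling_averages` and the toy data are assumptions; only two of the statistics are included to keep the example short. With `closed="left"`, the window for each match covers the three previous matches and excludes the current one.

```python
import pandas as pd

def rolling_averages(group: pd.DataFrame, cols, new_cols, window: int = 3) -> pd.DataFrame:
    """Add means over the previous `window` matches for one team's rows."""
    group = group.sort_values("date").copy()
    # closed="left" leaves the current match out of its own window.
    rolling_stats = group[cols].rolling(window, closed="left").mean()
    for col, new_col in zip(cols, new_cols):
        group[new_col] = rolling_stats[col]
    # Rows without a full window of earlier matches have NaN and are dropped.
    return group.dropna(subset=new_cols)

# Toy data: team A has five matches, team B has four (values are made up).
dates_a = list(pd.date_range("2022-01-01", periods=5, freq="7D"))
dates_b = list(pd.date_range("2022-01-01", periods=4, freq="7D"))
matches = pd.DataFrame({
    "team": ["A"] * 5 + ["B"] * 4,
    "date": dates_a + dates_b,
    "gf": [1, 2, 3, 4, 5, 0, 1, 2, 3],
    "ga": [0, 1, 1, 2, 0, 2, 2, 1, 0],
})

cols = ["gf", "ga"]
new_cols = [f"{c}_rolling" for c in cols]
# Compute the features separately for each team, then stack the results.
matches_rolling = pd.concat(
    rolling_averages(group, cols, new_cols) for _, group in matches.groupby("team")
)
print(matches_rolling)
```

Each team loses its first three matches here, because those rows have no complete three-match history to average.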

Models

1. Random Forest Classifier

  • Configuration: 50 estimators, min_samples_split=10
  • Purpose: Baseline model with rolling features
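The configuration above can be sketched with scikit-learn as follows, combined with the date-based split at January 1, 2022 described under Methodology. The data is randomly generated and purely illustrative. The `random_state` value, the predictor subset, and the choice to put the cutoff date itself in the test set are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score

# Synthetic stand-in for the engineered match table (random values only).
rng = np.random.default_rng(0)
n_rows = 200
data = pd.DataFrame({
    "date": pd.date_range("2021-08-01", periods=n_rows, freq="D"),
    "venue_code": rng.integers(0, 2, n_rows),
    "opponent_code": rng.integers(0, 20, n_rows),
    "hour": rng.integers(12, 21, n_rows),
    "gf_rolling": rng.uniform(0, 3, n_rows),
    "target": rng.integers(0, 2, n_rows),
})
predictors = ["venue_code", "opponent_code", "hour", "gf_rolling"]

# Time-based split: rows before the cutoff train the model, the rest are held out.
cutoff = pd.Timestamp("2022-01-01")
train = data[data["date"] < cutoff]
test = data[data["date"] >= cutoff]

rf = RandomForestClassifier(n_estimators=50, min_samples_split=10, random_state=1)
rf.fit(train[predictors], train["target"])
preds = rf.predict(test[predictors])

# zero_division=0 avoids a warning if the model predicts no wins at all.
score = precision_score(test["target"], preds, zero_division=0)
print(f"train={len(train)} test={len(test)} precision={score:.2f}")
```

Because the targets here are random, the printed precision carries no meaning; the sketch only shows the shape of the pipeline.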

2. Gradient Boosting Classifier

  • Configuration: 100 estimators, learning_rate=0.1, max_depth=3
  • Enhancement: Probability thresholding, predicting a win only when the estimated win probability is ≥55%
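A hedged sketch of probability thresholding with scikit-learn follows. The features and labels are synthetic, and `random_state` and the variable names are assumptions. Instead of calling `predict`, which effectively uses a 50% cutoff for two classes, the win probability from `predict_proba` is compared against 0.55.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic features and labels, for illustration only.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(300, 4))
y_train = (X_train[:, 0] + 0.5 * rng.normal(size=300) > 0).astype(int)
X_test = rng.normal(size=(50, 4))

gb = GradientBoostingClassifier(
    n_estimators=100, learning_rate=0.1, max_depth=3, random_state=1
)
gb.fit(X_train, y_train)

# Column 1 of predict_proba holds the probability of class 1 (a win).
win_proba = gb.predict_proba(X_test)[:, 1]

THRESHOLD = 0.55
# Predict a win only when the model is at least 55% confident.
confident_win = (win_proba >= THRESHOLD).astype(int)
print(f"wins at 0.50: {(win_proba >= 0.5).sum()}, wins at 0.55: {confident_win.sum()}")
```

Raising the threshold can only reduce the number of predicted wins, which trades coverage for confidence.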

Methodology

  1. Data Preprocessing

    • Convert dates and extract temporal features
    • Encode categorical variables (venue, opponent)
    • Create binary target variable (Win=1, Not Win=0)
  2. Feature Engineering

    • Calculate 3-match rolling averages using closed='left' to prevent data leakage
    • Normalize team names for consistent merging
  3. Training Strategy

    • Time-based split at January 1, 2022 (not random) to simulate real prediction scenarios
    • Train on historical data, predict future matches
    • Matches after the split date (2021-2022 season onward) form the test set; see prediction.ipynb
  4. Confidence Filtering

    • Merge home and away perspectives for each match
    • Filter for matches where Team A is predicted to win AND Team B is predicted to lose
    • These "confident" predictions yield higher precision
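The confidence filter in step 4 can be sketched as a self-merge of the prediction table. Everything below is illustrative: the rows are made up and are not real results, and `NAME_MAP`, `normalize`, and the column names `new_team` and `predicted_opp` are hypothetical. The name mapping is needed because the `team` and `opponent` columns may spell the same club differently.

```python
import pandas as pd

# Hypothetical mapping from names in the team column to names in the opponent column.
NAME_MAP = {"Manchester United": "Manchester Utd", "Wolverhampton Wanderers": "Wolves"}

def normalize(name: str) -> str:
    """Return the short form of a team name, or the name itself if unmapped."""
    return NAME_MAP.get(name, name)

# Made-up predictions: one row per team per match (not real results).
preds = pd.DataFrame({
    "date": pd.to_datetime(["2022-01-03", "2022-01-03", "2022-01-15", "2022-01-15"]),
    "team": ["Manchester United", "Wolverhampton Wanderers", "Arsenal", "Chelsea"],
    "opponent": ["Wolves", "Manchester Utd", "Chelsea", "Arsenal"],
    "actual": [1, 0, 1, 0],
    "predicted": [1, 0, 1, 1],
})
preds["new_team"] = preds["team"].map(normalize)

# The other side's row names this team as its opponent, so rename and join on it.
other = preds[["date", "opponent", "predicted"]].rename(
    columns={"opponent": "new_team", "predicted": "predicted_opp"}
)
merged = preds.merge(other, on=["date", "new_team"])

# Keep matches where this team is predicted to win and the opponent is not.
confident = merged[(merged["predicted"] == 1) & (merged["predicted_opp"] == 0)]
print(confident[["date", "team", "opponent", "actual"]])
```

In this toy table the second match is dropped because both sides are predicted to win, so the two perspectives disagree.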

Evaluation

The primary evaluation metric is precision, the fraction of predicted wins that were actual wins:

$$\text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}$$
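As a small worked example of the formula, the sketch below counts true and false positives by hand. The labels are made up, and the helper `precision` is illustrative; in practice `sklearn.metrics.precision_score` computes the same quantity.

```python
def precision(y_true, y_pred):
    """Fraction of predicted wins (1) that were actual wins."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 1)
    false_pos = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 0)
    if true_pos + false_pos == 0:
        return 0.0  # no wins predicted, so precision is reported as 0 here
    return true_pos / (true_pos + false_pos)

# Made-up labels: three wins predicted, two of them correct.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0]
result = precision(y_true, y_pred)
print(f"precision = {result:.3f}")
```

Here there are 2 true positives and 1 false positive, so precision is 2 / (2 + 1). The missed win in the fourth position does not affect it, since precision only looks at predicted wins.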

Usage

Open the Jupyter notebook and run all cells:

jupyter notebook prediction.ipynb

Alternatively, open the notebook in VS Code with the Jupyter extension installed.
