chickenstats is a Python package for scraping & analyzing sports data. With just a few lines of code:
- Scrape & manipulate data from various NHL endpoints, leveraging chickenstats.chicken_nhl, which includes a proprietary xG model for shot quality metrics
- Augment play-by-play data & generate custom aggregations from raw csv files downloaded from Evolving-Hockey (subscription required) with chickenstats.evolving_hockey
For more in-depth explanations, tutorials, & detailed reference materials, consult the Documentation.
chickenstats requires Python 3.10 or greater & runs on the latest stable versions of Linux, macOS, & Windows operating systems.
Very simple: install from PyPI. Best practice is to develop in an isolated virtual environment (conda or otherwise), but who's a chicken to judge?
pip install chickenstats
To confirm installation & that you have the latest version (1.7.8):
pip show chickenstats
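The same checks can be run from inside Python. Below is a minimal sketch using only the standard library: it confirms the interpreter meets the 3.10+ requirement & looks up the installed chickenstats version through `importlib.metadata`, returning `None` if the package isn't installed. The helper names `meets_python_requirement` & `installed_version` are hypothetical, not part of chickenstats.

```python
from __future__ import annotations

import sys
from importlib.metadata import PackageNotFoundError, version


def meets_python_requirement(minimum: tuple[int, int] = (3, 10)) -> bool:
    """Return True if the running interpreter is at least `minimum` (major, minor)."""
    return sys.version_info[:2] >= minimum


def installed_version(package: str = "chickenstats") -> str | None:
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python 3.10+:", meets_python_requirement())
    print("chickenstats version:", installed_version())
```

This reports the same version number as `pip show chickenstats`, just without leaving Python.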
chickenstats is structured as two underlying modules, each used with different data sources:
- chickenstats.chicken_nhl
- chickenstats.evolving_hockey
The package is under active development - features will be added or modified over time.
The chickenstats.chicken_nhl module scrapes & manipulates data directly from various NHL endpoints, with outputs including schedule & game results, rosters, & play-by-play data.
The example below scrapes the schedule for the Nashville Predators, extracts the game IDs, then scrapes play-by-play data for the first ten regular-season games.
from chickenstats.chicken_nhl import Season, Scraper
# Create a Season object for the current season
season = Season(2023)
# Download the Nashville schedule
nsh_schedule = season.schedule('NSH')
# Keep completed regular-season games: game_state "OFF" marks a finished game,
# & digits 5-6 of an NHL game ID give the game type ("02" = regular season)
is_final = nsh_schedule.game_state == "OFF"
is_regular = nsh_schedule.game_id.astype(str).str[4:6] == "02"
nsh_schedule_reg = nsh_schedule.loc[is_final & is_regular].reset_index(drop=True)
# Extract the first ten game IDs
game_ids = nsh_schedule_reg.game_id.tolist()[:10]
# Create a scraper object using the game IDs
scraper = Scraper(game_ids)
# Scrape play-by-play data
play_by_play = scraper.play_by_play
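Selecting regular-season games can also be done from the game IDs themselves. NHL game IDs are 10-digit numbers: the season's starting year, a two-digit game type (`01` preseason, `02` regular season, `03` playoffs), & a four-digit game number. The sketch below splits an ID into those parts; `parse_game_id`, `is_regular_season`, & `GameId` are hypothetical helpers for illustration, not chickenstats functions.

```python
from __future__ import annotations

from typing import NamedTuple


class GameId(NamedTuple):
    season_start: int  # e.g. 2023 for the 2023-24 season
    game_type: int     # 1 = preseason, 2 = regular season, 3 = playoffs
    number: int        # game number within that game type


def parse_game_id(game_id: int | str) -> GameId:
    """Split a 10-digit NHL game ID into season, game type, & game number."""
    text = str(game_id)
    if len(text) != 10 or not text.isdigit():
        raise ValueError(f"expected a 10-digit game ID, got {game_id!r}")
    return GameId(int(text[:4]), int(text[4:6]), int(text[6:]))


def is_regular_season(game_id: int | str) -> bool:
    """Return True for regular-season games (game type 02)."""
    return parse_game_id(game_id).game_type == 2
```

For example, `is_regular_season(2023020001)` returns `True`, while a preseason ID such as `2023010005` returns `False`, so `[g for g in game_ids if is_regular_season(g)]` filters a plain list of IDs.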
The chickenstats.evolving_hockey module manipulates raw csv files downloaded from Evolving-Hockey. Using their original shifts & play-by-play data, users can add information & aggregate individual & on-ice statistics, including high-danger shooting events, xG & adjusted xG, faceoffs, & changes.
import pandas as pd
from chickenstats.evolving_hockey import prep_pbp, prep_stats, prep_lines
# The prep_pbp function takes the raw event and shifts dataframes
raw_shifts = pd.read_csv('./raw_shifts.csv')
raw_pbp = pd.read_csv('./raw_pbp.csv')
play_by_play = prep_pbp(raw_pbp, raw_shifts)
# You can use the play_by_play dataframe in various aggregations
# These are individual game statistics, including on-ice & usage,
# accounting for teammates & opposition on-ice
individual_game = prep_stats(play_by_play, level='game', teammates=True, opposition=True)
# These are game statistics for forward-line combinations, accounting for opponents on-ice
forward_lines = prep_lines(play_by_play, level='game', position='f', opposition=True)
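At its core, an individual game aggregation groups events by game & player, then tallies them. The sketch below illustrates only that idea, using toy event rows with hypothetical keys (`game_id`, `player`, `event`) & the standard library; it is not how `prep_stats` is implemented, & it ignores the on-ice, teammate, & opposition context that chickenstats accounts for.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from typing import Iterable, Mapping


def count_individual_events(
    events: Iterable[Mapping[str, str]],
) -> dict[tuple[str, str], Counter]:
    """Tally each player's events per game.

    Each row uses hypothetical keys 'game_id', 'player', & 'event'.
    Returns {(game_id, player): Counter({event: count, ...})}.
    """
    totals: defaultdict[tuple[str, str], Counter] = defaultdict(Counter)
    for row in events:
        # One bucket per (game, player); count every event type seen
        totals[(row["game_id"], row["player"])][row["event"]] += 1
    return dict(totals)
```

With real DataFrames, the same idea is typically expressed as a pandas `groupby` on the game & player columns followed by counting.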
chickenstats wouldn't be possible without the support & efforts of countless others. I am obviously extremely grateful, even if there are too many of you to thank individually. However, this chicken will do his best.
First & foremost is my wife - the lovely Mrs. Chicken has been patient, understanding, & supportive throughout the countless hours of development, sometimes to her detriment.
Sincere apologies to the friends & family that have put up with me since my entry into Python, programming, & data analysis in January 2021. Thank you for being excited for me & with me throughout all of this, especially when you've had to fake it...
Thank you to the hockey analytics community on (the artist formerly known as) Twitter. You're producing & reacting to cutting-edge statistical analyses, while providing a supportive, welcoming environment for newcomers. Thank y'all for everything that you do. This is by no means exhaustive, but there are a few people worth calling out specifically:
- Josh & Luke Younggren (@EvolvingWild)
- Bryan Bastin (@BryanBastin)
- Max Tixador (@woumaxx)
- Micah Blake McCurdy (@IneffectiveMath)
- Prashanth Iyer (@iyer_prashanth)
- The Bucketless (@the_bucketless)
- Shayna Goldman (@hayyyshayyy)
- Dom Luszczyszyn (@domluszczyszyn)
I'm also grateful to the thriving community of Python educators & open-source contributors on Twitter. Thank y'all for your knowledge & practical advice. Matt Harrison (@mharrison) deserves a special mention for his books on Pandas and XGBoost, both of which are available at his online store. Again, not exhaustive, but others worth thanking individually:
- Will McGugan (@willmcgugan)
- Rodrigo Girão Serrão (@mathsppblog)
- Mike Driscoll (@driscollis)
- Trey Hunner (@treyhunner)
- Pawel Jastrzebski (@pawjast)
Finally, this library depends on a host of other open-source packages. chickenstats is possible because of the efforts of thousands of individuals, represented below: