INFSCI 1540 Project Report - American Football Quarterbacks

What is this?

Our project provides a database of per-game quarterback performances so we can answer questions such as: Who was the highest-performing quarterback over the last N seasons? Which QB performs best on average? Should I draft CJ Stroud?

What data are we maintaining in our system?

Our data engineering system maintains a list of NFL quarterbacks over the last 4 years, their current teams, the games they played in, and their standard fantasy football score for each game (0.04*passing yards + 0.1*rushing yards + 4*passing touchdowns + 6*rushing touchdowns - 2*fumbles).
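As a concrete illustration, here is a minimal Python sketch of that scoring formula. The function name and arguments are our own, not part of the project's code, and rushing touchdowns are taken at the standard 6 points.

```python
def standard_fantasy_score(passing_yards, rushing_yards,
                           passing_tds, rushing_tds, fumbles):
    """Standard fantasy scoring for a single quarterback game.

    Hypothetical helper mirroring the formula above: 0.04 pts per passing
    yard, 0.1 per rushing yard, 4 per passing TD, 6 per rushing TD,
    and -2 per fumble.
    """
    return (0.04 * passing_yards
            + 0.1 * rushing_yards
            + 4 * passing_tds
            + 6 * rushing_tds
            - 2 * fumbles)


# Example: 300 passing yards, 20 rushing yards, 2 passing TDs, no fumbles
# 0.04*300 + 0.1*20 + 4*2 = 12 + 2 + 8 = 22.0 fantasy points
print(standard_fantasy_score(300, 20, 2, 0, 0))  # 22.0
```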

We separated games by week because that is how NFL seasons are split, and we organized the weeks of the 2020-2023 seasons into a single table so the weeks do not overlap.

Importantly, we left in a lot of "empty" data, since the absence of a score is itself a statistic. Using this data we can measure how often players actually played during a season and compute more precise averages per week or per game played.
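For example, assuming the "empty" weeks appear as PPpW rows with a NULL fScore (our assumption, not confirmed by the schema) and using hypothetical connection settings, a small Python sketch against the ODB could count how often each player actually played:

```python
import mysql.connector  # assumes the mysql-connector-python package

# Hypothetical ODB connection details; adjust to match your compose setup.
odb = mysql.connector.connect(host="localhost", port=3306,
                              user="root", password="root", database="odb")
cur = odb.cursor()

# Count weeks a player actually appeared in, treating a NULL fScore as "did not play".
# Table and column names follow the star diagram (PPpW.PlayerID, PPpW.fScore).
cur.execute("""
    SELECT PlayerID,
           COUNT(fScore) AS weeks_played,        -- COUNT skips NULLs
           COUNT(*)      AS weeks_tracked,
           AVG(fScore)   AS avg_per_week_played  -- AVG also skips NULLs
    FROM PPpW
    GROUP BY PlayerID
""")
for player_id, played, tracked, avg_score in cur.fetchall():
    print(player_id, played, tracked, avg_score)
```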

Our docker containers

  1. PHP Web Server - general database management
  2. MySQL ODB Server - our operational database that stores our tables
  3. MySQL DW Server - our data warehouse, which holds our fact table and player performance information
  4. ODB phpMyAdmin Server - lets us interact with the ODB SQL server
  5. DW phpMyAdmin Server - lets us interact with the DW SQL server
  6. Kafka Broker - lets our producers and consumers exchange messages
  7. ZooKeeper - keeps track of our current broker and points producers and consumers at it

The data loading process

Our data loading process started by organizing the raw data. Each player and team was given a table, and each week in the past 4 seasons was given its own ID. We then normalized these into one table that holds each player's name, team, and performance (fantasy score) for each week of the past four seasons, and cross-referenced it against a different dataset to find which teams the player played against. Then, using Kafka, we uploaded the tables to the ODB and created a new table that gives each game actually played by someone its own row. Our data warehouse then mirrors this performance-per-week dataset.
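A rough sketch of that normalization step, assuming pandas and hypothetical CSV file names and columns (the project's actual file layout may differ):

```python
import pandas as pd

# Hypothetical input files; the project's actual CSV names and columns may differ.
players = pd.read_csv("players.csv")      # PlayerID, Name, TeamID, Pos
weekly = pd.read_csv("weekly_stats.csv")  # PlayerID, WeekID, pass_yds, rush_yds, pass_td, rush_td, fumbles
schedule = pd.read_csv("schedule.csv")    # TeamID, WeekID, OppTeamID (the "different dataset")

# Compute the standard fantasy score per player per week.
weekly["fScore"] = (0.04 * weekly["pass_yds"] + 0.1 * weekly["rush_yds"]
                    + 4 * weekly["pass_td"] + 6 * weekly["rush_td"]
                    - 2 * weekly["fumbles"])

# Normalize into one performance-per-week (PPpW-style) table and
# cross-reference the schedule to attach the opposing team.
pppw = (weekly.merge(players[["PlayerID", "Name", "TeamID"]], on="PlayerID")
              .merge(schedule, on=["TeamID", "WeekID"], how="left"))

pppw[["PlayerID", "TeamID", "OppTeamID", "WeekID", "fScore"]].to_csv("pppw.csv", index=False)
```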

Star Diagram

erDiagram
  PLAYER }|--|| PPpW: Dimension
  PLAYER {
    int TeamId
    string Name
    string Pos
  }
  PPpW {
    int PlayerID
    int TeamID
    int OppTeamID
    int WeekID
    double fScore
  }
  FACT {
    int PlayerID
    int TeamID
    int OppTeamID
    int WeekID
    double Score
  }
  TEAM }|--|| PPpW: Dimension
  TEAM {
    string Name
    string Abreviation
    string Conference
    string Division
  }
  WEEK }|--|| PPpW: Dimension
  WEEK {
    int WeekNumber
    int Season
  }
  PPpW}|--|| Lifetime: ODB-to-DW_Aggregate
  Lifetime {
    int PlayerID
    double Score
  }
  PPpW }|--|{ FACT: ODB-to-DW
  PPpW}|--|| Vs: ODB-to-DW_Aggregate
  Vs {
    int OppTeamID
    double Score
  }
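For reference, a minimal sketch of what the DW fact table's DDL could look like based on the FACT entity above, issued from Python; the types, keys, and connection settings are our assumptions, not the project's actual schema.

```python
import mysql.connector  # assumes mysql-connector-python

# Hypothetical DW connection settings; adjust to match the compose file.
dw = mysql.connector.connect(host="localhost", port=3307,
                             user="root", password="root", database="dw")
cur = dw.cursor()

# Columns follow the FACT entity in the diagram; types are our guesses.
cur.execute("""
    CREATE TABLE IF NOT EXISTS FACT (
        PlayerID  INT,
        TeamID    INT,
        OppTeamID INT,
        WeekID    INT,
        Score     DOUBLE
    )
""")
dw.commit()
```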

Data Streaming

The data streaming platform in our system is Kafka. We used it to move our data from the CSVs to our ODB, and to move our summary table to our DW.
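A minimal sketch of that CSV-to-ODB flow, assuming the kafka-python and mysql-connector-python packages and hypothetical topic, file, and connection names (the project's actual producers and consumers may be structured differently):

```python
import csv
import json

from kafka import KafkaConsumer, KafkaProducer
import mysql.connector

TOPIC = "pppw-rows"  # hypothetical topic name

# Producer: stream each normalized CSV row into Kafka as JSON.
producer = KafkaProducer(bootstrap_servers="localhost:9092",
                         value_serializer=lambda v: json.dumps(v).encode("utf-8"))
with open("pppw.csv") as f:
    for row in csv.DictReader(f):
        producer.send(TOPIC, row)
producer.flush()

# Consumer: read rows from Kafka and insert them into the ODB.
odb = mysql.connector.connect(host="localhost", port=3306,
                              user="root", password="root", database="odb")
cur = odb.cursor()
consumer = KafkaConsumer(TOPIC, bootstrap_servers="localhost:9092",
                         auto_offset_reset="earliest",
                         consumer_timeout_ms=10000,  # stop when no new messages arrive
                         value_deserializer=lambda v: json.loads(v.decode("utf-8")))
for msg in consumer:
    r = msg.value
    cur.execute("INSERT INTO PPpW (PlayerID, TeamID, OppTeamID, WeekID, fScore) "
                "VALUES (%s, %s, %s, %s, %s)",
                (r["PlayerID"], r["TeamID"], r["OppTeamID"], r["WeekID"], float(r["fScore"])))
odb.commit()
```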

Pre-Aggregated Summary table

Our main summary table is the FACT table. We took the data from our normalized CSV into the PPpW table and mirrored it to create our fact table. We also built two aggregate summary tables: Lifetime, the total points each player earned over the period, and Vs, the total points scored against each team in the past 4 seasons.
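Sketches of the two aggregates, assuming they are built in the DW with SQL issued from Python; the table and column names follow the star diagram, everything else (connection details, CREATE TABLE ... AS SELECT approach) is our assumption.

```python
import mysql.connector  # assumes mysql-connector-python

# Hypothetical DW connection settings.
dw = mysql.connector.connect(host="localhost", port=3307,
                             user="root", password="root", database="dw")
cur = dw.cursor()

# Lifetime: total points each player earned across all tracked weeks.
cur.execute("""
    CREATE TABLE IF NOT EXISTS Lifetime AS
    SELECT PlayerID, SUM(Score) AS Score
    FROM FACT
    GROUP BY PlayerID
""")

# Vs: total points scored against each team over the past 4 seasons.
cur.execute("""
    CREATE TABLE IF NOT EXISTS Vs AS
    SELECT OppTeamID, SUM(Score) AS Score
    FROM FACT
    GROUP BY OppTeamID
""")
dw.commit()
```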

OLAP Queries

Player with the most 40-point games:

[screenshot of query result]

Every 40-point game:

[screenshot of query result]

Dak Prescott's Lifetime Performance:

[screenshot of query result]

Strongest Defensive Team:

[screenshot of query result]
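The screenshots above come from the project; the SQL itself is not reproduced here, but a query for "player with the most 40-point games" might look roughly like the following, run from Python against the DW using the column names from the star diagram (the join to player names is omitted since we are not sure which dimension tables the DW mirrors):

```python
import mysql.connector  # assumes mysql-connector-python

# Hypothetical DW connection settings.
dw = mysql.connector.connect(host="localhost", port=3307,
                             user="root", password="root", database="dw")
cur = dw.cursor()

# Count 40+ point games per player and return the leader.
cur.execute("""
    SELECT PlayerID, COUNT(*) AS big_games
    FROM FACT
    WHERE Score >= 40
    GROUP BY PlayerID
    ORDER BY big_games DESC
    LIMIT 1
""")
print(cur.fetchone())
```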

Thanks for reading! Feel free to demo on your own time!
