Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scrambles for older years #468

Merged
merged 14 commits into from
Jun 26, 2024
2 changes: 1 addition & 1 deletion DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
Type: Package
Package: nflfastR
Title: Functions to Efficiently Access NFL Play by Play Data
Version: 4.6.1.9009
Version: 4.6.1.9010
Authors@R:
c(person(given = "Sebastian",
family = "Carl",
Expand Down
1 change: 1 addition & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@
- Fixed a bug in `fixed_drive` and `fixed_drive_result` where the second weather delay in `2023_13_ARI_PIT` wasn't identified correctly. (#461)
- `punter_player_id`, and `punter_player_name` are filled for blocked punt attempts. (#463)
- Fixed an issue affecting scores of 2022 games involving a return touchdown (#466)
- Added identification of scrambles from 1999 through 2004 with thank to Aaron Schatz (#468)

# nflfastR 4.6.1

Expand Down
11 changes: 4 additions & 7 deletions R/helper_add_nflscrapr_mutations.R
Original file line number Diff line number Diff line change
Expand Up @@ -666,8 +666,8 @@ make_model_mutations <- function(pbp) {


fix_scrambles <- function(pbp) {
# skip below code if 2005 is not in the data
if (!2005 %in% pbp$season) return(pbp)
# skip below code if <= 2005 is not in the data
if (min(pbp$season) > 2005) return(pbp)

pbp %>%
dplyr::mutate(
Expand All @@ -677,12 +677,9 @@ fix_scrambles <- function(pbp) {
dplyr::select(-"scramble_id")

# Some notes on the scramble_fix:
# This marks scrambles in the 2005 season using charting data
# This marks scrambles in the 1999 - 2005 season using charting data
# Because NFL did not put scramble in play description during this season
# Data from Football Outsiders (thanks to Aaron Schatz!)
# 2005 season, Weeks 1-16 are based on charting
# 2005 season, Weeks 17-21 are guesses (basically every QB run except those that were a) a loss, b) no gain, or c) on 3/4 down with 1-2 to go).
# Plays nullified by penalty are not included.
# Data from Aaron Schatz!
}

translate_play_type_nfl <- function(play_type_nfl){
Expand Down
Binary file modified R/sysdata.rda
Binary file not shown.
Binary file not shown.
59 changes: 51 additions & 8 deletions data-raw/build_scramble_fix.R
Original file line number Diff line number Diff line change
@@ -1,17 +1,45 @@
library(tidyverse)

pbp <- nflfastR::load_pbp(2005) %>%
dplyr::select(game_id, play_id, week, desc, away_team, home_team, posteam, qtr, down, ydstogo, time)
pbp <- nflfastR::load_pbp(1999 : 2005) %>%
# plays that could plausibly be scramble
filter(
!is.na(rusher_player_id) | penalty == 1,
is.na(passer_player_id),
is.na(receiver_player_id)
) |>
select(season, game_id, play_id, week, away_team, home_team, posteam, qtr, down, ydstogo, time, desc) |>
# not in scramble data this year
mutate(
time = case_when(
nchar(time) == 3 ~ paste0("00", time),
nchar(time) == 4 ~ paste0("0", time),
TRUE ~ time
)
)

# Thank you to Aaron Schatz and Football Outsiders
# For the charting data to fix scrambles in 2005
s <- readxl::read_xlsx("data-raw/scrambles_2005.xlsx") %>%
janitor::clean_names()
as_tibble() |>
janitor::clean_names() %>%
select(
season = year, week, qtr, away_team = away, home_team = home, posteam = offense, down, ydstogo = togo, date_time = time
)

dat <- s %>%
# Thank you to Aaron Schatz
# For the charting data to fix scrambles in 1999 - 2004
s2 <- readxl::read_xlsx("data-raw/Scrambles 1999-2004 UPDATE for NFLfastR.xlsx", sheet = 1) |>
as_tibble() |>
janitor::clean_names() |>
filter(type %in% c("scramble", "assume scramble")) %>%
select(
season = year, week, qtr, away_team = away, home_team = home, posteam = offense, down, ydstogo = togo, date_time = time, desc = description
) %>%
season = year, week, qtr, away_team = away, home_team = home, posteam = offense, down, ydstogo = togo, date_time = time, yards_gained = yards
)

dat <- bind_rows(
s2,
s
) %>%
mutate(
time = paste0(
formatC(lubridate::hour(date_time), width = 2, flag = "0"),
Expand All @@ -24,11 +52,26 @@ dat <- s %>%

d <- dat %>%
dplyr::left_join(
pbp %>% select(game_id, play_id, week, away_team, home_team, posteam, qtr, down, ydstogo, time),
by = c("week", "away_team", "home_team", "posteam", "qtr", "down", "ydstogo", "time")
pbp,
by = c("week", "away_team", "home_team", "posteam", "qtr", "down", "ydstogo", "time", "season")
) %>%
mutate(scramble_id = paste0(game_id, "_", play_id)) %>%
filter(scramble_id != "2005_09_CIN_BAL_1725")

# number non-matched by season
nrow(d)
d |> filter(is.na(desc)) |> group_by(season) |> summarise(n = n())

# get rid of non-match
d <- d |>
filter(!is.na(desc))
d |> group_by(season) |> summarise(n = n())

scramble_fix <- d$scramble_id
scramble_fix <- scramble_fix |>
unique()
length(scramble_fix)
saveRDS(scramble_fix, file = "data-raw/scramble_fix.rds")



Binary file modified data-raw/scramble_fix.rds
Binary file not shown.
Binary file modified tests/testthat/expected_pbp.rds
Binary file not shown.
Loading