Incorrect season-weeks in nflreadr::load_depth_charts(seasons = TRUE) #80

Closed
2 tasks done
JackLich10 opened this issue Apr 17, 2024 · 2 comments

JackLich10 commented Apr 17, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Have you installed the latest development version of the package(s) in question?

  • I have installed the latest development version of the package.

If this is a data issue, have you tried clearing your nflverse cache?

I have cleared my nflverse cache and the issue persists.

What version of the package do you have?

1.4.0

Describe the bug

In 2001, week 5 with game_type = POST returns data for 31 of 32 teams; I haven't checked which team is missing. Then, starting in 2007 with NE/NYG, week 5 with game_type = POST holds the Super Bowl teams' depth charts. This continues all the way through 2023, where both SF and KC are charted this way; 2014 seems to be the only season without this bug.

In addition, the season-week numbering is inconsistent across seasons:

  • 2001-04: weeks 1-21 are charted (correctly), but the playoff weeks have every team, not just the playoff teams
  • 2005: data ends at week 17
  • 2006, 2014: seems right (weeks 1-21 charted, and only playoff teams for weeks 18-21)
  • 2007-13, 2015-2020: week 18 is charted as an extra, fake regular-season week for all teams, which pushes the subsequent weeks back by one (i.e. the playoffs are now wrongly weeks 19-22)
  • 2021-present: same as the bullet above, except the extra fake regular-season week is now week 19 and the playoffs are weeks 20-23

Reprex

library(nflreadr)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(ggplot2)

dc <- nflreadr::load_depth_charts(seasons = TRUE)

# Teams charted in week 5 with game_type == "POST", by season
dplyr::count(subset(dc, week == 5 & game_type == "POST"), season, club_code)

subset(dplyr::count(dc, season, week), week >= 17) %>% 
  ggplot(aes(week, n)) +
  geom_col() +
  facet_wrap(~season)

Created on 2024-04-17 with reprex v2.1.0



Expected Behavior

I would expect depth charts to be charted only for the season-weeks in which a team is actually playing. If that is not possible for data reasons, I would at least expect the data to be formatted the same way across every season.

nflverse_sitrep

```r
── System Info ────────────────────────────────────
• R version 4.1.2 (2021-11-01) • Running under: macOS 13.5
── Package Status ─────────────────────────────────
   package installed  cran        dev   behind
1   nfl4th     1.0.4 1.0.4 1.0.4.9002      dev
2 nflfastR     4.6.1 4.6.1 4.6.1.9008      dev
3 nflplotR     1.2.0 1.3.1      1.3.1 cran;dev
4 nflreadr     1.4.0 1.4.0   1.4.0.12      dev
5 nflseedR     1.2.0 1.2.0 1.2.0.9000      dev
── Package Options ────────────────────────────────
• No options set for above packages
── Package Dependencies ───────────────────────────
• askpass     (1.2.0)    • gtable     (0.3.4)       • proto        (1.0.0)    
• backports   (1.4.1)    • hms        (1.1.3)       • purrr        (1.0.2)    
• cachem      (1.0.8)    • httr       (1.4.7)       • R6           (2.5.1)    
• cli         (3.6.2)    • isoband    (0.2.7)       • rappdirs     (0.3.3)    
• codetools   (0.2-19)   • janitor    (2.2.0)       • RColorBrewer (1.1-3)    
• colorspace  (2.1-0)    • jsonlite   (1.8.8)       • Rcpp         (1.0.12)   
• compiler    (4.1.2)    • labeling   (0.4.3)       • rlang        (1.1.3)    
• cpp11       (0.4.7)    • lattice    (0.22-5)      • scales       (1.3.0)    
• curl        (5.2.0)    • lifecycle  (1.0.4)       • snakecase    (0.11.1)   
• data.table  (1.14.10)  • listenv    (0.9.1)       • splines      (4.1.2)    
• digest      (0.6.34)   • lubridate  (1.9.3)       • stats        (4.1.2)    
• dplyr       (1.1.4)    • magick     (2.8.2)       • stringi      (1.8.3)    
• fansi       (1.0.6)    • magrittr   (2.0.3)       • stringr      (1.5.1)    
• farver      (2.1.1)    • MASS       (7.3-60.0.1)  • sys          (3.4.2)    
• fastmap     (1.1.1)    • Matrix     (1.3-4)       • tibble       (3.2.1)    
• fastrmodels (1.0.2)    • memoise    (2.0.1)       • tidyr        (1.3.1)    
• furrr       (0.3.1)    • methods    (4.1.2)       • tidyselect   (1.2.0)    
• future      (1.33.1)   • mgcv       (1.8-38)      • timechange   (0.3.0)    
• generics    (0.1.3)    • mime       (0.12)        • tools        (4.1.2)    
• ggpath      (1.0.1)    • munsell    (0.5.0)       • utf8         (1.2.4)    
• ggplot2     (3.4.4)    • nlme       (3.1-153)     • utils        (4.1.2)    
• globals     (0.16.2)   • openssl    (2.1.1)       • vctrs        (0.6.5)    
• glue        (1.7.0)    • parallel   (4.1.2)       • viridisLite  (0.4.2)    
• graphics    (4.1.2)    • parallelly (1.36.0)      • withr        (3.0.0)    
• grDevices   (4.1.2)    • pillar     (1.9.0)       • xgboost      (1.7.7.1)  
• grid        (4.1.2)    • pkgconfig  (2.0.3)         
• gsubfn      (0.7)      • progressr  (0.14.0)        
── Not Installed ──────────────────────────────────
• nflverse ()
───────────────────────────────────────────────────
```

@john-b-edwards
Contributor

A bunch of good flags in here; I'll try to tackle them point by point.

In 2001, week 5 and game_type = POST returns data for 31/32 teams, haven't checked which team is missing.

The NFL had only 31 teams in 2001, so it's expected that nflreadr::load_depth_charts(2001) returns just 31 teams.

Starting in 2007 with NE/NYG, week 5 and game_type = POST is the super bowl team's depth charts (i.e. all the way through 2023 where both SF/KC are charted this way, except 2014 seems to be without this bug)

Looks like there's some inconsistency in how the NFL reports depth charts in the postseason. Sometimes they'll provide depth charts for the bye week before the Super Bowl, and other times they won't (as in 2002-2006 and 2014). There's no game in those weeks (and hence game_type = ???), but the depth charts reported for the bye week sometimes differ from the actual Super Bowl depth charts. So I'm going to keep the bye week depth charts in there with game_type = "SBBYE" & week = NA_integer_, and week == 5 & game_type == "POST" is going to become week == 23/24 & game_type == "SB".

2001-04: weeks 1-21 are charted (correctly) but the playoff weeks have every team, not just the playoff teams

The NFL API has these for all teams for some reason. While it is weird, I'm not sure it's a significant issue that they're in there, and in the interest of keeping data and making it available rather than dropping it, I'm going to leave them in there.
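
For anyone who would rather see only the teams that actually played in a given playoff week, a user-side filter against the schedule is one option. Below is a minimal sketch in base R on toy data. It assumes the schedule exposes `home_team` and `away_team` columns and that week numbering lines up between the two tables; `club_code` comes from the reprex above, and the team codes `T1` to `T6` plus the names `sched`, `playing`, and `dc_playing` are purely illustrative.

```r
# Toy stand-ins for load_schedules() and load_depth_charts() output.
# Values are made up for illustration; real data would replace them.
sched <- data.frame(
  season = c(2003L, 2003L),
  week = c(18L, 18L),
  game_type = c("WC", "WC"),
  home_team = c("T1", "T3"),
  away_team = c("T2", "T4"),
  stringsAsFactors = FALSE
)
dc <- data.frame(
  season = 2003L,
  week = 18L,
  game_type = "WC",
  club_code = c("T1", "T2", "T3", "T4", "T5", "T6"),
  stringsAsFactors = FALSE
)

# One row per team per scheduled game: stack home and away teams.
playing <- unique(rbind(
  data.frame(season = sched$season, week = sched$week,
             club_code = sched$home_team, stringsAsFactors = FALSE),
  data.frame(season = sched$season, week = sched$week,
             club_code = sched$away_team, stringsAsFactors = FALSE)
))

# Inner join: keep only depth chart rows for teams with a game that season-week.
dc_playing <- merge(dc, playing, by = c("season", "week", "club_code"))
```

In this toy case `dc_playing` keeps the four teams with a scheduled game and drops `T5` and `T6`. Filtering on the user side like this leaves the published data untouched, which fits the decision above to keep the extra rows available.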

2005: data ends week 17

The NFL has not provided postseason depth charts for the 2005 season. This is an issue on the NFL's end, not with the existing depth charts code.

2007-13, 2015-2020: week 18 is charted as like an extra fake regular season week for all teams, which pushes the subsequent weeks back one (i.e. the playoffs are now wrongly weeks 19-22)
2021-present: same thing as the bullet above except now the extra fake regular season week is week 19 and playoffs weeks 20-23

Definitely unusual; I'm not sure why the NFL provides a week 18/19 for these seasons (and, even more so, why it's not provided for 2014). I'm going to keep these in the dataset as week = 18/19, game_type = "REG", but will fix the postseason weeks.


With these changes, I think that for each season, week, game_type in load_schedules(), there should be a corresponding record in load_depth_charts(). Thanks for the flags, super helpful, please continue to let me know if you come across anything else with the depth charts that looks off. Will merge a PR shortly and rebuild the full files.

@john-b-edwards
Contributor

Everything looks aligned, aside from the aforementioned 2005 playoffs:

nflreadr::load_schedules(2001:2023) |>
    dplyr::distinct(season, week, game_type) |>
    dplyr::left_join(
        nflreadr::load_depth_charts(2001:2023) |>
            dplyr::distinct(season, week, game_type) |>
            dplyr::mutate(dc_exists = 1),
        by=c("season","week","game_type")
    ) |>
    dplyr::filter(is.na(dc_exists))
#> ── nflverse games and schedules ────────────────────────────────────────────────
#> ℹ Data updated: 2024-08-04 15:44:30 PDT
#> # A tibble: 4 × 4
#>   season  week game_type dc_exists
#>    <int> <int> <chr>         <dbl>
#> 1   2005    18 WC               NA
#> 2   2005    19 DIV              NA
#> 3   2005    20 CON              NA
#> 4   2005    21 SB               NA
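
The check above runs in one direction only (schedule entries without depth charts). Here is a sketch of the reverse direction, in base R on toy data. It lists depth chart season-weeks that have no schedule entry, which is presumably where the retained bye-week rows and the extra week 18/19 "REG" rows would appear. The toy values and the names `sched_keys`, `dc_keys`, `key`, and `unmatched` are illustrative assumptions, not actual output from the package.

```r
# Toy distinct (season, week, game_type) combinations; values are made up.
sched_keys <- data.frame(
  season = c(2010L, 2010L),
  week = c(17L, 18L),
  game_type = c("REG", "WC"),
  stringsAsFactors = FALSE
)
dc_keys <- data.frame(
  season = c(2010L, 2010L, 2010L, 2010L),
  week = c(17L, 18L, 18L, NA),
  game_type = c("REG", "REG", "WC", "SBBYE"),
  stringsAsFactors = FALSE
)

# Build one string key per row; paste() renders NA as "NA",
# so rows with a missing week still compare cleanly.
key <- function(df) paste(df$season, df$week, df$game_type, sep = "_")

# Depth chart season-weeks with no matching schedule entry.
unmatched <- dc_keys[!key(dc_keys) %in% key(sched_keys), ]
```

On this toy input, `unmatched` holds the extra week 18 "REG" row and the "SBBYE" row. Both are kept deliberately, so they would not indicate a problem.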
