Added calculate_team_stats() function in R/aggregate_team_stats.R #465
Overview
Implemented the calculate_team_stats() function as requested in issue #342. The new function is an adapted version of the existing calculate_player_stats() function in R/aggregate_game_stats.R. Each of its columns is built up from the passed-in pbp dataframe so that the data is consistent with the rest of nflfastR.
Changes Made
Context
This feature addresses the request in issue #342. As discussed in that thread, the values are built from the ground up from nflfastR pbp data. One of the main points in the thread is to incorporate drive-specific statistics; these are implemented in this PR and covered in the details section.
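To illustrate what "drive-specific statistics built from pbp data" can look like, here is a minimal sketch using dplyr. It is not the code in this PR: the toy data frame, the output column names (`drives`, `plays_per_drive`, `yards_per_drive`), and the choice to count drives with `n_distinct(drive)` are all assumptions made for illustration. The input column names (`game_id`, `posteam`, `drive`, `yards_gained`) mirror fields found in nflfastR pbp data, but the values are invented.

```r
library(dplyr)

# Toy play-by-play rows (invented values, hypothetical game).
pbp <- data.frame(
  game_id      = c("g1", "g1", "g1", "g1", "g1", "g1"),
  posteam      = c("KC", "KC", "KC", "BUF", "BUF", "KC"),
  drive        = c(1, 1, 1, 2, 2, 3),
  yards_gained = c(5, 12, -3, 20, 7, 40)
)

# Aggregate to one row per team per game, then derive per-drive rates.
# Assumption: a drive is identified by a distinct `drive` value within a game.
team_drive_stats <- pbp %>%
  filter(!is.na(posteam)) %>%
  group_by(game_id, posteam) %>%
  summarise(
    drives = n_distinct(drive),
    plays  = n(),
    yards  = sum(yards_gained, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  mutate(
    plays_per_drive = plays / drives,
    yards_per_drive = yards / drives
  )

print(team_drive_stats)
```

How drives are counted (for example, whether kneel-downs or end-of-half possessions count as drives) changes the denominators, so per-drive numbers from a sketch like this may not line up with an external source that counts drives differently.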
Function Details
The new calculate_team_stats() function was built as an adaptation of the existing calculate_player_stats() function, which uses the "dplyr" package to manipulate the passed-in pbp dataframe and aggregate player-specific statistics. In the new function, this is done at the team level. Most of the columns built by calculate_team_stats() are the same as those built by calculate_player_stats(), with a few additions and exceptions, which are listed below:
Column Changes
Testing
As of now, I have only done manual testing, using pro-football-reference (PFR) statistics as ground truth. Excluding discrepancies caused by my implementation details (the only notable one is that PFR reports team passing yards as passing_yards - sack_yards, which I did NOT do), there are only a couple of things to note. For 2023 data, all of the numbers I have checked so far appear to be correct except the per-drive statistics (though this may be due to how I calculate them versus how PFR does). In older seasons (like 1999), slight discrepancies appear. I have reviewed my own code numerous times and am continuing to search for possible bugs.
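For reviewers comparing against PFR, the passing-yards difference described above can be sketched as follows. This is an illustration only, not code from this PR: the toy data, the names `gross_passing_yards` and `net_passing_yards`, and the way sack yardage is derived (absolute `yards_gained` on plays where `sack == 1`) are assumptions.

```r
library(dplyr)

# Toy play-by-play rows (invented values); sacks carry negative yards_gained
# and no passing_yards.
pbp <- data.frame(
  posteam       = c("KC", "KC", "KC", "BUF", "BUF"),
  passing_yards = c(15, NA, 8, 22, NA),
  sack          = c(0, 1, 0, 0, 1),
  yards_gained  = c(15, -7, 8, 22, -5)
)

passing_compare <- pbp %>%
  group_by(posteam) %>%
  summarise(
    # Gross passing yards: sum of passing yards, ignoring non-pass rows.
    gross_passing_yards = sum(passing_yards, na.rm = TRUE),
    # Sack yards as a positive number of yards lost (assumed definition).
    sack_yards = sum(abs(yards_gained[sack == 1]), na.rm = TRUE),
    .groups = "drop"
  ) %>%
  # PFR-style team passing yards: gross passing yards minus sack yards.
  mutate(net_passing_yards = gross_passing_yards - sack_yards)

print(passing_compare)
```

When checking a team total against PFR, the `net_passing_yards` column in a sketch like this is the one expected to correspond to PFR's figure, while the gross figure corresponds to the passing yards reported without the sack adjustment.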
Notes for Reviewers