Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
1 parent 07ed094, commit 2d426f4
Showing 1 changed file with 50 additions and 90 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,131 +1,91 @@ | ||
# nascaR.data <img src="inst/images/hex-logo.png" alt="nascaR.data Logo" align="right" height="130"/> | ||
|
||
# nascaR.data | ||
|
||
<img src="inst/images/hex-logo.png" alt="nascaR.data Logo" width="200" height="auto"/> | ||
|
||
[![R-CMD-check](https://img.shields.io/badge/R--CMD--check-passing-brightgreen)](https://github.com/kyleGrealis/nascaR.data/actions) | ||
[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental) | ||
[![NASCAR Data Update](https://github.com/kyleGrealis/nascaR.data/actions/workflows/weekly-nascar-update.yml/badge.svg)](https://github.com/kyleGrealis/nascaR.data/actions/workflows/weekly-nascar-update.yml) | ||
---- | ||
|
||
**nascaR.data** is a curated group of datasets across NASCAR's top three series: Cup, Xfinity, and Trucks. There are 21 sets available to explore & use for creating tables or other data visualizations. | ||
> ⚠️ **Version Notice**: The version on CRAN contains race data through the 2023 season. This GitHub version includes automated weekly updates for the 2024 season and will be synchronized with CRAN by December 1, 2024. | ||
Install `nascaR.data` with the official CRAN release: | ||
**nascaR.data** provides historical race results from NASCAR's top three series: Cup (1949-present), Xfinity (1982-present), and Trucks (1995-present). Data is automatically updated every Monday at midnight during race season. | ||
|
||
``` | ||
## Installation | ||
|
||
Install the stable CRAN version (through 2023 season): | ||
```r | ||
install.packages('nascaR.data') | ||
``` | ||
``` | ||
# or | ||
remotes::install_cran('nascaR.data') | ||
``` | ||
|
||
Or you can install the development verion that contains more-frequently updated race results: | ||
Install the development version with weekly 2024 updates: | ||
|
||
``` | ||
remotes::install_github('kyleGrealis/nascaR.data') | ||
``` | ||
|
||
|
||
---- | ||
|
||
This package is a collection of NASCAR race, driver, owner and manufacturer data across the three major NASCAR divisions: NASCAR Cup Series, NASCAR Xfinity Series, and NASCAR Craftsman Truck Series. The curated data begins with the 1949 season and extends through the end of the 2023 season. Data was sourced with permission from DriverAverages.com. | ||
|
||
---- | ||
--- | ||
|
||
## In the Pits | ||
|
||
NASCAR is one of the top-tier racing sports in North America and competes against F1 and IndyCar for the top viewership spot. Approximately 3.22 million people watch a race on any given weekend throughout the season. The `nascaR.data` package is the result of wanting to share a passion for the sport and provide an option to the typical go-to packages when learning new data visualization tools. | ||
NASCAR is one of the top-tier racing sports in North America and competes against F1 and IndyCar for the top viewership spot. Approximately 3.22 million people watch a race on any given weekend throughout the season. The nascaR.data package is the result of wanting to share a passion for the sport and provide an option to the typical go-to packages when learning new data visualization tools. | ||
|
||
`nascaR.data` is packed full of NASCAR results dating back to the first Daytona Beach race in 1949! Use this package to discover race trends across the NASCAR Cup Series, Xfinity Series, and Craftsman Truck Series. Answer fun questions like "which driver has accumulated the most wins overall?", "which owner has the best top 10 percentage at Daytona?", or see which manufacturer has dominated which series in a certain season. It's all here, so let's strap in to our race seats, fire up those engines, and let's take some warm-up laps. | ||
## Data Structure | ||
|
||
## Warming up the tires | ||
The package provides three main datasets: | ||
|
||
`nascaR.data` provides access to 21 different datasets (7 per series) and are broken down by overall race results and driver, owner, and manufacturer season & career records. Let's check our gauges and see what's under the hood: | ||
* `cup_series`: NASCAR Cup Series race results (1949-present) | ||
* `xfinity_series`: NASCAR Xfinity Series race results (1982-present) | ||
* `truck_series`: NASCAR Craftsman Truck Series results (1995-present) | ||
|
||
```{r, echo=TRUE} | ||
library(nascaR.data) | ||
``` | ||
Each dataset contains detailed race information including: | ||
|
||
Use `?nascaR.data::cup_race_data` to view a list of variable descriptions. This package has been designed to swap `cup` for `xfinity` or `truck` to see the same data structure (variables) for the respective series. Would you rather inspect driver-specific results listed by season or their overall career? No problem... this is an easy pit stop: `cup_driver_career` or `xfinity_owner_season` or `truck_mfg_overall`. | ||
* Race details (Season, Race number, Track, Name) | ||
* Results (Finish position, Start position) | ||
* Performance metrics (Laps completed, Laps led, Points earned) | ||
* Driver and team information | ||
|
||
## Green Flag! | ||
Data is sourced with permission from DriverAverages.com and is automatically updated every Monday at midnight during the racing season (February-November). | ||
|
||
**Which drivers are in the Top 5 for wins in the NASCAR Cup Series?** | ||
## Usage | ||
|
||
First, organize the drivers in descending order by win. Then, subset to keep the Top 5 winningest drivers. Lastly, feed the data into a horizontal bar chart (some other tweaks will be applied to enhance the visual output). | ||
Load the package: | ||
|
||
```{r, echo=TRUE, eval=FALSE, warning=FALSE} | ||
cup_driver_career |> | ||
arrange(desc(career_wins)) |> | ||
slice_head(n = 5) |> | ||
ggplot(aes(driver, career_wins)) + | ||
geom_bar(stat = 'identity') + | ||
coord_flip() | ||
``` | ||
|
||
![NASCAR Top 5 wins](inst/images/nascar-top-5.png) | ||
|
||
Wow! This doesn't even look like a close race. Richard Petty clearly leads the field with 200 wins. However, let's take a drive a little deeper into the turn and account for the number of races each driver competed in. What if we compare these same five drivers by win percentage? | ||
|
||
```{r, echo=TRUE, eval=FALSE, warning=FALSE} | ||
cup_driver_career |> | ||
arrange(desc(career_wins)) |> | ||
slice_head(n = 5) |> | ||
ggplot(aes(driver, career_win_pct)) + | ||
geom_bar(stat = 'identity') + | ||
coord_flip() | ||
library(nascaR.data) | ||
``` | ||
|
||
![NASCAR Top 5 percent wins](inst/images/nascar-top-5-pct.png) | ||
|
||
## The Garage Area | ||
|
||
**Which manufacturer has the best win percentage by season?** | ||
View the dataset documentation: | ||
|
||
Let's go behind the pits and see what the manufacturers are up to in the Truck Series. | ||
|
||
```{r, eval=FALSE, warning=FALSE} | ||
truck_mfg_season |> | ||
ggplot(aes(season, mfg_season_win_pct, group = manufacturer, color = manufacturer)) + | ||
geom_line() + | ||
geom_point() | ||
``` | ||
?cup_series | ||
?xfinity_series | ||
?truck_series | ||
``` | ||
|
||
![NASCAR Truck manufacturer win percent by season](inst/images/truck-mfg.png) | ||
|
||
No clear trend emerges, though it appears that there may be a 5-year clustering of winning percentage. For example, the Dodges experienced success in the early 2000s, but started to fall off before exiting the series. And while Ford has seemingly had gradual improvement, you can clearly see the success of the Toyota camp since joining the Truck series in 2004. | ||
|
||
## Post-race | ||
## The Backstretch | ||
This package provides rich historical data for: | ||
|
||
**Collect your race winnings** | ||
* Analyzing race trends across series | ||
* Comparing driver performances | ||
* Creating visualizations of NASCAR statistics | ||
|
||
How has the average money for winning a race changed over time? | ||
### Helper Functions | ||
|
||
```{r, eval=FALSE,, warning=FALSE} | ||
cup <- cup_race_data |> | ||
mutate(series = 'Cup') |> | ||
filter(finish == 1) |> | ||
select(season, race, finish, money, series) | ||
The package includes convenient functions to find driver, team, and manufacturer results: | ||
|
||
xfinity <- xfinity_race_data |> | ||
mutate(series = 'Xfinity') |> | ||
filter(finish == 1) |> | ||
select(season, race, driver, money, series) | ||
### Driver Information | ||
|
||
truck <- truck_race_data |> | ||
mutate(series = 'Truck') |> | ||
filter(finish == 1) |> | ||
select(season, race, driver, money, series) | ||
Get race results for a specific driver: | ||
|
||
bind_rows(cup, xfinity, truck) |> | ||
group_by(series, season) |> | ||
summarize(mean_money = mean(money, na.rm = TRUE)) |> | ||
ggplot(aes(season, mean_money, group = series, color = series)) + | ||
geom_point() + | ||
geom_line() | ||
``` | ||
get_driver_info("Kyle Larson") # or | ||
get_driver_info("kyle larson") | ||
``` | ||
|
||
![NASCAR Win money by season](inst/images/nascar-money.png) | ||
|
||
Race winnings in the Cup series experienced exponential growth beginning in the 1980s while Xfinity and Truck Series winnings have remained relatively the same since 2000. | ||
|
||
## The Backstretch | ||
or search by race team or manufacturer: | ||
|
||
I hope this gives you a little taste of what is included in this package. There's plenty of opportunity to further clean and reshape the data for data visualizations or model prepping. I'll be adding more data throughout the season. | ||
``` | ||
get_team_info("Petty Enterprises") | ||
get_manufacturer_info("Toyota") | ||
``` |
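
The `get_driver_info()` calls in the new README accept either "Kyle Larson" or "kyle larson", which suggests a case-insensitive name match. The following is a minimal sketch of that idea in base R, not the package's implementation: `find_driver()` is a hypothetical helper, and the toy data frame with `Season`, `Driver`, and `Finish` columns is made up for illustration, so the real column names in `cup_series` may differ.

```r
# Toy results table; column names and values are illustrative only
# and are not taken from the nascaR.data datasets.
results <- data.frame(
  Season = c(2021, 2021, 2022),
  Driver = c("Kyle Larson", "Chase Elliott", "Kyle Larson"),
  Finish = c(1, 2, 5),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return every row whose driver name matches
# `name`, ignoring letter case on both sides of the comparison.
find_driver <- function(data, name) {
  data[tolower(data$Driver) == tolower(name), , drop = FALSE]
}

# Both spellings select the same rows.
find_driver(results, "Kyle Larson")
find_driver(results, "kyle larson")
```

Lowercasing both sides keeps the lookup forgiving without altering the stored names; a real helper would probably also need to handle partial matches and names that appear in more than one series.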