This notebook documents our process for extracting natality data from the CDC Wonder web application and other sources, such as data.cdc.gov.
All raw, downloaded CDC data are stored in this project’s /data/ folder.
library(here)
library(dplyr)
library(vroom)
library(tidyr)
library(purrr)
Expanded Natality data (2016 - 2021)
Here are the latest technical notes on CDC’s Natality data and the CDC data dictionary.
This dataset is inspired by the Maternal, Infant, and Child Health objective identified by Healthy People 2030 to reduce cesarean births among low-risk women with no prior births to a target of 23.6% nationwide. A low-risk birth is defined as
- nulliparous: first birth,
- singleton: a single fetus (not multiple),
- term: at least 37 weeks of gestation based on obstetric estimate of gestation at delivery, and
- vertex: not breech / head is facing in a downward position for delivery.
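As a rough illustration of how the four criteria combine, the sketch below encodes them as a single logical test on record-level data. The column names and codings (`live_birth_order`, `plurality`, `oe_gestation_weeks`, `presentation`) are hypothetical and chosen only to mirror the CDC Wonder filters used in this notebook; the notebook itself works with aggregated CDC Wonder exports, not individual records.

```r
# Hypothetical record-level fields; names and codings are illustrative only.
is_low_risk <- function(live_birth_order, plurality, oe_gestation_weeks, presentation) {
  live_birth_order == 1 &        # nulliparous: first birth
    plurality == "Single" &      # singleton: not a multiple birth
    oe_gestation_weeks >= 37 &   # term: at least 37 weeks (obstetric estimate)
    presentation == "Cephalic"   # vertex: head-down, not breech
}

# Toy records: only the first one meets all four criteria.
births <- data.frame(
  live_birth_order   = c(1, 1, 2, 1),
  plurality          = c("Single", "Twin", "Single", "Single"),
  oe_gestation_weeks = c(39, 38, 40, 35),
  presentation       = c("Cephalic", "Cephalic", "Cephalic", "Breech")
)

births$low_risk <- with(
  births,
  is_low_risk(live_birth_order, plurality, oe_gestation_weeks, presentation)
)
births
```

A birth counts as low-risk only when every condition holds, which is why the CDC Wonder query applies all four filters at once.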
To get the totals for low-risk births and low-risk cesarean births, we must identify the equivalent criteria in CDC Wonder’s Natality for 2016 - 2021 (expanded) data collection. For low-risk cesarean births, we set the following:
In section 1. “Organize table layout”
- Group Results by: Delivery characteristics - Year
In section 5. “Select pregnancy history and prenatal care characteristics”
- Live Birth Order: 1
In section 10. “Select delivery characteristics”
- Year: All Years
- Fetal Presentation: Cephalic
- Delivery Method Expanded: Primary C-section; Repeat C-section; C-section (unknown if previous c-section)
In section 12. “Select infant characteristics”
- OE Gestational Age Recode 11: 37-38 weeks; 39 weeks; 40 weeks; 41 weeks; 42 weeks or more
- Plurality: Single
Use the send button to run the query in CDC Wonder. The export of this dataset is saved as the US Natality, 2016-2021 expanded_low-risk cesarean.txt data file in /data/.
We use vroom to load this data into R, then format the counts with thousands separators for display.
low_risk_cesarean_totals <-
  vroom(
    here("data", "US Natality, 2016-2021 expanded_low-risk cesarean.txt"),
    n_max = 6, # read only the six yearly rows (2016-2021)
    col_types = cols("c", "i", "i", "i"),
    delim = "\t"
  ) %>%
  select(Year, Cesarean_births = Births) %>%
  as.data.frame()

# Display the counts with thousands separators, most recent year first
low_risk_cesarean_totals %>%
  mutate(Cesarean_births = prettyNum(Cesarean_births, big.mark = ",")) %>%
  arrange(desc(Year))
## Year Cesarean_births
## 1 2021 316,349
## 2 2020 310,303
## 3 2019 314,016
## 4 2018 319,022
## 5 2017 325,086
## 6 2016 329,614
These match the low-risk cesarean totals reported in Table 17 (page 36) of the Jan. 2023 National Vital Statistics Report.
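The `n_max = 6` argument stops reading after the six yearly rows. A possible alternative to hard-coding the row count is to read the whole file and keep only rows that carry a year. The sketch below shows the idea on an inline toy string in base R; the footer layout (a `---` separator followed by notes in the first column) is an assumption about the export format, not something verified against the files in /data/.

```r
# Toy text mimicking the ASSUMED layout of a CDC Wonder export:
# data rows first, then a "---" separator and free-text notes in the Notes column.
export_text <- paste(
  "Notes\tYear\tYear Code\tBirths",
  "\t2016\t2016\t329614",
  "\t2017\t2017\t325086",
  "---\t\t\t",
  "Dataset: Natality, 2016-2021 expanded\t\t\t",
  sep = "\n"
)

raw <- read.delim(
  text = export_text,
  check.names = FALSE,      # keep "Year Code" as written
  stringsAsFactors = FALSE,
  na.strings = ""           # empty cells become NA
)

# Footer rows have no year, so filtering on Year drops them.
totals <- raw[!is.na(raw$Year), c("Year", "Births")]
n_data_rows <- nrow(totals)
totals
```

Filtering on a data column avoids having to update the row count if more years are added to the export, at the cost of relying on the footer rows leaving that column empty.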
Next, we’ll get total low-risk births so that we can calculate the proportion of low-risk cesareans among these. Note that these totals are not provided in the report linked above but they are the denominator used in the low-risk cesarean rates calculation as stated in footnote 6 of Table 17.
The low-risk birth data are obtained using the same criteria as above, except that the delivery method is changed to All Methods:
In section 1. “Organize table layout”
- Group Results by: Delivery characteristics - Year
In section 5. “Select pregnancy history and prenatal care characteristics”
- Live Birth Order: 1
In section 10. “Select delivery characteristics”
- Year: All Years
- Fetal Presentation: Cephalic
- Delivery Method Expanded: All Methods
In section 12. “Select infant characteristics”
- OE Gestational Age Recode 11: 37-38 weeks; 39 weeks; 40 weeks; 41 weeks; 42 weeks or more
- Plurality: Single
The export of this dataset is the US Natality, 2016-2021 expanded_low-risk births.txt data file stored in /data/. The total counts are displayed by year below.
low_risk_all_deliveries_totals <-
  vroom(
    here("data", "US Natality, 2016-2021 expanded_low-risk births.txt"),
    n_max = 6,
    col_types = cols("c", "i", "i", "i"),
    delim = "\t"
  ) %>%
  select(Year, Births) %>%
  as.data.frame()

low_risk_all_deliveries_totals %>%
  mutate(Births = prettyNum(Births, big.mark = ",")) %>%
  arrange(desc(Year))
## Year Births
## 1 2021 1,204,358
## 2 2020 1,198,613
## 3 2019 1,226,476
## 4 2018 1,231,332
## 5 2017 1,250,875
## 6 2016 1,280,607
We join the low-risk cesarean and low-risk births data into a single dataset and calculate the national low-risk cesarean rates from 2016 to 2021.
df_low_risk_births <-
  left_join(
    low_risk_all_deliveries_totals,
    low_risk_cesarean_totals,
    by = "Year"
  ) %>%
  mutate(low_risk_cesarean_rate = Cesarean_births / Births)

df_low_risk_births %>%
  mutate(
    low_risk_cesarean_rate = scales::percent(low_risk_cesarean_rate, accuracy = .1),
    Births = prettyNum(Births, big.mark = ","),
    Cesarean_births = prettyNum(Cesarean_births, big.mark = ",")
  ) %>%
  arrange(desc(Year))
## Year Births Cesarean_births low_risk_cesarean_rate
## 1 2021 1,204,358 316,349 26.3%
## 2 2020 1,198,613 310,303 25.9%
## 3 2019 1,226,476 314,016 25.6%
## 4 2018 1,231,332 319,022 25.9%
## 5 2017 1,250,875 325,086 26.0%
## 6 2016 1,280,607 329,614 25.7%
The total low-risk cesarean births and low-risk cesarean rates match the totals and percentages reported in Table 17 (page 36) of the Jan. 2023 National Vital Statistics Report.
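To relate these rates to the Healthy People 2030 target of 23.6% mentioned earlier, the sketch below recomputes the rate from the national totals shown in this notebook and expresses the difference from the target in percentage points. The object and column names (`national`, `hp2030_target`, `gap_pct_points`) are introduced here for illustration and are not part of the notebook's pipeline.

```r
# National totals copied from the output tables in this notebook.
national <- data.frame(
  Year            = 2016:2021,
  Births          = c(1280607, 1250875, 1231332, 1226476, 1198613, 1204358),
  Cesarean_births = c(329614, 325086, 319022, 314016, 310303, 316349)
)

hp2030_target <- 0.236  # Healthy People 2030 target: 23.6%

# Rate = low-risk cesarean births / all low-risk births
national$rate <- national$Cesarean_births / national$Births

# Gap from the target, in percentage points (positive = above target)
national$gap_pct_points <- round((national$rate - hp2030_target) * 100, 1)

national[, c("Year", "rate", "gap_pct_points")]
```

For example, the 2021 rate is 316,349 / 1,204,358, which rounds to 26.3%, so every year in this range sits above the 23.6% target.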
This processed data frame is saved as a CSV file in /publish/ and as an RDS file in /save/ with the filename “US_low_risk_births_2016_to_2021”.
saveRDS(df_low_risk_births, here("save","US_low_risk_births_2016_to_2021.RDS"))
write.csv(df_low_risk_births, here("publish", "US_low_risk_births_2016_to_2021.csv"), row.names = FALSE)
For the latest rates as of 2023, visit the NCHS - VSRR Quarterly provisional estimates for selected birth indicators dataset on Socrata. This does not include totals but has cesarean and low-risk cesarean birth rates at the national level by race/ethnicity.
In CDC Wonder, we use the same criteria for low-risk cesarean births and low-risk births defined above, and add a second grouping variable in section 1. “Organize table layout”:
- Group Results by: Year And By State of Residence
Send this query in CDC Wonder, and the output provides totals for all 50 US states and the District of Columbia from 2016 to 2021. The output files for low-risk births and low-risk cesarean delivery totals are, respectively, the State-level Natality, 2016-2021 expanded_low-risk births.txt and State-level Natality, 2016-2021 expanded_low-risk cesarean.txt data files stored in /data/.
These are combined into a single dataset using vroom and purrr::map2_dfr below. The low-risk cesarean rates for each jurisdiction are calculated with a mutate.
state_filenames <- c(
  here("data", "State-level Natality, 2016-2021 expanded_low-risk births.txt"),
  here("data", "State-level Natality, 2016-2021 expanded_low-risk cesarean.txt")
)

datasets <- c(
  "All low-risk births",
  "Low-risk cesarean births"
)

df_low_risk_births_by_state <-
  map2_dfr(
    state_filenames, datasets,
    ~ vroom(
      file = .x,
      n_max = 306, # 51 jurisdictions x 6 years
      col_types = cols("c", "i", "i", "c", "i", "i")
    ) %>%
      drop_na(`State of Residence`) %>%
      select(-Notes, -`Year Code`) %>%
      mutate(type = .y) # label each row with its source dataset
  ) %>%
  # One column of counts per dataset label
  pivot_wider(
    names_from = type,
    values_from = Births
  ) %>%
  rename(FIPS = `State of Residence Code`, State = `State of Residence`) %>%
  mutate(low_risk_cesarean_rate = `Low-risk cesarean births` / `All low-risk births`)

df_low_risk_births_by_state
## # A tibble: 306 × 6
## Year State FIPS `All low-risk births` Low-risk ces…¹ low_r…²
## <int> <chr> <int> <int> <int> <dbl>
## 1 2016 Alabama 1 19029 5318 0.279
## 2 2016 Alaska 2 3437 656 0.191
## 3 2016 Arizona 4 25824 5591 0.217
## 4 2016 Arkansas 5 11788 2944 0.250
## 5 2016 California 6 162397 40487 0.249
## 6 2016 Colorado 8 22603 4613 0.204
## 7 2016 Connecticut 9 12425 3623 0.292
## 8 2016 Delaware 10 3627 862 0.238
## 9 2016 District of Columbia 11 3882 1051 0.271
## 10 2016 Florida 12 76403 23932 0.313
## # … with 296 more rows, and abbreviated variable names
## # ¹`Low-risk cesarean births`, ²low_risk_cesarean_rate
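One consistency check that could follow this step is to sum the state-level counts by year and compare them with the national totals. The sketch below shows the idea on toy data. `state_toy` and `national_toy` are hypothetical stand-ins for `df_low_risk_births_by_state` and `df_low_risk_births`, and whether the real state totals add up exactly to the national figures is an assumption that would need to be checked (for example, suppressed cells could cause small differences).

```r
library(dplyr)

# Toy stand-in for the state-level data frame (two "states", two years).
state_toy <- data.frame(
  Year  = c(2016, 2016, 2017, 2017),
  State = c("A", "B", "A", "B"),
  `All low-risk births`      = c(100, 200, 110, 190),
  `Low-risk cesarean births` = c(25, 50, 30, 45),
  check.names = FALSE
)

# Toy stand-in for the national data frame.
national_toy <- data.frame(
  Year            = c(2016, 2017),
  Births          = c(300, 300),
  Cesarean_births = c(75, 75)
)

# Sum state counts within each year.
state_sums <- state_toy %>%
  group_by(Year) %>%
  summarise(
    births_from_states   = sum(`All low-risk births`),
    cesarean_from_states = sum(`Low-risk cesarean births`)
  )

# Flag years where the summed state counts equal the national totals.
comparison <- national_toy %>%
  left_join(state_sums, by = "Year") %>%
  mutate(
    births_match   = Births == births_from_states,
    cesarean_match = Cesarean_births == cesarean_from_states
  )

comparison
```

Any year with a `FALSE` flag would point to a mismatch between the two CDC Wonder exports that is worth investigating before publishing.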
This processed data frame is saved as a CSV file in /publish/ and as an RDS file in /save/ with the filename “state_low_risk_births_2016_to_2021”.
saveRDS(df_low_risk_births_by_state, here("save","state_low_risk_births_2016_to_2021.RDS"))
write.csv(df_low_risk_births_by_state, here("publish", "state_low_risk_births_2016_to_2021.csv"), row.names = FALSE)
CDC offers an API for US-level data from CDC Wonder:
https://wonder.cdc.gov/wonder/help/WONDER-API.html
We did not use this service for this project because the API is not set up to provide state-level data (or any other granular geographic data). However, we did learn about the wonderapi R package for querying the CDC Wonder API from R. At the time of this write-up (3/6/2023), the wonderapi package had a branch adding support for querying CDC’s Expanded Natality dataset.
Additional sources for CDC vital statistics data include: