Create helper functions to ease the annoyance of creating data documentation: `roxygen2` comments, data dictionaries, codebooks, reports, visualizations, and metadata files.
#' Document Datasets
#'
#' @description
#' Helper function to auto-generate the necessary `roxygen2` documentation for
#' datasets included/exported with an R package.
#'
#' @param data_obj The data object to be documented. Must be a `data.frame` or
#' [tibble::tibble()]; any other object is rejected with an error.
#' @param name Name of the dataset. If not provided, the name of the `data_obj`
#' object will be used.
#' @param description Description of the dataset. If not provided, a
#' placeholder will be used.
#' @param source The source of the dataset. If not provided, a
#' placeholder will be used.
#' @param file Path to the file where the documentation will be written.
#' Defaults to `R/data.R`. To document individual datasets in separate files,
#' provide a different path for each dataset. The file is created if it does
#' not exist; otherwise the documentation is appended to it.
#' @param column_descriptions A named list of column descriptions for the dataset,
#' with one entry per column. The names should match the column names of the
#' dataset. If not provided, a placeholder is used for every column.
#' @param ... Additional arguments; currently unused.
#'
#' @return Invisibly returns the documentation string.
#'
#' @example examples/ex_document_datasets.R
#'
#' @export
document_data <- function(
  data_obj,
  name = deparse(substitute(data_obj)),
  description = "<Add a description here>",
  source = "<Add a source here>",
  file = "R/data.R",
  column_descriptions = NULL,
  ...
) {
  # Validate data_obj: when a bare name is passed, check that it exists in the
  # caller's environment before it is evaluated
  obj_expr <- substitute(data_obj)
  if (is.name(obj_expr) && !exists(deparse(obj_expr), envir = parent.frame())) {
    rlang::abort("The dataset does not exist in the current environment.")
  }
  # Tibbles inherit from data.frame, so a single check covers both
  if (!is.data.frame(data_obj)) {
    rlang::abort("The provided object is not a data frame or tibble object.")
  }

  # Coerce the data to a plain data.frame
  dat <- as.data.frame(data_obj)

  # Validate the column descriptions, or fall back to placeholders
  if (!is.null(column_descriptions)) {
    if (!is.list(column_descriptions) || is.null(names(column_descriptions))) {
      rlang::abort("Column descriptions must be a named list.")
    }
    if (length(column_descriptions) != ncol(dat)) {
      rlang::abort("Number of column descriptions must match the number of columns in the dataset.")
    }
  } else {
    column_descriptions <- as.list(rep("<Add a description here>", ncol(dat)))
    names(column_descriptions) <- names(dat)
  }

  # One \item{column}{description} entry per column
  items <- paste0(
    "#'   \\item{", names(column_descriptions), "}{",
    unlist(column_descriptions, use.names = FALSE), "}\n",
    collapse = ""
  )

  # Create the documentation string. Datasets are documented by quoting their
  # name and are not tagged with @export.
  doc <- paste0(
    "#' @title ", name, "\n",
    "#' @description ", description, "\n",
    "#' @usage data(", name, ")\n",
    "#' @format A data frame with ", nrow(dat), " rows and ", ncol(dat), " columns:\n",
    "#' \\describe{\n",
    items,
    "#' }\n",
    "#' @source ", source, "\n",
    "\"", name, "\"\n"
  )

  # Append the documentation to the file (created if it does not exist)
  if (!is.null(file)) {
    cat(doc, file = file, append = TRUE)
  }

  # Return the documentation string
  invisible(doc)
}
`document_dataset`: given a provided `data_obj` (i.e. a `data.frame` or `tibble`), `description`, `source`, `column_names`, and `column_descriptions`, output the `roxygen2` skeleton for the dataset to an `R/data.R` file:
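
As a rough illustration of the shape such a skeleton might take, here is a minimal base-R sketch. The helper `sketch_data_doc()` and the toy `scores` data frame are hypothetical and exist only for this example; the sketch is not the package's implementation, and it writes to a temporary file rather than `R/data.R`.

```r
# Hypothetical helper (not part of the package): builds a roxygen2 skeleton
# for a data frame using base R only.
sketch_data_doc <- function(dat, name, column_descriptions) {
  # One \item{column}{description} line per column, in column order
  items <- sprintf(
    "#'   \\item{%s}{%s}",
    names(dat),
    unlist(column_descriptions[names(dat)], use.names = FALSE)
  )
  c(
    paste0("#' @title ", name),
    "#' @description <Add a description here>",
    sprintf(
      "#' @format A data frame with %d rows and %d columns:",
      nrow(dat), ncol(dat)
    ),
    "#' \\describe{",
    items,
    "#' }",
    "#' @source <Add a source here>",
    sprintf("\"%s\"", name)
  )
}

# Toy dataset, made up for this example
scores <- data.frame(id = 1:3, score = c(90.5, 82, 77.25))

skeleton <- sketch_data_doc(
  scores,
  name = "scores",
  column_descriptions = list(
    id = "Participant identifier",
    score = "Test score"
  )
)

# Write to a temporary file instead of R/data.R, then print the result
out_file <- tempfile(fileext = ".R")
writeLines(skeleton, out_file)
cat(readLines(out_file), sep = "\n")
```

The quoted dataset name on the last line is what `roxygen2` attaches the preceding comment block to, which is why the skeleton ends with a string rather than a function definition.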