World Reference Base for Soil Resources (4th Edition, 2022) #47

Merged
merged 7 commits into from
Oct 1, 2024
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -22,7 +22,7 @@ Suggests:
soilDB,
ape,
data.tree
RoxygenNote: 7.2.3
RoxygenNote: 7.3.2
Roxygen: list(markdown = TRUE)
VignetteBuilder: knitr
LazyData: false
46 changes: 34 additions & 12 deletions R/data-documentation.R
@@ -1,7 +1,7 @@
#'
#' @title Soil Taxonomy Hierarchy
#' Soil Taxonomy Hierarchy
#'
#' @description The first 4 levels of the US Soil Taxonomy hierarchy (soil order, suborder, greatgroup, subgroup), presented as a \code{data.frame} (denormalized) and a \code{list} of unique taxa.
#' The first 4 levels of the US Soil Taxonomy hierarchy (soil order, suborder, greatgroup, subgroup), presented as a \code{data.frame} (denormalized) and a \code{list} of unique taxa.
#' @details Ordered based on the unique letter codes denoting taxa from the 13th edition of the Keys to Soil Taxonomy.
#' @usage data(ST)
#'
@@ -20,9 +20,9 @@
#'
"ST"

#' @title Family-level Classes for Soil Taxonomy
#' Family-level Classes for Soil Taxonomy
#'
#' @description A database of family-level class names for Soil Taxonomy.
#' A database of family-level class names for Soil Taxonomy.
#'
#' @references
#' Soil Survey Staff. 2014. Keys to Soil Taxonomy, 12th ed. USDA-Natural Resources Conservation Service, Washington, DC.
@@ -34,9 +34,9 @@
#'
"ST_family_classes"

#' @title Epipedons, Diagnostic Horizons, Characteristics and Features in Soil Taxonomy
#' Epipedons, Diagnostic Horizons, Characteristics and Features in Soil Taxonomy
#'
#' @description A `data.frame` with columns "group", "name", "chapter", "page", "description", "criteria". Currently page numbers and contents are referenced to 12th Edition Keys to Soil Taxonomy and derived from products in the ncss-tech SoilKnowledgeBase repository (https://github.com/ncss-tech/SoilKnowledgeBase).
#' A `data.frame` with columns "group", "name", "chapter", "page", "description", "criteria". Currently page numbers and contents are referenced to 12th Edition Keys to Soil Taxonomy and derived from products in the ncss-tech SoilKnowledgeBase repository (https://github.com/ncss-tech/SoilKnowledgeBase).
#'
#' @references
#' Soil Survey Staff. 2014. Keys to Soil Taxonomy, 12th ed. USDA-Natural Resources Conservation Service, Washington, DC.
@@ -48,9 +48,9 @@
#'
"ST_features"

#' @title Formative Elements used by Soil Taxonomy
#' Formative Elements used by Soil Taxonomy
#'
#' @description A database of formative elements used by the first 4 levels of US Soil Taxonomy hierarchy (soil order, suborder, greatgroup, subgroup).
#' A database of formative elements used by the first 4 levels of US Soil Taxonomy hierarchy (soil order, suborder, greatgroup, subgroup).
#'
#' @references
#' S. W. Buol and R. C. Graham and P. A. McDaniel and R. J. Southard. Soil Genesis and Classification, 5th edition. Iowa State Press, 2003.
@@ -61,9 +61,9 @@
#'
"ST_formative_elements"

#' @title Letter Code Lookup Table for Position of Taxa within the Keys to Soil Taxonomy (12th Edition)
#' Letter Code Lookup Table for Position of Taxa within the Keys to Soil Taxonomy (12th Edition)
#'
#' @description A lookup table mapping unique taxonomic Order, Suborder, Great Group and Subgroups to letter codes that denote their logical position within the Keys.
#' A lookup table mapping unique taxonomic Order, Suborder, Great Group and Subgroups to letter codes that denote their logical position within the Keys.
#'
#' @details The lookup table has been corrected to reflect errata that were posted after the print publication of the 12th Edition Keys, as well as typos in the Spanish language edition.
#'
@@ -81,9 +81,9 @@
#'
"ST_higher_taxa_codes_12th"

#' @title Letter Code Lookup Table for Position of Taxa within the Keys to Soil Taxonomy (13th Edition)
#' Letter Code Lookup Table for Position of Taxa within the Keys to Soil Taxonomy (13th Edition)
#'
#' @description A lookup table mapping unique taxonomic Order, Suborder, Great Group and Subgroups to letter codes that denote their logical position within the Keys.
#' A lookup table mapping unique taxonomic Order, Suborder, Great Group and Subgroups to letter codes that denote their logical position within the Keys.
#'
#' @references
#'
@@ -95,3 +95,25 @@
#' @keywords datasets
#'
"ST_higher_taxa_codes_13th"

#' World Reference Base for Soil Resources (4th Edition, 2022)
#'
#' A _list_ containing three _data.frame_ elements `"rsg"`, `"pq"`, and `"sq"` providing information on the 'Reference Soil Groups', 'Principal Qualifiers', and 'Supplementary Qualifiers', respectively.
#'
#' @details
#'
#' Each element has the column `"code"`, a number (1-32) giving the position of the group among the Reference Soil Groups, and the column `"reference_soil_group"`, the corresponding group name. The qualifier name columns of `"pq"` and `"sq"` (`principal_qualifiers` and `supplementary_qualifiers`) contain individual qualifier terms. Related qualifiers are identified by the `qualifier_group` column, which holds the qualifier names as listed together, separated by a forward slash (`" / "`).
#'
#' - The _data.frame_ `"rsg"` has column `"criteria"`, describing the logical criteria for each Reference Soil Group.
#' - The _data.frame_ `"pq"` has column `"principal_qualifiers"`.
#' - The _data.frame_ `"sq"` has column `"supplementary_qualifiers"`.
#'
#' @references
#'
#' IUSS Working Group WRB. 2022. World Reference Base for Soil Resources. International soil classification system for naming soils and creating legends for soil maps. 4th edition. International Union of Soil Sciences (IUSS), Vienna, Austria.
#'
#' @usage data(WRB_4th_2022)
#'
#' @keywords datasets
#'
"WRB_4th_2022"
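
As a rough illustration of the structure documented above, the following sketch builds a toy stand-in list and queries it. The object `wrb_toy`, the group names, and the qualifier terms are all hypothetical placeholders, not values from the real `WRB_4th_2022` dataset; only the element and column names follow the data-raw script in this pull request.

```r
# Toy stand-in mirroring the documented list structure.
# All values are hypothetical placeholders, not real WRB content.
wrb_toy <- list(
  rsg = data.frame(
    code = 1:2,
    reference_soil_group = c("GROUPONE", "GROUPTWO"),
    criteria = c("placeholder criteria 1", "placeholder criteria 2"),
    stringsAsFactors = FALSE
  ),
  pq = data.frame(
    code = c(1L, 1L, 2L),
    reference_soil_group = c("GROUPONE", "GROUPONE", "GROUPTWO"),
    qualifier_group = c("Alpha / Beta", "Alpha / Beta", "Gamma"),
    principal_qualifiers = c("Alpha", "Beta", "Gamma"),
    stringsAsFactors = FALSE
  )
)

# Principal qualifier terms listed for the first group
pq1 <- subset(wrb_toy$pq, code == 1)$principal_qualifiers

# Terms that were listed together share the same qualifier_group value
related <- split(wrb_toy$pq$principal_qualifiers, wrb_toy$pq$qualifier_group)
```

With the real dataset, the same calls would presumably apply after `data(WRB_4th_2022)`, substituting `WRB_4th_2022` for `wrb_toy`.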
6 changes: 3 additions & 3 deletions R/higherTaxaCodes.R
@@ -1,6 +1,6 @@
#' Decompose taxon letter codes
#'
#' @description Find all codes that logically comprise the specified codes. For instance, code "ABC" ("Anhyturbels") returns "A" ("Gelisols"), "AB" ("Turbels"), "ABC" ("Anhyturbels"). Use in conjunction with a lookup table that maps Order, Suborder, Great Group and Subgroup taxa to their codes (see \code{\link{taxon_code_to_taxon}} and \code{\link{taxon_to_taxon_code}}).
#' Find all codes that logically comprise the specified codes. For instance, code "ABC" ("Anhyturbels") returns "A" ("Gelisols"), "AB" ("Turbels"), "ABC" ("Anhyturbels"). Use in conjunction with a lookup table that maps Order, Suborder, Great Group and Subgroup taxa to their codes (see \code{\link{taxon_code_to_taxon}} and \code{\link{taxon_to_taxon_code}}).
#'
#' @details Accounts for Keys that run out of capital letters (more than 26 subgroups) and use lowercase letters for a unique subdivision within the "fourth character position."
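
The decomposition described in this documentation can be sketched in a few lines of base R. This is not the package's `decompose_taxon_code()` implementation; `decompose_code_sketch()` is a hypothetical helper, and it assumes each code position is one capital letter optionally followed by one lowercase letter.

```r
# Hypothetical helper, not the package implementation.
# Assumes each position is one capital letter, optionally followed by
# a single lowercase letter (the "fourth character position" case).
decompose_code_sketch <- function(code) {
  # Split the code into its positions, e.g. "ABCDa" -> "A", "B", "C", "Da"
  parts <- regmatches(code, gregexpr("[A-Z][a-z]?", code))[[1]]
  # Cumulatively paste the positions together
  Reduce(paste0, parts, accumulate = TRUE)
}

decompose_code_sketch("ABC")
# "A" "AB" "ABC"
decompose_code_sketch("ABCDa")
# "A" "AB" "ABC" "ABCDa"
```

Treating the lowercase letter as part of the preceding capital keeps a code such as `"ABCDa"` at four levels rather than five.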
#'
@@ -49,7 +49,7 @@ decompose_taxon_code <- function(codes) {

#' Get taxon codes of preceding taxa
#'
#' @description Find all codes that logically precede the specified codes. For instance, code "ABC" ("Anhyturbels") returns "AA" ("Histels") "ABA" ("Histoturbels") and "ABB" ("Aquiturbels"). Use in conjunction with a lookup table that maps Order, Suborder, Great Group and Subgroup taxa to their codes (see \code{\link{taxon_code_to_taxon}} and \code{\link{taxon_to_taxon_code}}).
#' Find all codes that logically precede the specified codes. For instance, code "ABC" ("Anhyturbels") returns "AA" ("Histels") "ABA" ("Histoturbels") and "ABB" ("Aquiturbels"). Use in conjunction with a lookup table that maps Order, Suborder, Great Group and Subgroup taxa to their codes (see \code{\link{taxon_code_to_taxon}} and \code{\link{taxon_to_taxon_code}}).
#'
#' @details Accounts for Keys that run out of capital letters (more than 26 subgroups) and use lowercase letters for a unique subdivision within the "fourth character position."
#'
@@ -187,7 +187,7 @@ taxon_to_taxon_code <- function(taxon) {

#' Determine relative position of taxon within Keys to Soil Taxonomy (Order to Subgroup)
#'
#' @description The relative position of a taxon is `[number of preceding Key steps] + 1`, or `NA` if it does not exist in the lookup table.
#' The relative position of a taxon is `[number of preceding Key steps] + 1`, or `NA` if it does not exist in the lookup table.
#'
#' @param code A character vector of taxon codes to determine the relative position of.
#'
105 changes: 105 additions & 0 deletions data-raw/wrb_4th_2022.R
@@ -0,0 +1,105 @@
## code to prepare `WRB_4th_2022` dataset goes here
library(pdftools)

## SETUP
##
# dir.create("misc/WRB2022")
# download.file("https://wrb.isric.org/files/WRB_fourth_edition_2022-12-18.pdf",
# destfile = "misc/WRB2022/WRB_fourth_edition_2022-12-18.pdf")

## does not work for RSG/qualifiers; tables used in formatting
## can be used for definitions of diagnostics and qualifiers
# x <- pdf_text("misc/WRB2022/WRB_fourth_edition_2022-12-18.pdf")
# x <- unlist(strsplit(x, "\n"))
# ldx <- cumsum(grepl("Key to the Reference Soil Groups", x))
# y <- split(x, ldx)
# data.frame(y[[11]]) |> View()

## nope
# x <- pdf_data("misc/WRB2022/WRB_fourth_edition_2022-12-18.pdf")
# y <- do.call('rbind', x)
#

x <- readLines("misc/WRB2022/WRB_RSG.txt")
x <- gsub("\u003c", "<", gsub("\u003E", ">", gsub("\u2264", "<=", gsub("\u2265", ">=", x))))
n <- grep("^[A-Z]+$", x)
z.names <- x[n]
x <- x[-n]
idx <- grep("^(Soils having|Other soils)", x)
ldx <- rep(FALSE, length(x))
ldx[idx] <- TRUE
xx <- split(x, cumsum(ldx))
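z <- lapply(xx, function(y) {
  # mark the line after each line ending in "; and", "; or", "." or ":"
  # as the start of a new criterion
  i <- grep("(; (and|or)|\\.|:)$", y) + 1
  i <- i[i < length(y)]
  l <- rep(FALSE, length(y))
  l[i] <- TRUE
  # rejoin wrapped lines so each criterion is a single string
  sapply(split(y, cumsum(l)), paste0, collapse = " ")
})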
names(z) <- z.names

wrb_rsg <- do.call('rbind', lapply(seq(z), function(i) {
  data.frame(code = i, reference_soil_group = z.names[i], criteria = z[[z.names[i]]])
}))
rownames(wrb_rsg) <- NULL
# View(wrb_rsg)
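
The grouping idiom used above (flag the lines that start a new block, then `split()` on the running count of flags) can be seen in isolation in this small sketch. The input lines are made-up placeholders, not text from the WRB document.

```r
# Made-up placeholder lines; only the leading phrases mimic the source text
x <- c("Soils having A;",
       "and B.",
       "Other soils having C.",
       "Other soils having D;",
       "or E.")

# TRUE where a new block begins
starts <- grepl("^(Soils having|Other soils)", x)

# cumsum() turns the flags into block ids: 1 1 2 3 3
groups <- split(x, cumsum(starts))

# Collapse each block into one string
collapsed <- sapply(groups, paste, collapse = " ")
```

Because `cumsum()` only increments at a flagged line, any continuation lines stay attached to the block that precedes them.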

x <- readLines("misc/WRB2022/WRB_PQ.txt")
n <- grep("^[A-Z]+$", x)
z.names <- x[n]
x <- x[-n]
idx <- grep("Principal qualifiers", x)
ldx <- rep(FALSE, length(x))
ldx[idx] <- TRUE
xx <- split(x, cumsum(ldx))
z <- lapply(xx, function(y) {
  # normalize "A/ B" to "A / B", then drop the section heading
  y <- trimws(gsub("([^ ])/ ", "\\1 / ", y))
  y[y != "Principal qualifiers"]
})
names(z) <- z.names

wrb_pq <- do.call('rbind', lapply(seq(z), function(i) {
  # split each entry on "/" into individual qualifier terms
  pq <- lapply(strsplit(z[[z.names[i]]], "/"), trimws)
  # repeat the original entry once per term to form the qualifier group
  pg <- lapply(seq(pq), function(j) rep(z[[z.names[i]]][j], length(pq[[j]])))
  data.frame(code = i,
             reference_soil_group = z.names[i],
             qualifier_group = unlist(pg),
             principal_qualifiers = unlist(pq))
}))
rownames(wrb_pq) <- NULL
# View(wrb_pq)
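
The qualifier expansion above pairs every individual term with the slash-separated entry it came from. A compact sketch of the same idea follows, using `rep()` with `lengths()` in place of the nested `lapply()`. The entries are hypothetical placeholders, and this is an alternative formulation, not the script's own code.

```r
# Hypothetical entries as they might appear after whitespace normalization
entries <- c("Alpha / Beta", "Gamma")

# Individual terms per entry
terms <- lapply(strsplit(entries, "/", fixed = TRUE), trimws)

# One row per term; the original entry is repeated as the group label
res <- data.frame(
  qualifier_group = rep(entries, lengths(terms)),
  principal_qualifiers = unlist(terms),
  stringsAsFactors = FALSE
)
```

Here `res` has three rows, with `"Alpha"` and `"Beta"` sharing the group `"Alpha / Beta"` and `"Gamma"` forming its own group.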

x <- readLines("misc/WRB2022/WRB_SQ.txt")
n <- grep("^[A-Z]+$", x)
z.names <- x[n]
x <- x[-n]
idx <- grep("Supplementary qualifiers", x)
ldx <- rep(FALSE, length(x))
ldx[idx] <- TRUE
xx <- split(x, cumsum(ldx))
z <- lapply(xx, function(y) {
  # normalize "A/ B" to "A / B", then drop the section heading
  y <- trimws(gsub("([^ ])/ ", "\\1 / ", y))
  y[y != "Supplementary qualifiers"]
})
names(z) <- z.names

wrb_sq <- do.call('rbind', lapply(seq(z), function(i) {
  # split each entry on "/" into individual qualifier terms
  sq <- lapply(strsplit(z[[z.names[i]]], "/"), trimws)
  # repeat the original entry once per term to form the qualifier group
  sg <- lapply(seq(sq), function(j) rep(z[[z.names[i]]][j], length(sq[[j]])))
  data.frame(code = i,
             reference_soil_group = z.names[i],
             qualifier_group = unlist(sg),
             supplementary_qualifiers = unlist(sq))
}))
rownames(wrb_sq) <- NULL
# View(wrb_sq)

WRB_4th_2022 <- list(
  rsg = wrb_rsg,
  pq = wrb_pq,
  sq = wrb_sq
)

stopifnot(all(sapply(WRB_4th_2022, function(x) max(x$code)) == 32))

usethis::use_data(WRB_4th_2022, overwrite = TRUE)
Binary file added data/WRB_4th_2022.rda
Binary file not shown.
2 changes: 1 addition & 1 deletion man/SoilTaxonomy-package.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

27 changes: 27 additions & 0 deletions man/WRB_4th_2022.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion misc/.gitignore
@@ -1,5 +1,5 @@
*.json
subgroups.tgz
.Rproj.user
.Rhistory
*.Rproj
WRB2022/WRB_fourth_edition_2022-12-18.pdf