Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make output column capitalization consistent with ITIS API for functions using pick_cols() #25

Open
sformel-usgs opened this issue Feb 16, 2023 · 3 comments
Labels

Comments

@sformel-usgs
Copy link

This is small potatoes, but I thought it couldn't hurt to mention. And mostly I'm curious about why this is happening.

I had reason to mix search_scientific and scientific_name for a large list of taxa and wanted to join the results on combinedName. But search_scientific returns a column named combinedName, while scientific_name returns a field called combinedname (lowercase n). After closer inspection, the unitname fields are also inconsistently capitalized. So the request is that these functions return the same names so the tables can be easily joined.

But I couldn't find where the names with the lowercase n were being defined. The scientific_name function explicitly names combinedName. The capital N seems to be consistent with what the ITIS API returns and what the full record function returns (see below).

I reinstalled from github, in case I had overlooked some commit, but the mystery remains. Is this by design, or am I overlooking something simple?

#reprex

>ritis::search_scientific(x = "Spartina alterniflora")

# A tibble: 3 × 12
  author                     combinedName                  kingdom tsn   unitI…¹ unitI…² unitI…³ unitI…⁴ unitN…⁵ unitN…⁶ unitN…⁷ unitN…⁸
  <chr>                      <chr>                         <chr>   <chr> <lgl>   <lgl>   <chr>   <lgl>   <chr>   <chr>   <chr>   <lgl>  
1 Loisel.                    Spartina alterniflora         Plantae 41267 NA      NA      NA      NA      SpartialternNA      NA     
2 (Muhl. ex Elliott) Fernald Spartina alterniflora var. gPlantae 5408NA      NA      var.    NA      Spartialternglabra  NA     
3 (Merr.) Fernald            Spartina alterniflora var. pPlantae 5408NA      NA      var.    NA      Spartialternpilosa  NA     
# … with abbreviated variable names ¹​unitInd1, ²​unitInd2, ³​unitInd3, ⁴​unitInd4, ⁵​unitName1, ⁶​unitName2, ⁷​unitName3, ⁸​unitName4

>ritis::scientific_name(tsn = 41267)

# A tibble: 1 × 4
  combinedname          tsn   unitname1 unitname2   
  <chr>                 <chr> <chr>     <chr>       
1 Spartina alterniflora 41267 Spartina  alterniflora

> ritis::full_record(tsn = 41267)[['scientificName']]
$author
[1] "Loisel."

$class
[1] "gov.usgs.itis.itis_service.data.SvcScientificName"

$combinedName
[1] "Spartina alterniflora"

$kingdom
NULL

$tsn
[1] "41267"

$unitInd1
NULL

$unitInd2
NULL

$unitInd3
NULL

$unitInd4
NULL

$unitName1
[1] "Spartina                           "

$unitName2
[1] "alterniflora"

$unitName3
NULL

$unitName4
NULL
Session Info ```r

sessionInfo()

R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8 LC_CTYPE=English_United States.utf8 LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C LC_TIME=English_United States.utf8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] ritis_1.0.0

loaded via a namespace (and not attached):
[1] Rcpp_1.0.10 fansi_1.0.4 utf8_1.2.3 dplyr_1.1.0 plyr_1.8.8 crul_1.3 R6_2.5.1
[8] jsonlite_1.8.4 lifecycle_1.0.3 magrittr_2.0.3 pillar_1.8.1 rlang_1.0.6 cli_3.6.0 curl_5.0.0
[15] rstudioapi_0.14 xml2_1.3.3 vctrs_0.5.2 generics_0.1.3 urltools_1.7.3 tools_4.2.1 glue_1.6.2
[22] triebeard_0.3.0 compiler_4.2.1 pkgconfig_2.0.3 tidyselect_1.2.0 tibble_3.1.8 solrium_1.2.0 httpcode_0.3.0

</details>
@maelle
Copy link
Member

maelle commented Mar 2, 2023

@jcblum

@jcblum jcblum added the feature label Dec 1, 2023
@jcblum
Copy link
Contributor

jcblum commented Dec 1, 2023

Hi @sformel-usgs, thanks for pointing this out!

Perhaps you figured this out on your own already 😅, but the reason that scientific_name() is returning lowercase column names is because it uses the internal pick_cols() generic function (specifically, its .data.frame method) to pull out a subset of columns from what ITIS returns. See: https://github.com/ropensci/ritis/blob/95f190cd328c80e75aa9619f04db7cd1016f7e9c/R/zzz.R#L75C1-L82

pick_cols() is used all over the codebase, so I'm guessing that the original motivation behind lowercasing was to make this utility as flexible as possible.

I agree that it would be nicer for the field names to have consistent capitalization no matter which path you took to getting the information. As you point out, that makes it easier to join outputs from different functions. I can also see the merit in making the capitalization consistent with the ITIS API.

However, since pick_cols() is such a widespread piece of infrastructure in this package, rethinking its behavior is a bit of an undertaking. I'm tagging this as a feature request, but I likely won't be able to get to it super soon (contributions welcome!).

@jcblum jcblum changed the title inconsistent capitalization in column names returned by scientific_name and search_scientific Make output column capitalization consistent with ITIS API for functions using pick_cols() Dec 1, 2023
@sformel-usgs
Copy link
Author

Excellent detective work! I did not figure this out yet; I appreciate you following up on it. I don't have time to work on this in the next couple months, but I'll put it in the queue for the new year.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants