Combine get_n_obs() and get_n_individuals() into new function n_observations() #237

Open
peterdesmet opened this issue Jul 13, 2023 · 2 comments

@peterdesmet
Member

peterdesmet commented Jul 13, 2023

Suggested in camtraptor July 2023 coding sprint

get_n_obs() returns the number of observations per deployment and species (unless species = NULL)

library(camtraptor)
get_n_obs(mica, species = "Anas platyrhynchos")
#> There are 3 deployments without observations: 577b543a-2cf1-4b23-b6d2-cda7e2eac372, 62c200a9-0e03-4495-bcd8-032944f6f5a1 and 7ca633fa-64f8-4cfc-a628-6b0c419056d7
#> # A tibble: 4 × 3
#>   deploymentID                         scientificName         n
#>   <chr>                                <chr>              <int>
#> 1 29b7d356-4bb4-4ec4-b792-2af5cc32efa8 Anas platyrhynchos     4
#> 2 577b543a-2cf1-4b23-b6d2-cda7e2eac372 Anas platyrhynchos     0
#> 3 62c200a9-0e03-4495-bcd8-032944f6f5a1 Anas platyrhynchos     0
#> 4 7ca633fa-64f8-4cfc-a628-6b0c419056d7 Anas platyrhynchos     0

Created on 2023-07-13 with reprex v2.0.2

get_n_individuals() returns the number of individuals (count sum) per deployment and species (unless species = NULL)

library(camtraptor)
get_n_individuals(mica, species = "Anas platyrhynchos")
#> There are 3 deployments without observations: 577b543a-2cf1-4b23-b6d2-cda7e2eac372, 62c200a9-0e03-4495-bcd8-032944f6f5a1 and 7ca633fa-64f8-4cfc-a628-6b0c419056d7
#> # A tibble: 4 × 3
#>   deploymentID                         scientificName         n
#>   <chr>                                <chr>              <int>
#> 1 29b7d356-4bb4-4ec4-b792-2af5cc32efa8 Anas platyrhynchos    13
#> 2 577b543a-2cf1-4b23-b6d2-cda7e2eac372 Anas platyrhynchos     0
#> 3 62c200a9-0e03-4495-bcd8-032944f6f5a1 Anas platyrhynchos     0
#> 4 7ca633fa-64f8-4cfc-a628-6b0c419056d7 Anas platyrhynchos     0

Created on 2023-07-13 with reprex v2.0.2

We suggest combining this behaviour into a single function n_observations() that returns count characteristics per deployment and species. Filters are removed from the function but remain supported through the filter_*() functions.

n_observations(
  package = NULL,
  group_by = c("deploymentID", "scientificName") # this is the default, but can be changed by the user.
  # The options should probably be limited to this default and "deploymentID",
  # because the function needs to know which table to choose the column from
  # ... removed, see filters
  # species = "all" removed, see filters
  # sex = NULL removed, see filters
  # life_stage = NULL removed, see filters
  # datapkg removed
)

The returned information would be:

deploymentID
scientificName (if part of group_by)
n_events # also useful, number of sequences/events within the deployment (for this species)
n_observations # same as n of get_n_obs
n_individuals # same as n of get_n_individuals
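
As a rough illustration of the proposed output, here is a minimal sketch on a toy observations table. It is not the camtraptor implementation: the helper name summarise_observations() and the toy column names (eventID, count) are assumptions made for this example, and unlike get_n_obs() the sketch does not add zero rows for deployments without observations.

```r
library(dplyr)

# Toy observations table. Column names are assumptions for illustration only.
observations <- data.frame(
  deploymentID   = c("dep1", "dep1", "dep1", "dep2"),
  scientificName = c("Anas platyrhynchos", "Anas platyrhynchos",
                     "Anas platyrhynchos", "Vulpes vulpes"),
  eventID        = c("ev1", "ev1", "ev2", "ev3"),
  count          = c(5L, 4L, 4L, 1L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one row per group with the three count columns.
summarise_observations <- function(observations,
                                   group_by = c("deploymentID", "scientificName")) {
  observations %>%
    dplyr::group_by(dplyr::across(dplyr::all_of(group_by))) %>%
    dplyr::summarise(
      n_events       = dplyr::n_distinct(eventID), # distinct events/sequences
      n_observations = dplyr::n(),                 # number of observation rows
      n_individuals  = sum(count),                 # sum of the count column
      .groups = "drop"
    )
}

summarise_observations(observations)
summarise_observations(observations, group_by = "deploymentID")
```

Passing group_by = "deploymentID" drops the species dimension, which matches the second option mentioned in the signature above.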
@damianooldoni
Member

damianooldoni commented Jul 24, 2024

Some thoughts:

n_observations()

I had doubts about the new name, n_observations(), because it goes against the best practice we wrote down for naming functions: use verbs to name functions whenever possible. Why not get_n_observations()? Because it's too long 😄 So maybe we should indeed stick to our original plan from the 2023 coding sprint and use n_observations() as the new flagship function, together with rai() (#238) and n_species() (#243).

Return number of obs/individuals/events in one data.frame

Getting all this information in one go can be practical. Otherwise, users must run get_n_obs() and get_n_individuals() separately. A get_n_events() function doesn't exist, but it should be added if we opt to keep the functions separate. Also, merging the information into one data.frame follows a logical thematic grouping: observations, RAI (#238) and species (#243). So I agree with the approach described above. Now it's important to take its consequences into account. See the sections below.

Deprecation vs defunct

I don't like making the get_* functions defunct right away, as good practice says: "make functions defunct only after a sufficiently long deprecation period", see the rOpenSci guide. I will deprecate them and make them defunct when releasing a later 1.x version of the package. Same for the RAI-related functions.

Filtering

Yes, filtering on sex and life stage will happen beforehand via filter_observations(), so sex and life_stage will not be part of the new function n_observations(). But what about using sex and life_stage in the deprecated functions get_n_obs() and get_n_individuals()? Again, making arguments defunct is bad practice. I would return a deprecation warning but still allow users to use them. A call like x <- filter_observations(x, sex == ... , life_stage == ...) will run behind the scenes.
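
The deprecation path described above could look roughly like the following base R sketch. Everything here is hypothetical: n_observations_sketch() and get_n_obs_sketch() are stand-ins that work on a plain data frame rather than a data package, the column names sex and lifeStage are assumed, and the filtering is done with base subsetting instead of filter_observations().

```r
# Hypothetical stand-in for the new function: observation rows per deployment.
n_observations_sketch <- function(observations) {
  counts <- table(observations$deploymentID)
  data.frame(
    deploymentID   = names(counts),
    n_observations = as.integer(counts),
    stringsAsFactors = FALSE
  )
}

# Hypothetical stand-in for the old function: `sex` and `life_stage` still
# work, but trigger a warning and are applied as a filter before delegating.
get_n_obs_sketch <- function(observations, sex = NULL, life_stage = NULL) {
  if (!is.null(sex) || !is.null(life_stage)) {
    warning(
      "Arguments `sex` and `life_stage` are deprecated; ",
      "filter the observations before counting instead.",
      call. = FALSE
    )
  }
  if (!is.null(sex)) {
    observations <- observations[observations$sex %in% sex, , drop = FALSE]
  }
  if (!is.null(life_stage)) {
    observations <- observations[observations$lifeStage %in% life_stage, , drop = FALSE]
  }
  n_observations_sketch(observations)
}

obs <- data.frame(
  deploymentID = c("a", "a", "b"),
  sex          = c("female", "male", "female"),
  lifeStage    = c("adult", "adult", "juvenile"),
  stringsAsFactors = FALSE
)

get_n_obs_sketch(obs)                 # no warning, counts all rows
get_n_obs_sketch(obs, sex = "female") # warns, then counts the filtered rows
```

Existing user code keeps working this way, and the warning points users to the filter-first workflow before the arguments are eventually removed.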

In my opinion, what I described in this comment is the best way to both advance the package development and provide a smooth experience to users.

@peterdesmet, @PietrH, @sannegovaert, @jimcasaer, @MartijnUH: any thoughts?

@MartijnUH
Collaborator

MartijnUH commented Dec 6, 2024

Some thoughts:

n_observations()

I don't have a particularly strong opinion on this, but I suggest that the verb "get" is either consistently used or consistently omitted as a prefix in the names of camtraptor functions.

Return number of obs/individuals/events in one data.frame

I agree, getting all of these outputs at once will make for efficient coding. In my experience, you're often interested in all of these metrics anyway. You want to contrast n_observations with n_individuals to explore the expected group sizes on top of the number of observations/individuals recorded. We might want to keep rai and rai_individuals separated from n_observations and n_individuals though. However, the new camtraptor::get_custom_effort() should seamlessly integrate with the proposed n_species, n_observations and rai functions to produce a table:

deploymentID
scientificName (if part of group_by)
effort # monitoring effort
unit # default in days
n_events # also useful, number of sequences/events within the deployment (for this species)
n_observations # same as n of get_n_obs
n_individuals # same as n of get_n_individuals
rai # same as rai of get_rai
rai_individuals # same as rai of get_rai_individuals
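
A small sketch of how the rai columns could be derived once effort and the counts sit in the same table. The helper add_rai() is hypothetical, the numbers are made up, and the sketch assumes that RAI is expressed as counts per 100 days of effort (n * 100 / effort); that convention should be checked against get_rai() before relying on it.

```r
# Toy summary table with effort in days; all values are made up.
summary_table <- data.frame(
  deploymentID   = c("dep1", "dep2"),
  scientificName = c("Anas platyrhynchos", "Vulpes vulpes"),
  effort         = c(20, 10),
  unit           = c("day", "day"),
  n_observations = c(4L, 1L),
  n_individuals  = c(13L, 1L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: adds rai and rai_individuals, assuming counts are
# normalised per `per` units of effort (100 days by default).
add_rai <- function(x, per = 100) {
  x$rai             <- x$n_observations * per / x$effort
  x$rai_individuals <- x$n_individuals * per / x$effort
  x
}

add_rai(summary_table)
```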

Deprecation vs defunct

Yes, deprecate them and make them defunct in a later version of the package.

Filtering

I guess that's fine too.

In my opinion, what I described in this comment is the best way to both advance the package development and provide a smooth experience to users.
Some additional suggestions:

  • In addition to filtering all of these metrics on features like sex, lifeStage, etc., it should be possible to also group outcomes by a temporal dimension (i.e. n_observations, n_species, rai, rai_individuals, effort per week, month or year within a desired time period).
  • What about also returning the locationName or locationID column by default? In many applications I summarize over the locations because deployments are stationary (i.e. they remain at their location). On the other hand, users can easily add this information by performing a left_join on dplyr::distinct(package$data$deployments, .data$deploymentID, .data$locationID). Not sure what best coding practices recommend in this case. @peterdesmet, @PietrH, @sannegovaert, @jimcasaer, @damianooldoni: any thoughts?
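
Both suggestions can be sketched together on toy data: joining locationID from the deployments table and grouping the counts per month. The table and column names below (timestamp, count) are assumptions for illustration and the data is made up; this is not the code on the issue branch.

```r
library(dplyr)

# Toy deployments and observations; column names are assumptions.
deployments <- data.frame(
  deploymentID = c("dep1", "dep2"),
  locationID   = c("loc1", "loc1"),
  stringsAsFactors = FALSE
)
observations <- data.frame(
  deploymentID = c("dep1", "dep1", "dep2"),
  timestamp    = as.POSIXct(
    c("2023-01-15 10:00:00", "2023-02-03 08:30:00", "2023-02-20 21:15:00"),
    tz = "UTC"
  ),
  count        = c(2L, 1L, 3L),
  stringsAsFactors = FALSE
)

per_location_month <- observations %>%
  # add the location of each deployment
  left_join(distinct(deployments, deploymentID, locationID), by = "deploymentID") %>%
  # derive a year-month label as the temporal grouping variable
  mutate(month = format(timestamp, "%Y-%m")) %>%
  group_by(locationID, month) %>%
  summarise(
    n_observations = n(),
    n_individuals  = sum(count),
    .groups = "drop"
  )

per_location_month
```

Swapping the format string (for example "%Y" for years) changes the temporal resolution without touching the rest of the pipeline.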

Edit: for my proposal for an n_observations() function, please check the latest commits to the branch of this issue. Note that I commented out the check_value statements (as this function did not exist in my R environment), so these checks still need to be incorporated. The function might need some other minor tweaks before it is launched.
