Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test project tests fail on Linux machines #359

Open
nip5 opened this issue Mar 7, 2019 · 1 comment
Open

Test project tests fail on Linux machines #359

nip5 opened this issue Mar 7, 2019 · 1 comment
Assignees
Labels
Milestone

Comments

@nip5
Copy link
Contributor

nip5 commented Mar 7, 2019

Test project tests fail on Linux machines due to "NA" being coerced to NA in Linux and thus site 4 is skipped on database generation.

@nip5 nip5 assigned CaitlinA and nip5 Mar 7, 2019
@nip5 nip5 added the bug label Mar 7, 2019
@nip5 nip5 added this to the Code testing milestone Mar 7, 2019
@dschlaep
Copy link
Member

@CaitlinA @nip5

Did PR #360 close this issue?

I don't think that this issue description is helpful, e.g., there is no reproducible example; there is no example output of the error message, it doesn't state which database (out of >= 3), etc.

Also claiming that this is an issue for (all) Linux machines is not helpful; travis-ci which we run with Ubuntu 14.04.5 LTS did not produce this issue (https://travis-ci.org/DrylandEcology/rSFSW2/builds/502228791 passing the previous PR for rSFW2 v3.1.3): i.e.,

[1] "rSFSW2's 'make_dbOutput': started at 2019-03-05 20:38:51"
[1] "rSFSW2's 'make_dbOutput': ended after 0.75 s"

Maybe it fails on your specific Linux machine, but then writing what setup you use would be more helpful for debugging. As it is, I see no specific reason why this was an issue with the R code of rSFSW2 and not with your specific setup.

Commit bdc6062 introduces new code to replace any NA (which carry a specific meaning in R) in weather folder names with a text string (of no specific meaning). I find this confusing without further explanation.

We used to remove NAs in weather folder names with na.exclude() (

temp <- unique(stats::na.exclude(
). Contrary to that old code snippet, it appears that when I changed our test project about 10 months ago, I changed how we re-write the InputMaster csv, i.e., we started to write "NA" instead of NA to the csv file (db3948e). Thus, the reference output database contains an entry for site 4 "NA". I guess that maybe my local computer and travis-ci, translate this into "NA" while yours reads it as NA from disk? The "NA" has ended up in the reference output database because it was no longer removed by na.exclude. Thus, if this is the problem, then I guess that instead of laboriously rewriting the NAs into "NA"s, simply removing the na.exclude (maybe with an additional as.character(SFSW2_prj_inputs[["SWRunInformation"]]$WeatherFolder) if needed) would produce the same result as the reference output database, but much easier and faster?

Since, originally these NAs indicated non-included sites, we may want to consider whether we prefer eventually to revert to the old behavior, i.e., not include those sites in the weatherfolders table?

Either way, looping over vectors tends to be slow and in our use cases site_data could have many rows; however, the code is executed only once and thus it does not matter much for the overall performance of a simulation run. Still, you may want to read up on vectorized code (e.g., https://datascienceplus.com/strategies-to-speedup-r-code/), i.e., the code in the commit

  for (i in seq(sites_data$WeatherFolder)){
    if (is.na(sites_data$WeatherFolder[i])){
      sites_data$WeatherFolder[i] <- "NA"
    }
  }

is equivalent to

sites_data$WeatherFolder[is.na(sites_data$WeatherFolder)] <- "NA"

Here, an example with 100,000 sites -- a reasonable setup for one of our simulation experiments: the for-loop is (on my machine) over 1,300 times slower...

sites_data <- data.frame(WeatherFolder = as.character(sample(1e5)),
  stringsAsFactors = FALSE)
sites_data$WeatherFolder[c(10, 50000)] <- NA

f1 <- function(x) {
  for (i in seq(x$WeatherFolder)){
    if (is.na(x$WeatherFolder[i])){
      x$WeatherFolder[i] <- "NA"
    }
  }
  x
}

f2 <- function(x) {
  x$WeatherFolder[is.na(x$WeatherFolder)] <- "NA"
  x
}

identical(f1(sites_data), f2(sites_data))

microbenchmark::microbenchmark(f1(sites_data), f2(sites_data))

But again, simply removing na.exclude from the call in line 1843 should have done the trick as well, i.e.,

      temp <- unique(SFSW2_prj_inputs[["SWRunInformation"]]$WeatherFolder)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants