Use of Arrow within CohortDiagnostics causes issue #43
Comments
Adding more to this issue: This error also occurs.
This is likely caused by a join using dplyr here.
The table
Looping over values from casting time_id as int in the SQL may fix this; however, I expect that being able to dynamically join on compatible types would be a desired feature, as dbplyr doesn't expect explicit type casting.
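For reference, the kind of type mismatch described above can be sketched with plain in-memory tibbles instead of Andromeda or Arrow tables. This is only an illustration: the table names, the label and value columns, and the data are made up, and only the time_id column name comes from this thread. The Arrow-backed error message will differ from the dplyr one.

```r
library(dplyr)

# Hypothetical stand-ins for the two tables being joined.
timeRef <- tibble(time_id = c(1L, 2L, 3L), label = c("a", "b", "c"))
results <- tibble(time_id = c("1", "2"), value = c(10, 20))  # key stored as character

# Joining a character key to an integer key fails in dplyr
# because the two column types are incompatible.
joinError <- tryCatch(
  inner_join(results, timeRef, by = "time_id"),
  error = function(e) e
)
inherits(joinError, "error")

# Casting the key explicitly on one side makes the join well defined.
joined <- results %>%
  mutate(time_id = as.integer(time_id)) %>%
  inner_join(timeRef, by = "time_id")
```

Doing the cast in the SQL that populates the table has the same effect, but it has to be repeated in every query that feeds a join.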
Thanks for reporting this!
@anthonysena Have you tested FeatureExtraction with the develop branch of Andromeda?
Not yet @ablack3 - trying to get the unit tests back online and then see how FE is working with the develop branch of Andromeda.
@ablack3 so, following up on the CohortDiagnostics results: I was able to resolve the initial issues by casting explicitly to character or integer in the SQL when querying Andromeda, which forces the Arrow object to store empty values as the correct type. However, when running with the latest version I get a strange new error. The problem is that a results object that should be generated from a GROUP BY select contains multiple rows for the same keys. This doesn't happen in the current released version of Andromeda, and I can't figure out why.
The above is a check that is performed when exporting data from the package - this is to ensure that the results data can be inserted into the result schemas for viewing in the shiny app etc. To reproduce:
The offending code is around here. This wasn't written by me (and I find it kind of confusingly written, so this might be a tough one to work out) - it may just be an error with the combinations of Andromeda objects. However, the SQL that generates the results looks fine - it's using a GROUP BY clause, so the rows in question should be aggregated (and indeed they are in the SQLite version). Either CohortDiagnostics is doing something bad when combining the objects like this, or something is going wrong when exporting the data. I can't quite figure out why that might be, or why it works fine when using the SQLite version.
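One way to narrow this down might be to count rows per key combination on the collected table and look at which keys appear more than once. The sketch below uses a made-up tibble and hypothetical key columns (cohort_id, time_id); it is not the package's own export check, just a quick way to surface the offending rows.

```r
library(dplyr)

# Made-up results with one key combination (1, 1) appearing twice.
results <- tibble(
  cohort_id = c(1L, 1L, 2L, 2L),
  time_id   = c(1L, 1L, 1L, 2L),
  records   = c(5, 7, 3, 4)
)

# Hypothetical key columns; substitute the keys of the real results table.
keyCols <- c("cohort_id", "time_id")

# Count rows per key combination and keep only the duplicated ones.
duplicatedKeys <- results %>%
  group_by(across(all_of(keyCols))) %>%
  summarise(n_rows = n(), .groups = "drop") %>%
  filter(n_rows > 1)

duplicatedKeys
```

Running the same check on each Andromeda table before and after the objects are combined should show whether the duplicates come out of the SQL or are introduced later.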
Errors when using the develop branch of CohortDiagnostics occur. I'm not sure if this is Andromeda or the downstream use of FeatureExtraction.
Steps to reproduce:
In CohortDiagnostics (develop or main), run devtools::test().
Note that this occurs in the time series diagnostics as well as other places.
Running the time series diagnostics and breaking at line #441 lets you find an error of this type.
Here, running resultsInAndromeda$allData %>% dplyr::collect() produces something similar.
Note that gender in the SQL is taken directly from concept.concept_name, a varchar/text string by definition.
Likely also reproducible when calling runCohortTimeSeriesDiagnostics directly.