Writing of DateTime columns #34
Codecov Report
Additional details and impacted files:
Coverage Diff (main vs. #34):
Coverage: 99.91% -> 77.12% (-22.80%)
Files: 11 -> 11
Lines: 1187 -> 1084 (-103)
Hits: 1186 -> 836 (-350)
Misses: 1 -> 248 (+247)
Thank you for your attempt! I once considered doing something based on
Thank you, let me know here if you want eyes on some approach you're trying out.
@junyuan-chen would it make sense to move forward with the approach taken in this PR as an interim solution until you have the time to implement the
Thank you for raising this point. I will try to see whether something with a minimal amount of changes could serve as an interim solution. I plan to spend some time in the coming months (probably late March) on a more comprehensive revision of the package design so that write support works better, which will result in a v0.3.0 release.
@junyuan-chen, just a friendly bump. Have you had any time to look into this?
The write support has been implemented in #36.
Thank you for your work, @junyuan-chen!
cf. #32
This is a draft PR showing how writing of DateTime columns could in principle be achieved. There are still many questions to be answered, however.

The basic idea is to convert DateTime columns to double-precision numbers. The epoch and delta used for reading are already defined in the package, so they can be reused. However, I've noticed that when reading back a data frame with sub-second precision, e.g. from an .xpt file, I do not recover the sub-second part, because the reading code rounds to the delta, which is Second. I don't think this rounding is necessary, so maybe this qualifies as a bug in ReadStatTables.

Regarding the appropriate datetime formats, so far I've only taken a brief look at SAS in this context (https://documentation.sas.com/doc/en/vdmmlcdc/8.1/leforinforref/n0av4h8lmnktm4n1i33et4wyz5yy.htm). The current code defines the formats listed below, but the SAS docs say that DATETIMEw.d is a parameterized format where the width w can be any value from 7 to 40 and d, the number of digits after the decimal point, can be any value from 0 to 39. So I'm not sure why the selection below is what it is, as all of these formats specify no decimal places (so no milliseconds). The current reading code also doesn't seem to honor these formats at all; it just applies the same epoch/delta conversion regardless. I can imagine that if we don't write out values that are correct for the format we specify, we would create files that are invalid in SAS.

This is the mentioned list of formats currently found in the code base:
The way I'm currently changing the column representation from DateTime to Float64 is also very hacky and just serves as a proof of concept. I'm not sure what the most appropriate path would be given the package's design; maybe the conversion should only happen later, in the value writers (which also have comment stubs mentioning datetime support).
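The conversion described above can be illustrated outside the package. The following is a minimal Python sketch, not ReadStatTables code (the package itself is written in Julia), assuming the SAS convention that datetime values are stored as seconds since 1960-01-01T00:00:00. All names here (SAS_EPOCH, datetime_to_sas, sas_to_datetime, sas_datetime_format, write_value, write_column) are hypothetical. It shows that encoding as a double keeps fractional seconds, that rounding to a one-second delta on read discards them, how a DATETIMEw.d format name with d > 0 could be built within the documented ranges, and what deferring the conversion to a per-value writer might look like.

```python
from datetime import datetime, timedelta
from typing import List, Union

# Assumption: SAS datetime values are seconds since 1960-01-01T00:00:00.
SAS_EPOCH = datetime(1960, 1, 1)
ONE_SECOND = timedelta(seconds=1)

Value = Union[int, float, str, datetime, None]


def datetime_to_sas(dt: datetime) -> float:
    """Encode a datetime as seconds since the SAS epoch, keeping fractional seconds."""
    return (dt - SAS_EPOCH) / ONE_SECOND


def sas_to_datetime(value: float, round_to_second: bool = False) -> datetime:
    """Decode a SAS datetime value.

    round_to_second=True imitates rounding to a one-second delta on read,
    which drops any sub-second part.
    """
    if round_to_second:
        value = float(round(value))
    return SAS_EPOCH + timedelta(seconds=value)


def sas_datetime_format(width: int = 19, decimals: int = 0) -> str:
    """Build a DATETIMEw.d format name within the ranges quoted from the SAS docs."""
    if not 7 <= width <= 40:
        raise ValueError("width must be between 7 and 40")
    if not 0 <= decimals <= 39:
        raise ValueError("decimals must be between 0 and 39")
    return f"DATETIME{width}.{decimals}" if decimals else f"DATETIME{width}."


def write_value(value: Value) -> Union[float, str, None]:
    """Hypothetical value writer: datetimes are converted only at write time."""
    if isinstance(value, datetime):
        return datetime_to_sas(value)
    if isinstance(value, (int, float)):
        return float(value)
    return value


def write_column(column: List[Value]) -> List[Union[float, str, None]]:
    """Convert a column value by value, leaving the in-memory column untouched."""
    return [write_value(v) for v in column]


if __name__ == "__main__":
    dt = datetime(2024, 1, 1, 12, 0, 0, 500000)
    x = datetime_to_sas(dt)
    print(x)  # 2019729600.5
    print(sas_to_datetime(x) == dt)  # True: sub-second part survives
    print(sas_to_datetime(x, round_to_second=True).microsecond)  # 0: lost by rounding
    print(sas_datetime_format(22, 3))  # DATETIME22.3
```

Deferring the conversion to the value writer leaves the in-memory column as datetimes instead of rewriting it up front; the cost is a type check on every value written. A format such as DATETIME22.3 would declare three decimal places, so the written fractional seconds and the declared format would agree. Whether this fits the package's actual writer design is still open.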