
Writing of DateTime columns #34

Closed

Conversation

jkrumbiegel
Contributor

cf. #32

This is a draft PR showing how writing of DateTime columns could in principle be achieved. Many questions remain to be answered, however.

The basic idea is to convert DateTime columns to double-precision numbers. The epoch and delta are already defined in the package for reading, so they can be reused. However, I've noticed that when I read back a dataframe with sub-second precision from .xpt, for example, I do not recover this data, because the reading code rounds to the delta, which is Second. I don't think this rounding is necessary, so maybe it qualifies as a bug in ReadStatTables:

julia> df = DataFrame(time = now())
1×1 DataFrame
 Row │ time                    
     │ DateTime                
─────┼─────────────────────────
   1 │ 2024-01-04T10:00:15.023

julia> writestat("some_datetime.xpt", df)
1×1 ReadStatTable:
 Row │      time 
     │   Float64 
─────┼───────────
   1 │ 2.01998e9

julia> readstat("some_datetime.xpt")
1×1 ReadStatTable:
 Row │                time 
     │            DateTime 
─────┼─────────────────────
   1 │ 2024-01-04T10:00:15
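For reference, the epoch/delta conversion itself is simple, and the milliseconds survive it as long as nothing rounds to whole seconds. A minimal sketch in Python (1960-01-01 is the standard SAS datetime epoch; the helper names are hypothetical, not the package's API):

```python
from datetime import datetime, timedelta

# SAS datetime values are seconds since the SAS epoch, 1960-01-01T00:00:00
SAS_EPOCH = datetime(1960, 1, 1)

def datetime_to_sas(dt):
    """Convert a datetime to a SAS datetime value (Float64 seconds)."""
    return (dt - SAS_EPOCH).total_seconds()

def sas_to_datetime(x):
    """Convert a SAS datetime value back, keeping sub-second precision."""
    return SAS_EPOCH + timedelta(seconds=x)

value = datetime_to_sas(datetime(2024, 1, 4, 10, 0, 15, 23000))
# value is about 2.01998e9, matching the written Float64 shown above;
# converting back without rounding recovers the .023 seconds
```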

Regarding the appropriate datetime formats, so far I've only taken a brief look at SAS in this context: https://documentation.sas.com/doc/en/vdmmlcdc/8.1/leforinforref/n0av4h8lmnktm4n1i33et4wyz5yy.htm. The current code defines the formats below, but the SAS docs say that DATETIMEw.d is a dynamic format where the width w can be any value from 7 to 40 and d can be any number of decimal digits from 0 to 39. So I'm not sure why the selection below is what it is, as all these entries seem to specify no decimal digits (and therefore no milliseconds). The current reading code also doesn't seem to honor these values at all; it just does the epoch/delta conversion the same way regardless. I can imagine that if we write out values that don't match the format we specify, we create files that are invalid in SAS.

This is the mentioned list of formats currently found in the code base:

const sas_datetime_formats = [
    "DATETIME", "DATETIME18", "DATETIME19", "DATETIME20", "DATETIME21", "DATETIME22", "TOD"
]
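To illustrate the DATETIMEw.d structure the SAS docs describe, here is a small sketch (in Python; the helper name and regex are mine, not from the package) of how such a format name could be split into its base name, width w, and decimal digits d. Only a format with d > 0 would display fractional seconds:

```python
import re

# A SAS numeric format looks like NAMEw.d, e.g. "DATETIME21.2":
# base name, optional width w (7-40 for DATETIME), optional decimals d (0-39)
_FMT_RE = re.compile(r"^([A-Z]+?)(\d+)?(?:\.(\d+))?$")

def parse_sas_format(fmt):
    """Split a SAS format name into (name, width, decimals)."""
    m = _FMT_RE.match(fmt)
    if m is None:
        raise ValueError(f"not a SAS format: {fmt!r}")
    name, w, d = m.groups()
    return name, int(w) if w else None, int(d) if d else 0
```

Under this reading, every entry in the list above parses with d = 0, e.g. `parse_sas_format("DATETIME22")` gives `("DATETIME", 22, 0)`, while a format like "DATETIME21.2" would show hundredths of a second.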

The way I'm changing the column representation from DateTime to Float64 is also very hacky currently and just serves as a proof of concept. I'm not sure what the most appropriate path would be given the package's design; maybe the conversion should only happen later, in the value writers (which also have comment stubs mentioning datetime support).


codecov bot commented Jan 4, 2024

Codecov Report

Attention: 11 lines in your changes are missing coverage. Please review.

Comparison is base (073d6b0) 99.91% compared to head (ee24b02) 77.12%.

Files Patch % Lines
src/writestat.jl 0.00% 8 Missing ⚠️
src/datetime.jl 0.00% 3 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main      #34       +/-   ##
===========================================
- Coverage   99.91%   77.12%   -22.80%     
===========================================
  Files          11       11               
  Lines        1187     1084      -103     
===========================================
- Hits         1186      836      -350     
- Misses          1      248      +247     


@junyuan-chen
Owner

junyuan-chen commented Jan 4, 2024

Thank you for your attempt!

I once considered doing something based on MappedArrays that performs the conversion lazily. The implementation would also integrate with the relevant column metadata field. I do wish to implement this at some point and will get back to it.
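For readers unfamiliar with the idea: MappedArrays provides an array view that applies a function element-wise on access instead of materializing a converted copy. A rough Python analogue of that view (class and names illustrative only, not the planned API):

```python
class MappedColumn:
    """Lazy element-wise view: applies `forward` on access, never copies."""

    def __init__(self, data, forward):
        self.data = data        # the original column, e.g. DateTime values
        self.forward = forward  # conversion applied on access

    def __getitem__(self, i):
        return self.forward(self.data[i])

    def __len__(self):
        return len(self.data)
```

A DateTime column could then be wrapped with the epoch/delta conversion as `forward`, so the writer sees Float64 values while the table still stores DateTime.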

@jkrumbiegel
Contributor Author

Thank you, let me know here if you want eyes on some approach you're trying out.

@andreasnoack
Contributor

@junyuan-chen would it make sense to move forward with the approach taken in this PR as an interim solution until you have the time to implement the MappedArrays-based approach? We are very interested in using and contributing to this package, as users are currently writing results to CSV and creating the xpt file from R because of this limitation.

@junyuan-chen
Owner

> @junyuan-chen would it make sense to move forward with the approach taken in this PR as an interim solution until you have the time to implement the MappedArrays-based approach? We are very interested in using and contributing to this package, as users are currently writing results to CSV and creating the xpt file from R because of this limitation.

Thank you for raising this point. I will try to see whether something with a minimal amount of changes could serve as an interim solution. I plan to spend some time in the coming months (probably late March) on a more comprehensive revision of the package design that makes the write support work better, which will result in a v0.3.0 release.

@andreasnoack
Contributor

@junyuan-chen just a friendly bump. Have you had any time to look into this?

@junyuan-chen
Owner

The write support has been implemented in #36.

@jkrumbiegel
Contributor Author

Thank you for your work @junyuan-chen!
