
Writing of DateTime columns #34

Closed

Conversation

jkrumbiegel
Contributor

cf. #32

This is a draft PR showing how writing of DateTime columns could in principle be achieved. Many questions remain to be answered, however.

The basic idea is to convert DateTime columns to double-precision numbers. The epoch and delta are already defined in the package for reading, so they can be reused. However, I've noticed that when I read back a dataframe with sub-second precision from .xpt, for example, I do not recover this data, because the reading code rounds to the delta, which is Second. I don't think this rounding is necessary, so maybe it qualifies as a bug in ReadStatTables:

julia> df = DataFrame(time = now())
1×1 DataFrame
 Row │ time                    
     │ DateTime                
─────┼─────────────────────────
   1 │ 2024-01-04T10:00:15.023

julia> writestat("some_datetime.xpt", df)
1×1 ReadStatTable:
 Row │      time 
     │   Float64 
─────┼───────────
   1 │ 2.01998e9

julia> readstat("some_datetime.xpt")
1×1 ReadStatTable:
 Row │                time 
     │            DateTime 
─────┼─────────────────────
   1 │ 2024-01-04T10:00:15
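For reference, the epoch/delta conversion itself is simple, and the milliseconds survive it as long as nothing rounds to whole seconds. A minimal sketch in Python (1960-01-01 is the standard SAS datetime epoch; the helper names are hypothetical, not the package's API):

```python
from datetime import datetime, timedelta

# SAS datetime values are seconds since the SAS epoch, 1960-01-01T00:00:00
SAS_EPOCH = datetime(1960, 1, 1)

def datetime_to_sas(dt):
    """Convert a datetime to a SAS datetime value (Float64 seconds)."""
    return (dt - SAS_EPOCH).total_seconds()

def sas_to_datetime(x):
    """Convert a SAS datetime value back, keeping sub-second precision."""
    return SAS_EPOCH + timedelta(seconds=x)

value = datetime_to_sas(datetime(2024, 1, 4, 10, 0, 15, 23000))
# value is about 2.01998e9, matching the written Float64 shown above;
# converting back without rounding recovers the .023 seconds
```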

Regarding the appropriate datetime formats, so far I've only taken a brief look at SAS in this context: https://documentation.sas.com/doc/en/vdmmlcdc/8.1/leforinforref/n0av4h8lmnktm4n1i33et4wyz5yy.htm. The current code defines the formats below, but the SAS docs say that DATETIMEw.d is a dynamic format where the width w can be any value from 7 to 40 and d can be any number of decimal digits from 0 to 39. So I'm not sure why the selection below is what it is, as all these entries seem to specify no decimal digits (and therefore no milliseconds). The current reading code also doesn't seem to honor these values at all; it just does the epoch/delta conversion the same way regardless. I can imagine that if we write out values that don't match the format we specify, we create files that are invalid in SAS.

This is the mentioned list of formats currently found in the code base:

const sas_datetime_formats = [
    "DATETIME", "DATETIME18", "DATETIME19", "DATETIME20", "DATETIME21", "DATETIME22", "TOD"
]
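To illustrate the DATETIMEw.d structure the SAS docs describe, here is a small sketch (in Python; the helper name and regex are mine, not from the package) of how such a format name could be split into its base name, width w, and decimal digits d. Only a format with d > 0 would display fractional seconds:

```python
import re

# A SAS numeric format looks like NAMEw.d, e.g. "DATETIME21.2":
# base name, optional width w (7-40 for DATETIME), optional decimals d (0-39)
_FMT_RE = re.compile(r"^([A-Z]+?)(\d+)?(?:\.(\d+))?$")

def parse_sas_format(fmt):
    """Split a SAS format name into (name, width, decimals)."""
    m = _FMT_RE.match(fmt)
    if m is None:
        raise ValueError(f"not a SAS format: {fmt!r}")
    name, w, d = m.groups()
    return name, int(w) if w else None, int(d) if d else 0
```

Under this reading, every entry in the list above parses with d = 0, e.g. `parse_sas_format("DATETIME22")` gives `("DATETIME", 22, 0)`, while a format like "DATETIME21.2" would show hundredths of a second.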

The way I'm changing the column representation from DateTime to Float64 is also very hacky currently and just serves as a proof of concept. I'm not sure what the most appropriate path would be given the package's design; maybe the conversion should only happen later, in the value writers (which also have comment stubs mentioning datetime support).


codecov bot commented Jan 4, 2024

Codecov Report

Attention: 11 lines in your changes are missing coverage. Please review.

Comparison is base (073d6b0) 99.91% compared to head (ee24b02) 77.12%.

Files Patch % Lines
src/writestat.jl 0.00% 8 Missing ⚠️
src/datetime.jl 0.00% 3 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main      #34       +/-   ##
===========================================
- Coverage   99.91%   77.12%   -22.80%     
===========================================
  Files          11       11               
  Lines        1187     1084      -103     
===========================================
- Hits         1186      836      -350     
- Misses          1      248      +247     


@junyuan-chen
Owner

junyuan-chen commented Jan 4, 2024

Thank you for your attempt!

I once considered doing something based on MappedArrays that performs the conversion lazily. The implementation would also integrate with the relevant column metadata field. I do wish to implement this at some point and will get back to it.
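For readers unfamiliar with the idea: MappedArrays provides an array view that applies a function element-wise on access instead of materializing a converted copy. A rough Python analogue of that view (class and names illustrative only, not the planned API):

```python
class MappedColumn:
    """Lazy element-wise view: applies `forward` on access, never copies."""

    def __init__(self, data, forward):
        self.data = data        # the original column, e.g. DateTime values
        self.forward = forward  # conversion applied on access

    def __getitem__(self, i):
        return self.forward(self.data[i])

    def __len__(self):
        return len(self.data)
```

A DateTime column could then be wrapped with the epoch/delta conversion as `forward`, so the writer sees Float64 values while the table still stores DateTime.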

@jkrumbiegel
Contributor Author

Thank you, let me know here if you want eyes on some approach you're trying out.

@andreasnoack
Contributor

@junyuan-chen would it make sense to move forward with the approach taken in this PR as an interim solution until you have the time to implement the MappedArrays-based approach? We are very interested in using and contributing to this package, as users are currently writing results to CSV and creating the xpt file from R because of this limitation.

@junyuan-chen
Owner

> @junyuan-chen would it make sense to move forward with the approach taken in this PR as an interim solution until you have the time to implement the MappedArrays-based approach? We are very interested in using and contributing to this package, as users are currently writing results to CSV and creating the xpt file from R because of this limitation.

Thank you for raising this point. I will try to see whether something with a minimal amount of changes could serve as an interim solution. I plan to spend some time in the coming months (probably late March) on a more comprehensive revision of the package design that makes the write support work better, which will result in a v0.3.0 release.

@andreasnoack
Contributor

@junyuan-chen just a friendly bump. Have you had any time to look into this?

@junyuan-chen
Owner

The write support has been implemented in #36.

@jkrumbiegel
Contributor Author

Thank you for your work @junyuan-chen!
