write_csv adds 15 decimal digits to numbers obtained by subtracting from 1 #1516

sralchemab · 2023-09-13T16:42:37Z

Hi there.

I'm puzzled by an issue when writing some values using write_csv. In the example below, the variables first to fifth, which were obtained by subtracting from 1, look like this when written to a CSV file using write_csv:

# test_write_csv.csv
first_value,second_value,third_value,fourth_value,fifth_value,sixth_value,seventh_value
0.050000000000000044,0.09999999999999998,0.9999999999999998,0.09999999999999998,0.9999999999999998,3.1,310

When subtracting from a number other than 1, the values are written fine (see the sixth and seventh variables).

Note 1: All of the variables are numeric.
Note 2: if you pay attention to variable fifth, you will notice that every value has a decimal digit, and still fails.

Here's the code to reproduce the issue:

(first_value <- 1-0.95)
#> [1] 0.05
(second_value <- 1-0.9)
#> [1] 0.1
(third_value <- (1-0.9)*10)
#> [1] 1
(fourth_value <- 1.0-0.9)
#> [1] 0.1
(fifth_value <- (1.0-0.9)*10)
#> [1] 1
(sixth_value <- 4-0.9)
#> [1] 3.1
(seventh_value <- (4-0.9)*100)
#> [1] 310
(df <- data.frame(
    "first_value" = first_value,
    "second_value" = second_value,
    "third_value" = third_value,
    "fourth_value" = fourth_value,
    "fifth_value" = fifth_value,
    "sixth_value" = sixth_value,
    "seventh_value" = seventh_value
))
#>   first_value second_value third_value fourth_value fifth_value sixth_value
#> 1        0.05          0.1           1          0.1           1         3.1
#>   seventh_value
#> 1           310
readr::write_csv(df, "test_write_csv.csv")

Below is the sessionInfo() output from a laptop running macOS, although I have also tried it on a similar install on Debian:

> sessionInfo()
R version 4.2.3 (2023-03-15)
Platform: aarch64-apple-darwin20.0.0 (64-bit)
Running under: macOS Monterey 12.6.6

Matrix products: default
BLAS/LAPACK: /Users/santiagorevale/miniconda3/envs/rcore/lib/libopenblas.0.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] readr_2.1.4

loaded via a namespace (and not attached):
 [1] compiler_4.2.3  magrittr_2.0.3  R6_2.5.1        cli_3.6.1
 [5] hms_1.1.3       tools_4.2.3     pillar_1.9.0    glue_1.6.2
 [9] tibble_3.2.1    utf8_1.2.3      fansi_1.0.4     vctrs_0.6.3
[13] tzdb_0.4.0      lifecycle_1.0.3 pkgconfig_2.0.3 rlang_1.1.1

I would really appreciate any feedback on this matter.

Best,
Santiago


joranE · 2023-09-14T03:18:03Z

This is merely an artifact of floating point arithmetic as implemented on all computers and is not specific to R. Not all values can be represented exactly in floating point arithmetic.

Run options(digits = 17) and then df[[1]] and you'll see that the actual value is the long version being written by readr, and so that is actually the correct value.

sralchemab · 2023-09-14T09:26:07Z

Hi @joranE.

If you do options(digits = 17) even the sixth variable has a value (3.1000000000000001) although it's not being written as such. How is the function deciding where to place the digits cutoff? I couldn't find a way to play around with it to avoid writing the values like this. How would you do it?

Other analogous functions I work with, like utils::write.csv or data.table::fwrite, and even your own readr::write_csv2, produce the output I would expect:

readr::write_csv(df, "test_write_csv.csv")
# first_value,second_value,third_value,fourth_value,fifth_value,sixth_value,seventh_value
# 0.050000000000000044,0.09999999999999998,0.9999999999999998,0.09999999999999998,0.9999999999999998,3.1,310

readr::write_csv2(df, "test_write_csv2.csv")
# first_value;second_value;third_value;fourth_value;fifth_value;sixth_value;seventh_value
# 0,05;0,1;1;0,1;1;3,1;310

utils::write.csv(df, "data.utils.csv", row.names = FALSE)
# "first_value","second_value","third_value","fourth_value","fifth_value","sixth_value","seventh_value"
# 0.05,0.1,1,0.1,1,3.1,310

data.table::fwrite(df, "data.fwrite.csv")
# first_value,second_value,third_value,fourth_value,fifth_value,sixth_value,seventh_value
# 0.05,0.1,1,0.1,1,3.1,310
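
The difference between these writers is consistent with how many significant digits each one keeps. The sketch below uses base R's sprintf() to mimic two policies: 15 significant digits, which is what the utils::write.table documentation describes for numeric output, and 17 significant digits, which is enough to tell any two distinct doubles apart. That readr::write_csv picks the shortest string that still identifies the stored double is only an assumption here, inferred from the output above; nothing in this thread confirms the algorithm.

```r
# Values taken from the reproducible example above
x <- c(first = 1 - 0.95, second = 1 - 0.9, sixth = 4 - 0.9)

# 15 significant digits: the rounding error from the subtraction is hidden
digits15 <- sprintf("%.15g", x)

# 17 significant digits: enough to distinguish any two different doubles
digits17 <- sprintf("%.17g", x)

print(data.frame(value = names(x), digits15, digits17))

# At 15 digits the first value looks like a literal 0.05 ...
digits15[1] == sprintf("%.15g", 0.05)  # TRUE
# ... yet the two are different doubles
x[[1]] == 0.05                         # FALSE
```

Under this reading, 4 - 0.9 is written as 3.1 because it is exactly the same double as the literal 3.1, while 1 - 0.95 is not the same double as 0.05 and so needs more digits to be identified.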

On a different note, I noticed another odd behaviour. I created a column in which each row holds the number 0.23472354234923784023 truncated to an increasing number of digits (up to 20 digits after the decimal separator).

df2 <- data.frame("numbers" = c(0.2, 0.23, 0.234, 0.2347, 0.23472, 0.234723, 0.2347235, 0.23472354, 0.234723542, 0.2347235423, 0.23472354234, 0.234723542349, 0.2347235423492, 0.23472354234923, 0.234723542349237, 0.2347235423492378, 0.23472354234923784, 0.234723542349237840, 0.2347235423492378402, 0.23472354234923784023))

If you take a look at how rows 15 to 20 are being written, you'll see the following, showing an odd behaviour between 17 and the subsequent values:

readr::write_csv(df2, "data2.readr.csv")
# 15 0.234723542349237
# 16 0.2347235423492378
# 17 0.23472354234923784
# 18 0.2347235423492378
# 19 0.2347235423492378
# 20 0.2347235423492378

data.table::fwrite(df2, "data2.fwrite.csv")
# ...
# 15 0.234723542349237
# 16 0.234723542349238
# 17 0.234723542349238
# 18 0.234723542349238
# 19 0.234723542349238
# 20 0.234723542349238
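
One way to investigate rows 15 to 20 is to look at the doubles that R actually stored, rather than at any writer's output. A double holds roughly 15 to 17 significant decimal digits, so literals that differ only beyond that point may or may not end up as the same binary value. The sketch below is only a diagnostic, using three of the literals from the column above; the results can depend on the platform and R version, so no expected output is shown.

```r
# Three of the literals from rows 16 to 18 above
x <- c(0.2347235423492378, 0.23472354234923784, 0.234723542349237840)

report <- data.frame(
  digits17 = sprintf("%.17g", x),  # value printed with 17 significant digits
  hex = sprintf("%a", x),          # exact binary value in hexadecimal notation
  same_as_first = x == x[1],       # TRUE where the stored double equals the first one
  stringsAsFactors = FALSE
)
print(report)
```

If two rows show the same hex string, they hold the same double, and no writer can tell them apart. If they differ, the difference was introduced when the literal was parsed, before any CSV function was called.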

Finally, I read the documentation about readr::write_csv and it says that it's analogous to write.csv with some improvements on performance. But if we get a different outcome, is it actually analogous?

Sorry for the lengthy reply. And thanks for the feedback.

joranE · 2023-09-15T01:15:22Z

First, I think you have mistaken me for an author of this package, which I am not, nor am I even a contributor. I'm sure one of the authors will weigh in eventually.

In general, due to the nuances involved in floating point arithmetic, if you want complete control over the decimal precision of data written out to a file, you will need to use something like format() to enforce your digits requirement first and then write the results to file.
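
A minimal sketch of that approach, using formatC() from base R rather than format() so that each value is rendered on its own without padding. The helper name format_numeric_cols and the choice of 15 significant digits are assumptions made for illustration, not part of readr. The result is written with utils::write.csv here so the example runs without extra packages; passing the same formatted data frame to readr::write_csv is expected to work the same way, since the columns are already character strings.

```r
df <- data.frame(
  first_value = 1 - 0.95,
  second_value = 1 - 0.9,
  sixth_value = 4 - 0.9
)

# Hypothetical helper: convert every double column to text with at most
# `digits` significant digits, leaving other columns untouched
format_numeric_cols <- function(data, digits = 15) {
  is_dbl <- vapply(data, is.double, logical(1))
  data[is_dbl] <- lapply(
    data[is_dbl],
    function(col) formatC(col, digits = digits, format = "g")
  )
  data
}

out <- format_numeric_cols(df)

path <- tempfile(fileext = ".csv")
utils::write.csv(out, path, row.names = FALSE, quote = FALSE)
written <- readLines(path)
print(written)
# [1] "first_value,second_value,sixth_value" "0.05,0.1,3.1"
```

The trade-off is that the file no longer holds the exact stored values: reading 0.05 back gives the double for 0.05, not the result of 1 - 0.95.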

Finally, readr::write_csv is a function for writing data to CSV files, just like utils::write.csv. They have similar arguments, and in the vast majority of cases they perform essentially identically. If that doesn't qualify it for the term "analogous", I would suggest perhaps you're being a tad picky.

cjyetman · 2024-02-07T11:00:02Z

Also not a maintainer or contributor here, but this is a rather succinct explanation of why the behavior you're seeing has little to do with {readr}...

print(1 - 0.95, digits = 16)
#> [1] 0.05000000000000004
print(0.05, digits = 16)
#> [1] 0.05
1 - 0.95 == 0.05
#> [1] FALSE
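
Because exact equality fails for values like these, comparisons of computed doubles are usually made with a tolerance. A short sketch using base R's all.equal(), which compares within a small relative tolerance by default:

```r
a <- 1 - 0.95
b <- 0.05

exact_match <- a == b                   # FALSE: the two doubles differ
close_match <- isTRUE(all.equal(a, b))  # TRUE: equal within the default tolerance

print(c(exact = exact_match, close = close_match))
```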
