Strange rounding errors with signif and write_csv #1502

Open
MarekGierlinski opened this issue Jul 26, 2023 · 6 comments
Labels: bug (an unexpected problem or unintended behavior), write ✏️

Comments

MarekGierlinski commented Jul 26, 2023

I noticed that certain numbers cause write_csv, write_tsv, etc. to produce strange rounding errors. Here is an example:

library(readr)

df <- data.frame(x = 1.406148e-20)
df$rounded = signif(df$x, 4)

write_csv(df, "test1.csv")
write.csv(df, "test2.csv", row.names = FALSE)

The content of test1.csv file is

x,rounded
1.406148e-20,1.4060000000000002e-20

The problem seems to be only with readr. The test2.csv file is created as expected:

"x","rounded"
1.406148e-20,1.406e-20

I wonder if this is reproducible on other systems.

> sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.4.1

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] readr_2.1.4

hadley commented Aug 2, 2023

Thanks! Here's a somewhat more minimal reprex:

library(readr)

df <- data.frame(x = 1.406148e-20)
df$rounded <- signif(df$x, 4)
cat(format_csv(df))
#> x,rounded
#> 1.4061479999999998e-20,1.4060000000000002e-20

Created on 2023-08-02 with reprex v2.0.2

FWIW these numbers aren't invented, they're just normally not printed:

sprintf("%.20e", df$rounded)
#> [1] "1.40600000000000021268e-20"
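
To illustrate the point, here is a small sketch that prints the same stored double with two different precisions. The choice of 15 and 17 significant digits is an assumption made for illustration only; it is not a statement about what `write.csv` or `write_csv` do internally.

```r
x <- signif(1.406148e-20, 4)

# Same stored double, printed with different numbers of significant digits.
short <- sprintf("%.15g", x)  # fewer digits: the representation error is rounded away
long  <- sprintf("%.17g", x)  # more digits: the representation error becomes visible

print(short)
print(long)
```

Neither string is "wrong"; they are two renderings of one binary value, and the longer one simply shows digits that are usually hidden.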

MarekGierlinski commented Aug 2, 2023

Oh, I see. We are hitting the limits of binary representation of a number? Nothing to do with readr or signif?

sprintf("%.50e", 1.2)
#> [1] "1.19999999999999995559107901499373838305473327636719e+00"

Still, would be nice to have numbers truncated nicely, just as in the default write.csv.

hadley commented Aug 2, 2023

Oh yeah, it's definitely a bug, it will just require some thought to fix because we need to apply some (probably well known) heuristic to avoid accidentally removing decimal places that are important.
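
One well-known heuristic of this kind is "shortest round-trip" formatting: print the fewest significant digits that still parse back to exactly the same double. The sketch below only illustrates that idea in plain R. `shortest_repr` is a hypothetical helper, not part of readr, and nothing here is a claim about how readr does or will fix this.

```r
# Hypothetical helper: return the shortest scientific-notation string
# (1 to 17 significant digits) that parses back to exactly `x`.
shortest_repr <- function(x) {
  stopifnot(is.numeric(x), length(x) == 1, is.finite(x))
  for (digits in 1:17) {
    # `digits` significant digits means `digits - 1` digits after the point.
    s <- sprintf(paste0("%.", digits - 1L, "e"), x)
    if (as.numeric(s) == x) {
      return(s)
    }
  }
  # Fall back to 17 significant digits if nothing shorter round-trips.
  sprintf("%.16e", x)
}

shortest_repr(0.5)
shortest_repr(1.2)
shortest_repr(signif(1.406148e-20, 4))
```

A trial loop like this is slow and depends on how accurately the string is parsed back, so a production writer would more likely rely on a dedicated shortest-representation algorithm; the loop is only meant to make the heuristic concrete.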
