Progress bar/indicator for "by" operations #6228
Generated via commit 78807e9 Download link for the artifact containing the test results: ↓ atime-results.zip Time taken to finish the standard R installation steps: 12 minutes and 23 seconds Time taken to run |
Please add an argument showProgress=getOption("datatable.showProgress", interactive()). Also, please add docs similar to ?fread.
New argument added, allows us to do:
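For illustration, a call using the new argument might look like the sketch below. It assumes the argument is exposed on `[.data.table` under the name `showProgress`, as requested in review; `slow_mean` and the column names are made up for the example.

```r
library(data.table)

# Hypothetical per-group function. Wrapping mean() in a user-defined
# function keeps j from being GForce-optimized, so the call goes through
# the regular per-group path that this PR instruments.
slow_mean = function(x) mean(x)

dt = data.table(g = rep(1:1000, each = 10), x = as.numeric(1:10000))

# Explicitly request progress printing for this grouped call.
# Assumption: the argument is named showProgress, as requested in review.
res = dt[, .(m = slow_mean(x)), by = g, showProgress = TRUE]

# The default could instead be set session-wide through the option
# named in the review comment.
options(datatable.showProgress = TRUE)
```

On input this small the operation would likely finish before any progress line is printed, since printing is delayed for short operations.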
R CMD check should pass now.
Edit: Apparently this also happens with current master... not sure if this is known:
library(devtools)
load_all()
dt = data.table(a = 1:1000000)
dt[, 1, by = a]
@tdhock @Anirban166 have you seen this before? Should I file an issue? After a quick search on some forums, it seems Valgrind often doesn't play too well with OpenMP, so this is most likely a false positive from Valgrind:
Co-authored-by: Michael Chirico <[email protected]>
I did some benchmarking https://tdhock.github.io/blog/2024/atime-showProgress/ and it looks like the proposed code with the new showProgress arg has the same performance as previous master, at least for small data sizes / compute times (<0.1 sec).
Great! Really cool article! I included some benchmarks for some large (>2s) operations in the PR description above, and it seems that on average the new changes aren't that much slower than current master (at least not by orders of magnitude). I think performance is nearly identical for small (<1s) operations because a built-in feature of the progress indicator is that it doesn't print unless the operation takes longer than 1 second, similar to fwrite (3s). When I log on tomorrow I'll do some more comprehensive benchmarking with some of the steps you've outlined in your article!
I wonder if we can easily make that time delay (1 or 3 seconds) the same and user controlled? If not in this PR, then please create another issue.
For longer by operations to benchmark before and after showProgress, see https://github.com/Rdatatable/data.table/wiki/Benchmarks-:-Grouping
An option to select how long the function waits before progress printing should be easy to add; it is simply a check. (This means yes, I can change this to match
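As a rough sketch of what such a check could look like, written in R for readability: the helper below is hypothetical and is not data.table's internal code. It assumes a user-controllable ETA threshold (`min_eta`, defaulting to the 3 seconds discussed here) and an update interval of one second (`interval`); both names are invented for this example.

```r
# Hypothetical helper, not data.table's internal code: returns a closure
# that decides whether a progress line should be printed right now.
make_progress_gate = function(n_groups, min_eta = 3, interval = 1) {
  has_printed = FALSE
  next_time = interval  # earliest elapsed time (seconds) for the next check
  function(done, elapsed) {
    # Throttle: consider printing at most once per interval.
    if (elapsed < next_time) return(FALSE)
    next_time <<- elapsed + interval
    # Estimate remaining time from the average cost per finished group.
    eta = elapsed / done * (n_groups - done)
    # Short jobs stay silent; once printing has started, keep updating.
    if (!has_printed && eta < min_eta) return(FALSE)
    has_printed <<- TRUE
    TRUE
  }
}

gate = make_progress_gate(n_groups = 100, min_eta = 3, interval = 1)
if (gate(done = 10, elapsed = 2)) {
  message("Processed 10 groups out of 100")
}
```

Keeping the threshold and interval as parameters is one way the delay could be made consistent across callers and exposed to users; whether it would be surfaced as an argument or an option is left open here.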
Also @tdhock and anyone that's interested in this feature, there currently isn't support for any Would we want progress printing for On second note: I took a quick glance at the implementations for the
I also don't see a particular need to implement this for GForce, besides consistency. With how GForce actually operates, it's roughly equivalent to just doing an un-grouped operation, for which we don't offer any progress bar. So in that sense, offering a progress bar in this case would then be inconsistent with the lack of bar for any So, I'm happy to at the very least leave this as out of scope for this issue & wait to see if such a request is made later.
Ok, then I believe this PR should be ready to go. Are there any preferences between a 1 second and a 3 second wait for progress printing? Otherwise, I can add a NEWS entry (does this count as a new feature or a note?) and write better documentation. Although I would prefer that we merge #6296 first, as there are some overlapping variables between the two now that we no longer want to rely on
I would prefer a consistent 3 second wait.
I agree on the need for consistency, and a preference to share code between the implementations. One thing I immediately see is the current implementation uses
Rather than deferring code sharing to a follow up issue, I would rather get it right here. |
Ok 👍
There's definitely a world where all of the code can use Line 957 in 0030b15
Ok great, thanks for the analysis about fread/fwrite. So I guess we create a follow-up issue about sharing progress() between all three. Please add a NEWS item.
Great, thanks.
Closes #3060
Similar to fwrite progress printing, triggers when ETA >= 3s, prints the following, updating every second until complete.
progress() function to show groups done, groups remaining, time elapsed
Add to gforce operations?
wallclock() are quite expensive) to measure overhead
Benchmarking
With Progress Printing
Current Master
Performance analysis using atime: