
Conversation

@alexmgns (Collaborator) commented Sep 3, 2025

Hi @rafapereirabr,

In the current state I have only implemented travel_time_matrix() to use Arrow. To achieve this I had to temporarily disable expanded_travel_time_matrix(), which is why the tests won't pass.

We also now have to include Arrow as part of the bundled jar. A side effect is that its size grows from 0.15 MB to 6 MB. Just something to note, not an issue.

@rafapereirabr (Member) commented:

Hi Alex. I've been able to run a few performance tests. See my reprex and a couple of comments / questions below.

Reprex

options(java.parameters = "-Xmx20G")

devtools::load_all(".")
library(bench)

path <- system.file("extdata/poa", package = "r5r")
r5r_network <- setup_r5(data_path = path, verbose = FALSE)

# load origin/destination points and set arguments
points <- read.csv(system.file("extdata/poa/poa_hexgrid.csv", package = "r5r"))
points <- rbind(points, points, points, points)
mode <- c("WALK", "TRANSIT")
max_walk_time <- 30   # minutes
max_trip_duration <- 60 # minutes
departure_datetime <- as.POSIXct("13-05-2019 14:00:00",
                                 format = "%d-%m-%Y %H:%M:%S")

bench::system_time(iterations = 1,
  ttm <- travel_time_matrix(r5r_network = r5r_network,
                            origins = points,
                            destinations = points,
                            mode = mode,
                            departure_datetime = departure_datetime,
                            max_walk_time = max_walk_time,
                            max_trip_duration = max_trip_duration,
                            progress = TRUE)
  )

Results

TLDR: the Arrow implementation is faster, but currently the difference is not huge.

P.S. I think bench::system_time() only captures memory use on the R side, so the results below may not reflect the memory used by Java.

#  expression    min median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result
#  java_to_dt    29s  30.6s    0.0332     229MB   0.0774     3     7      1.51m <NULL>
#   arrow df   24.3s  25.2s    0.0389     570MB   0.0519     3     4      1.28m <NULL>
# arrow arrow  25.3s  26.7s    0.0363     569MB   0.0485     3     4      1.38m <NULL>

Comments / Questions

1. Returning a data.frame or an Arrow table

One strange behavior I've found is this. As it stands in your PR, the travel time matrix function internally returns a data.frame, so the code on line 243 is:

travel_times <- arrow::read_ipc_stream(travel_times, as_data_frame = T) 

As a test, I changed it to as_data_frame = FALSE. With this change we do not materialize the output and simply return an Arrow table, so I was expecting a quicker computation time, but the function actually became a bit slower. This is a bit strange, isn't it?

2. R5 and Arrow competing for CPU?

I don't understand the details of the Java code, but it seems to me that you are streaming the R5 results to Arrow in parallel via ArrowR5Process and BatchWithSeq. If that's the case, my concern is that R5 already runs in parallel, so R5 and Arrow would in practice be competing for CPU, which could degrade performance. Is my understanding correct? I'm sorry if I have misunderstood something.

@alexmgns (Collaborator, Author) commented Sep 7, 2025

Hi Rafa,

To be completely honest, I am not 100% sure how the multithreaded code is handled in the background. However, I believe the two processes do not happen in parallel: the collector only accumulates the batches during runtime, and the actual joining happens after all the batches are received. This is because collector.join() is only called after the for loop:

            for (ForkJoinTask<?> t : tasks) {
                t.get();
            }

My understanding is that this loop waits until all the R5 threads finish their processing, so the collecting only starts once R5 is done.
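The blocking behavior of that loop can be illustrated with a minimal, self-contained sketch. This is not the PR's actual code (ArrowR5Process and BatchWithSeq are not shown here); the queue of int arrays simply stands in for the stream of Arrow record batches, and the class/variable names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;

public class Main {
    public static void main(String[] args) throws Exception {
        // Stand-in for the stream of Arrow record batches produced by R5 workers.
        ConcurrentLinkedQueue<int[]> batches = new ConcurrentLinkedQueue<>();
        ForkJoinPool pool = ForkJoinPool.commonPool();

        // Phase 1: R5-style workers run in parallel, each emitting one batch.
        List<ForkJoinTask<?>> tasks = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            final int id = i;
            tasks.add(pool.submit(() -> batches.add(new int[]{id, id * 10})));
        }

        // This loop blocks until every worker has finished...
        for (ForkJoinTask<?> t : tasks) {
            t.get();
        }

        // ...so the single-threaded merge below only starts after all
        // batches exist. This is the sequential phase described above.
        int total = 0;
        for (int[] b : batches) total += b[1];
        System.out.println(total);
    }
}
```

Because t.get() blocks on each task in turn, the merge loop cannot begin until the slowest worker is done, which is why the collection phase adds strictly sequential time at the end.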

That said, I think your concern about R5 and Arrow clashing actually points to a way to speed the function up further. I don't think running the two processes simultaneously would degrade performance; the JVM will distribute resources as needed, so there shouldn't be clashing. But the collecting can only happen on a single thread, so at the end of the function, once all the batches are processed, we have to wait for a single thread to merge all the tables into one while the other threads sit idle.
Perhaps if the batches were merged in parallel while R5 was still calculating, the wall-clock time of the computation would drop, because at the very end we wouldn't have to wait as long for that single thread to finish while the rest sit idle.
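A common way to overlap the merge with the computation is a producer/consumer setup: workers push batches onto a blocking queue while a dedicated merger thread drains it concurrently, with a sentinel value signalling the end of the stream. This is only a sketch of that idea under invented names (it is not code from the PR, and real Arrow table concatenation would replace the toy list merge):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class Main {
    private static final int[] POISON = new int[0]; // sentinel: no more batches

    public static void main(String[] args) throws Exception {
        BlockingQueue<int[]> queue = new ArrayBlockingQueue<>(16);
        List<Integer> merged = new ArrayList<>();

        // Dedicated merger thread: consumes and merges batches while
        // producers are still running, instead of waiting for all of them.
        Thread merger = new Thread(() -> {
            try {
                while (true) {
                    int[] b = queue.take();
                    if (b == POISON) break;        // end-of-stream signal
                    for (int v : b) merged.add(v); // toy stand-in for a table merge
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        merger.start();

        // Producers stand in for R5 worker threads emitting batches.
        ExecutorService workers = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 4; i++) {
            final int id = i;
            workers.submit(() -> { queue.put(new int[]{id, id}); return null; });
        }
        workers.shutdown();
        workers.awaitTermination(10, TimeUnit.SECONDS);

        queue.put(POISON); // all producers done: tell the merger to stop
        merger.join();

        System.out.println(merged.size()); // 4 batches x 2 values each
    }
}
```

With this shape, most of the merge work happens during the computation, so only the final few batches remain to be merged after the last worker finishes.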
