
Filters seem to slow down as graph size increases #65

Open
alexgcrombie opened this issue Oct 4, 2022 · 9 comments

@alexgcrombie

While porting the petrol pump example from the FRP book, I noticed that running just the lifecycle portion (the Rust equivalent of listing 4.5) took 1-2 s to process a single event in debug mode, reduced to 200 ms in a release build. The code is written to match the Java code as closely in form as I could manage (all of the inputs are modeled as sinks, which then generate the "inputs" interface that the book presents).

In another test application of my own I noticed a 5x slowdown when using filter_option (though from my testing the timing is very similar for a plain filter as well), this time going from ~200 µs to ~1 ms in debug mode, and from ~27 µs to ~200 µs in release.

One thing to note is that this is being built on Linux using the latest release of Rust.
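
For reference, the supporting types in my port look roughly like this (a sketch, not the exact definitions; Animation and its OFF/THROB/SOLID constants are my assumed names here):

use sodium_rust::{Cell, Stream};

#[derive(Clone, PartialEq)]
pub enum ChargingStatus {
    Disconnected,
    Charging,
    Charged,
}

#[derive(Clone)]
pub enum LedName {
    Green1,
}

#[derive(Clone, Copy)]
pub enum Animation {
    Off,
    Throb,
    Solid,
}

pub const OFF: Animation = Animation::Off;
pub const THROB: Animation = Animation::Throb;
pub const SOLID: Animation = Animation::Solid;

#[derive(Clone)]
pub enum UiUpdates {
    LedAnimation { led: LedName, animation: Animation },
}

pub struct ChargerCtx {
    pub state: Cell<ChargingStatus>,
    pub updates: Stream<UiUpdates>,
}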

The code is as follows for the normal case:

pub fn charger_setup( s: &SodiumCtx, charger_events: &Stream<ChargingStatus> ) -> ChargerCtx {
    let state = charger_events.map( | e: &ChargingStatus | -> ChargingStatus { e.clone() } ).hold( ChargingStatus::Disconnected );

    let updates = charger_events.map( | e: &ChargingStatus | -> UiUpdates {
        match e {
            ChargingStatus::Disconnected => UiUpdates::LedAnimation{ led: LedName::Green1, animation: OFF },
            ChargingStatus::Charging => UiUpdates::LedAnimation{ led: LedName::Green1, animation: THROB },
            ChargingStatus::Charged => UiUpdates::LedAnimation{ led: LedName::Green1, animation: SOLID },
        }
    } );

    ChargerCtx {
        state,
        updates,
    }
}

and the filter-enabled code:

pub fn charger_setup_filter_option( s: &SodiumCtx, charger_events: &Stream<ChargingStatus> ) -> ChargerCtx {
    let state = charger_events.map( | e: &ChargingStatus | -> ChargingStatus { e.clone() } ).hold( ChargingStatus::Disconnected );

    let updates1 = Stream::filter_option( &charger_events.snapshot( &state, | evt: &ChargingStatus, state_: &ChargingStatus | -> Option<UiUpdates> {
            if evt != state_ {
                match evt {
                    ChargingStatus::Disconnected => Some( UiUpdates::LedAnimation{ led: LedName::Green1, animation: OFF } ),
                    ChargingStatus::Charging => Some( UiUpdates::LedAnimation{ led: LedName::Green1, animation: THROB } ),
                    ChargingStatus::Charged => Some( UiUpdates::LedAnimation{ led: LedName::Green1, animation: SOLID } ),
                }
            } else {
                None
            }
        }
    ) );

    ChargerCtx {
        state,
        updates: updates1,
    }
}

The above code is the only portion using filters, and it results in the following timings. For the non-filtered case:

  setup: 552us
  send( charging ): 146us
  send( charged ): 132us
  send( disconnected ): 129us

and for the filter using case:

  setup: 1699us
  send( charging ): 910us
  send( charged ): 915us
  send( disconnected ): 963us

The above are the debug timings, but the ratio between the two cases remains similar even in a release build, where the timings are ~23 µs (with no filters) and ~160 µs (with filters).
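
For completeness, these numbers come from a harness along these lines (a sketch; new_stream_sink, stream(), listen and send are the standard sodium-rust entry points, but the measurement code is paraphrased here):

use std::time::Instant;
use sodium_rust::SodiumCtx;

fn main() {
    let ctx = SodiumCtx::new();
    let sink = ctx.new_stream_sink::<ChargingStatus>();

    let t = Instant::now();
    let charger = charger_setup(&ctx, &sink.stream());
    // Keep a listener alive so updates actually propagate.
    let _listener = charger.updates.listen(|_u: &UiUpdates| {});
    println!("setup: {}us", t.elapsed().as_micros());

    for status in [
        ChargingStatus::Charging,
        ChargingStatus::Charged,
        ChargingStatus::Disconnected,
    ] {
        let t = Instant::now();
        sink.send(status);
        println!("send: {}us", t.elapsed().as_micros());
    }
}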

@alexgcrombie
Author

The timings appear to be similar in a Windows build of the lifecycle portion of the petrol pump example.

The main issue is not necessarily the increase in time taken when using filter, but the fact that, to keep a UI responsive, processing a single event should ideally take <1 ms and certainly <25 ms. The code posted above is a pretty trivial example and is already approaching the 1 ms mark (admittedly in debug).

@clinuxrulz
Collaborator

I haven't had time to maintain sodium lately. I did read the code you emailed me, and the performance issue does seem to be on the sodium-rust side.
I suggest forking sodium-rust and profiling it internally. I suspect it may be calling collect_cycles() (the internal garbage collector for sodium objects) too often. We could comment out collect_cycles, or add an early return inside it, to see if that makes a difference to performance.
Another thing I can think of: all the reference counting is atomic at the moment (Arc). Replacing all the Arc<> with Rc<> could help too.
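
For the collect_cycles experiment, something as blunt as this would do (a sketch; the exact signature and location in the gc module may differ):

pub fn collect_cycles(&self) {
    // Temporary experiment: skip cycle collection entirely to see how much
    // of the slowdown it accounts for. Leaks cyclic garbage while enabled.
    if true {
        return;
    }
    // ... original body ...
}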

@alexgcrombie
Author

Thank you for taking the time to review this issue; I will give those suggestions a try. I may need some support once I get stuck (if you are willing), as I am pretty new to Rust.

@alexgcrombie
Author

alexgcrombie commented Oct 10, 2022

collect_cycles is behaving sensibly: in a simple test that still shows this issue, it is only called 3 times.

It turns out this issue is unrelated to the use of filter and is instead driven by the number of nodes (in what appears to be a chain). I created a test that strings together a number of maps in the following way:

#[derive(Clone,Copy)]
enum SomeVal {
    Val1,
    Val2,
    Val3,
}

fn generate_nodes( s: &Stream<SomeVal>, count: u32 ) -> Stream<SomeVal> {
    let new_s = s.map( | v: &SomeVal | -> SomeVal { v.clone() } );
    match count {
        0 => new_s,
        _ => generate_nodes( &new_s, count - 1 ),
    }
}
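
Each row in the table below came from a driver along these lines (a sketch; the actual test code is paraphrased):

use std::time::Instant;
use sodium_rust::SodiumCtx;

fn main() {
    let ctx = SodiumCtx::new();
    for count in 0..=22u32 {
        let sink = ctx.new_stream_sink::<SomeVal>();
        let out = generate_nodes(&sink.stream(), count);
        // Keep a listener alive so the chain actually fires.
        let _listener = out.listen(|_v: &SomeVal| {});
        let t = Instant::now();
        sink.send(SomeVal::Val1);
        println!("{}: {}us", count, t.elapsed().as_micros());
    }
}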

The test results for this show a clearly exponential trend:

Count   Node count   Time to send 1 event (us)
  0         2                 44
  1         3                 61
  2         4                 93
  3         5                145
  4         6                299
  5         7                655
  6         8               1049
  7         9               2300
  8        10               2402
  9        11               8057
 10        12              12567
 11        13              18865
 12        14              37039
 13        15              64412
 14        16             195225
 15        17             383411
 16        18             504266
 17        19             982780
 18        20            2041202
 19        21            3885266
 20        22            7953937
 21        23           15776052
 22        24           32014120

The majority of this time is spent in mark_roots(), and more specifically in resetting the reference-count adjustments of the adjacent nodes:

fn mark_roots(&self) {
    ...
    for root in &old_roots {
        self.reset_ref_count_adj(root);
    }
    ...
}

If each root in old_roots in the above loop is given an index, the time taken for that iteration grows roughly as 2^<Index>. Digging further, the problem seems to be that reset_ref_count_adj() never exits early: visited is cleared again on the way back out of the recursion, so shared subgraphs are re-traced in full on every path that reaches them (the trace below shows each upstream node being visited twice per parent). The tracing added is as follows:

fn mark_roots(&self) {
    ...
    let mut root_num = 0;
    for root in &old_roots {
        let tt1 = Instant::now();
        let depth = self.reset_ref_count_adj(root);
        let tt2 = Instant::now();
        println!( "mark_root: reset_ref: {},{},{}", root_num, depth, (tt2-tt1).as_micros() );
        root_num = root_num + 1;
    }
    ...
}

and

fn reset_ref_count_adj(&self, s: &GcNode) -> i32 {
    println!( "ref_reset: {},{},{}", s.id, s.name, s.data.visited.get() );
    if s.data.visited.get() {
        return 0;
    }
    s.data.visited.set(true);
    s.data.ref_count_adj.set(0);
    let mut depth = 0;
    s.trace(|t| {
        depth = self.reset_ref_count_adj(t) + 1;
    });
    s.data.visited.set(false);
    return depth;
}

When running this with generate_nodes( &sink, 4 ), it produces the following pattern:

ref_reset: 0,Stream::new,false
mark_root: reset_ref: 0,0,2
ref_reset: 1,Stream::map,false
ref_reset: 0,Stream::new,false
ref_reset: 0,Stream::new,false
mark_root: reset_ref: 1,1,3
ref_reset: 2,Stream::map,false
ref_reset: 1,Stream::map,false
ref_reset: 0,Stream::new,false
ref_reset: 0,Stream::new,false
ref_reset: 1,Stream::map,false
ref_reset: 0,Stream::new,false
ref_reset: 0,Stream::new,false
mark_root: reset_ref: 2,2,7
ref_reset: 3,Stream::map,false
ref_reset: 2,Stream::map,false
ref_reset: 1,Stream::map,false
ref_reset: 0,Stream::new,false
ref_reset: 0,Stream::new,false
ref_reset: 1,Stream::map,false
ref_reset: 0,Stream::new,false
ref_reset: 0,Stream::new,false
ref_reset: 2,Stream::map,false
ref_reset: 1,Stream::map,false
ref_reset: 0,Stream::new,false
ref_reset: 0,Stream::new,false
ref_reset: 1,Stream::map,false
ref_reset: 0,Stream::new,false
ref_reset: 0,Stream::new,false
mark_root: reset_ref: 3,3,16
...
ref_reset: 5,Stream::map,false
ref_reset: 4,Stream::map,false
ref_reset: 3,Stream::map,false
ref_reset: 2,Stream::map,false
ref_reset: 1,Stream::map,false
ref_reset: 0,Stream::new,false
ref_reset: 0,Stream::new,false
ref_reset: 1,Stream::map,false
ref_reset: 0,Stream::new,false
ref_reset: 0,Stream::new,false
ref_reset: 2,Stream::map,false
ref_reset: 1,Stream::map,false
ref_reset: 0,Stream::new,false
ref_reset: 0,Stream::new,false
ref_reset: 1,Stream::map,false
ref_reset: 0,Stream::new,false
ref_reset: 0,Stream::new,false
ref_reset: 3,Stream::map,false
ref_reset: 2,Stream::map,false
ref_reset: 1,Stream::map,false
ref_reset: 0,Stream::new,false
ref_reset: 0,Stream::new,false
ref_reset: 1,Stream::map,false
ref_reset: 0,Stream::new,false
ref_reset: 0,Stream::new,false
ref_reset: 2,Stream::map,false
ref_reset: 1,Stream::map,false
ref_reset: 0,Stream::new,false
ref_reset: 0,Stream::new,false
ref_reset: 1,Stream::map,false
ref_reset: 0,Stream::new,false
ref_reset: 0,Stream::new,false
ref_reset: 4,Stream::map,false
ref_reset: 3,Stream::map,false
ref_reset: 2,Stream::map,false
ref_reset: 1,Stream::map,false
ref_reset: 0,Stream::new,false
ref_reset: 0,Stream::new,false
ref_reset: 1,Stream::map,false
ref_reset: 0,Stream::new,false
ref_reset: 0,Stream::new,false
ref_reset: 2,Stream::map,false
ref_reset: 1,Stream::map,false
ref_reset: 0,Stream::new,false
ref_reset: 0,Stream::new,false
ref_reset: 1,Stream::map,false
ref_reset: 0,Stream::new,false
ref_reset: 0,Stream::new,false
ref_reset: 3,Stream::map,false
ref_reset: 2,Stream::map,false
ref_reset: 1,Stream::map,false
ref_reset: 0,Stream::new,false
ref_reset: 0,Stream::new,false
ref_reset: 1,Stream::map,false
ref_reset: 0,Stream::new,false
ref_reset: 0,Stream::new,false
ref_reset: 2,Stream::map,false
ref_reset: 1,Stream::map,false
ref_reset: 0,Stream::new,false
ref_reset: 0,Stream::new,false
ref_reset: 1,Stream::map,false
ref_reset: 0,Stream::new,false
ref_reset: 0,Stream::new,false
mark_root: reset_ref: 5,5,68

From the above it feels like the visited.set(true)/visited.set(false) pair in reset_ref_count_adj() is wrong, but I can't quite convince myself that it is indeed incorrect, or, if it is, what the correct logic should be.

@clinuxrulz
Collaborator

Thanks for digging into that. If I can find time, I will have a closer look. So it must be a garbage collector bug.

@alexgcrombie
Copy link
Author

It does indeed appear to be the case. It would be much appreciated if you could give a quick description of how the GC algorithm is supposed to work. I am planning on looking into it a bit further in parallel, so it would be good to have some guidance on how it is intended to work.

@clinuxrulz
Collaborator

clinuxrulz commented Oct 13, 2022

https://pages.cs.wisc.edu/~cymen/misc/interests/Bacon01Concurrent.pdf

The non-concurrent algorithm in that PDF.

If I remember correctly, ref_count_adj is a temporary negative count (represented as a positive integer) used when detecting cycles. I.e., if ref_count - ref_count_adj == 0, then a cycle is found; otherwise ref_count_adj is reset to zero at the end when no cycle is found.
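
To make that concrete, here is a self-contained toy of the trial-deletion idea (a schematic I'm sketching here, not sodium's actual code; the paper's algorithm also has a scan phase that re-blackens externally referenced subgraphs, which this omits):

// Toy trial deletion: a node is cyclic garbage iff, after trial-deleting
// every edge reachable from a suspected root, all of its references are
// accounted for internally (ref_count == ref_count_adj).
struct Node {
    ref_count: usize,     // real reference count
    ref_count_adj: usize, // trial-deleted references (temporary)
    children: Vec<usize>, // traced edges, by index
}

fn find_cycle_candidates(nodes: &mut [Node], root: usize) -> Vec<usize> {
    // Phase 1: trial-delete every edge reachable from the root.
    let mut visited = vec![false; nodes.len()];
    let mut stack = vec![root];
    while let Some(i) = stack.pop() {
        if visited[i] {
            continue;
        }
        visited[i] = true;
        for c in nodes[i].children.clone() {
            nodes[c].ref_count_adj += 1;
            stack.push(c);
        }
    }
    // Phase 2: nodes whose every reference was trial-deleted are kept
    // alive only by the cycle. (The full algorithm re-checks these by
    // scanning from any node with ref_count > ref_count_adj.)
    let candidates: Vec<usize> = (0..nodes.len())
        .filter(|&i| visited[i] && nodes[i].ref_count == nodes[i].ref_count_adj)
        .collect();
    // Phase 3: undo the adjustments, as reset_ref_count_adj is meant to.
    for n in nodes.iter_mut() {
        n.ref_count_adj = 0;
    }
    candidates
}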

@clinuxrulz
Collaborator

clinuxrulz commented Nov 25, 2022

@alexgcrombie I believe you are right about that visited.set(true) and visited.set(false) pair causing problems in reset_ref_count_adj(). The false needs to be set later, after the loop through old_roots. Also, old_roots (and possibly roots) could probably use a HashSet instead of a Vec to avoid duplicate traces/visits.

@clinuxrulz
Collaborator

clinuxrulz commented Nov 27, 2022

@alexgcrombie I split reset_ref_count_adj into two so that visited is set to false on a second pass of the roots. You can try it out to see if it helps with that exponential growth. It seemed faster when running the regular tests.
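
Roughly this shape (a sketch of the change, not the exact commit; clear_visited is my name for the second pass):

fn mark_roots(&self) {
    ...
    for root in &old_roots {
        // visited stays true here, so shared subgraphs are traced once.
        self.reset_ref_count_adj(root);
    }
    for root in &old_roots {
        // Second pass: clear the flags only after every root is done.
        self.clear_visited(root);
    }
    ...
}

fn reset_ref_count_adj(&self, s: &GcNode) {
    if s.data.visited.get() {
        return;
    }
    s.data.visited.set(true);
    s.data.ref_count_adj.set(0);
    s.trace(|t| {
        self.reset_ref_count_adj(t);
    });
}

fn clear_visited(&self, s: &GcNode) {
    if !s.data.visited.get() {
        return;
    }
    s.data.visited.set(false);
    s.trace(|t| {
        self.clear_visited(t);
    });
}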

Constructing your graph within a transaction helps a lot too; collect_cycles only gets executed outside transactions.
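
For example (a sketch; SodiumCtx::transaction is the standard entry point):

// Building the whole graph inside one transaction means collect_cycles()
// only runs once, when the outermost transaction closes.
let ctx = SodiumCtx::new();
let sink = ctx.new_stream_sink::<SomeVal>();
let out = ctx.transaction(|| generate_nodes(&sink.stream(), 22));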
