This repository was archived by the owner on Mar 31, 2023. It is now read-only.

[Perf] Alcor Control Agent Performance Profiling #441

Open
xieus opened this issue Oct 21, 2020 · 2 comments
Assignees
Labels
P0 Priority 0 perf testing Performance Testing

Comments

@xieus
Contributor

xieus commented Oct 21, 2020

Request

  • Set up a performance profiling framework for ACA
  • Collect latency and throughput metrics for large payloads
  • Optimize ACA multithreading
  • Look into narrowing the locking scope to improve performance under high concurrency
  • Investigate OVS DB batch insertion to improve performance
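To illustrate the locking-scope item, here is a minimal sketch in Python (ACA itself is not written in Python, so this only shows the shape of the change). `NeighborTable`, `add_wide`, `add_narrow`, and `fill_concurrently` are hypothetical names, not ACA code. The idea is to do per-call work such as parsing before taking the lock, and hold the lock only for the shared-state update.

```python
import threading


class NeighborTable:
    """Hypothetical in-memory table guarded by a single lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}

    def add_wide(self, ip, mac):
        # Wide scope: normalization runs while the lock is held,
        # so other threads wait on work that touches no shared state.
        with self._lock:
            key = ip.strip()
            value = mac.strip().lower()
            self._entries[key] = value

    def add_narrow(self, ip, mac):
        # Narrow scope: normalize first, then hold the lock only
        # for the dictionary update.
        key = ip.strip()
        value = mac.strip().lower()
        with self._lock:
            self._entries[key] = value

    def snapshot(self):
        # Copy under the lock so callers get a consistent view.
        with self._lock:
            return dict(self._entries)


def fill_concurrently(table, count, workers=8):
    """Insert `count` made-up neighbors from `workers` threads."""

    def worker(start):
        for i in range(start, count, workers):
            ip = f" 10.0.{i // 256}.{i % 256} "
            mac = f"AA:BB:CC:00:{i // 256:02X}:{i % 256:02X}"
            table.add_narrow(ip, mac)

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return table.snapshot()
```

Both methods produce the same table contents; the difference is only how long each caller holds the lock. Whether this helps ACA in practice depends on how much work currently sits inside its critical sections, which is what the profiling should show.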
@xieus xieus added P1 Priority 1 perf testing Performance Testing labels Oct 21, 2020
@xieus xieus added this to the Version 1.0.2020.11.30 milestone Oct 21, 2020
@xieus
Contributor Author

xieus commented Oct 21, 2020

Linked to an umbrella issue #440.

@er1cthe0ne er1cthe0ne added P0 Priority 0 and removed P1 Priority 1 labels Nov 21, 2020
@er1cthe0ne
Contributor

er1cthe0ne commented Nov 25, 2020

Per the issue description, I will break the ACA performance profiling task down into two major areas.

ACA handling of large payloads

  1. Framework to use: aca_tests, to create large payloads and send them to ACA
  2. Example payload: 1 port create plus 10, 100, ... 1,000, 10,000, 100,000 neighbors
  3. Collect latency and throughput metrics
  4. Identify bottlenecks and problematic areas (possibly OVS)
  5. Optimize the ACA multithreading model. Do we want to cap the number of parallel threads at number of CPUs * 2?
  6. Can we bundle a batch (e.g. 10) of similar neighbors and process it in a single call? That may reduce contention on the locks guarding ACA internal structures.
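A rough sketch of how the steps above could fit together, written in Python for brevity rather than as aca_tests code. Everything here is an assumption for illustration: `make_payload`, `chunked`, `process_batch`, and `run_once` are hypothetical names, and `process_batch` is a placeholder for the real per-batch work. It builds a payload of one port plus N neighbors, splits the neighbors into batches of 10, runs them on a pool capped at number of CPUs * 2, and reports latency and throughput.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor


def make_payload(neighbor_count):
    """Build a hypothetical payload: one port create plus N neighbors."""
    neighbors = [f"neighbor-{i}" for i in range(neighbor_count)]
    return {"port": "port-0", "neighbors": neighbors}


def chunked(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def process_batch(batch):
    """Placeholder for the real per-batch work; returns items handled."""
    return len(batch)


def run_once(neighbor_count, batch_size=10, max_workers=None):
    """Process one payload and return latency and throughput figures."""
    if max_workers is None:
        # Cap suggested in the list above: number of CPUs * 2.
        max_workers = (os.cpu_count() or 1) * 2
    payload = make_payload(neighbor_count)
    batches = chunked(payload["neighbors"], batch_size)

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        processed = sum(pool.map(process_batch, batches))
    elapsed = time.perf_counter() - start

    return {
        "neighbors": neighbor_count,
        "batches": len(batches),
        "processed": processed,
        "latency_s": elapsed,
        "throughput_per_s": processed / elapsed if elapsed > 0 else float("inf"),
    }


if __name__ == "__main__":
    for n in (10, 100, 1000, 10_000, 100_000):
        result = run_once(n)
        print(result["neighbors"], result["batches"], f"{result['latency_s']:.6f}s")
```

With a placeholder `process_batch` the numbers only measure pool overhead; the batch size and thread cap become meaningful once the placeholder is replaced with the real ACA call path.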

ACA handling of packet-in messages from OVS

  1. Framework to use: cbench (https://github.com/mininet/oflops/tree/master/cbench), with ACA acting as the OpenFlow controller
  2. Use the payload generated by cbench; test latency mode first, then throughput mode
  3. Collect latency and throughput metrics
  4. Identify bottlenecks and problematic areas
  5. Once on-demand L3 routing rules are implemented, a VM could quickly open many new connections to a new neighbor, which would generate a large number of packet-in messages for ACA to process. We need to confirm that ACA can handle this.
  6. If an ACA slowdown is observed, consider spinning up more threads to handle multiple packet-in messages in parallel
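For the last item, one possible shape is a fixed set of worker threads draining a shared queue of packet-in messages. The sketch below is in Python and is not ACA or cbench code: `handle_packet_in` and `serve` are hypothetical names, and the message and reply dictionaries are made up. It also records per-message latency so the effect of the worker count can be compared.

```python
import queue
import threading
import time


def handle_packet_in(message):
    """Placeholder for real packet-in handling; returns a fake reply."""
    return {"xid": message["xid"], "action": "flow_mod"}


def serve(messages, worker_count):
    """Process messages on `worker_count` threads.

    Returns (replies keyed by xid, list of per-message latencies in seconds).
    """
    inbox = queue.Queue()
    replies = {}
    latencies = []
    results_lock = threading.Lock()

    def worker():
        while True:
            item = inbox.get()
            if item is None:  # sentinel: no more work for this worker
                break
            enqueued_at, message = item
            reply = handle_packet_in(message)
            latency = time.perf_counter() - enqueued_at
            # Hold the lock only while recording results.
            with results_lock:
                replies[message["xid"]] = reply
                latencies.append(latency)

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for message in messages:
        inbox.put((time.perf_counter(), message))
    for _ in threads:
        inbox.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return replies, latencies
```

A usage example would be `replies, latencies = serve([{"xid": i} for i in range(500)], worker_count=4)`, then comparing `max(latencies)` across different worker counts. The latency here includes time spent waiting in the queue, which is the part that extra threads are meant to reduce.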

Other Notes

  1. Can we use a framework like SeaStar to improve the ACA threading model? https://github.com/futurewei-cloud/chogori-seastar-rd
