Id assignment perf #186

rsjbailey · 2024-01-04T13:51:19Z

ID assignment was pretty slow for large documents

On my machine, adding 200 simple objects to a document took ~70ms, primarily due to the repeated document->lookup() calls. This adds a slightly more complex search for a free id, which reduces the cost of adding 200 objects to about 1.5ms.

A 64 object document (more realistic) went from 2ms -> 200µs, which makes it more reasonable in an SADM context.

This should improve copying of medium/large docs (unless they're primarily common defs, which skipped the lookup anyway)

Probably slower for small documents, but should still be fast as then operating on small vectors.

Could improve further by keeping element vectors sorted on id, but that's a more intrusive change that might affect behaviour as elements would always get written out sorted by ID. (flat_map from boost or C++23 would work, or writing a less generic equivalent isn't too tricky)

codecov-commenter · 2024-01-04T13:56:56Z

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (3f94a9b) 89.91% compared to head (7c1946c) 90.41%.

Files	Patch %	Lines
src/detail/id_assigner.cpp	97.29%	1 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #186      +/-   ##
==========================================
+ Coverage   89.91%   90.41%   +0.49%     
==========================================
  Files         125      124       -1     
  Lines        5812     5864      +52     
==========================================
+ Hits         5226     5302      +76     
+ Misses        586      562      -24

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

tomjnixon · 2024-01-04T16:46:47Z

Looks good to me.

I had a go at using unordered_set instead, which would make this simpler and O(n), but it was a bit slower in practice. The default hash maps in C++ are notoriously slow.

You can make it slightly faster (1.25ms to 1.15ms) with counters.reserve(elements.size());. this might temporarily waste some space, but allocation should normally be constant time for these sizes.

rsjbailey · 2024-01-11T15:14:43Z

The deepCopy() improvements reduce the 200 objects + common definitions copy from ~2ms (post ID changes) to about 500µs, which is probably good enough for most of our use cases.

The downside is there are now two paths for adding things to a document (via add() or via deepCopy()), both of which need to be maintained if we add any more elements/parameters. I think it's reasonable as this should only happen rarely (when the standard changes)

The variant unpacking is a bit messy. I thought of a couple of alternatives:

Write a visitor: Still messy as you'd either need it to be a friend of all the element attourneys and Document or do something nasty by passing internals around by reference.

Write a different copyElements function that returns a struct with element vectors rather than variants, or takes references to the document element vectors: Adds more boilerplate than just having a big conditional (as you still need the original copyElement interface for doing deepCopyTo), and benchmarked performance was identical.

Skips all the checks for ID clashes by simply copying the elements assigning the new document as parent, and pushing into the relevant containers This should be fine as the checks have already been done by the original document, and we're starting from nothing.

Speeds up the deepCopy() and common definitions copy benchmarks by about 30% here

Uses the kitchen sink xml that features all ADM elements to generate a document Then deep copies that document, and checks that xml written from both the document and its copy match (they may differ from the original)

rsjbailey force-pushed the id_assingment_perf branch 2 times, most recently from 08b43dd to 7c1946c Compare January 11, 2024 15:10

rsjbailey marked this pull request as ready for review January 11, 2024 15:23

rsjbailey added 7 commits April 8, 2024 16:50

Benchmark performance of adding many elements to a document

f0f0a86

Improve id assignment performance when adding to large documents

7b68aa0

Reserve counters vector to avoid uneccesary allocation

a84e8fb

Benchmark deepCopy() performance for large document

f1bad9b

Use unordered_map for reference resolution during copy

760dab2

Speeds up the deepCopy() and common definitions copy benchmarks by about 30% here

Add deepCopy() test for a non-trival document

d752080

Uses the kitchen sink xml that features all ADM elements to generate a document Then deep copies that document, and checks that xml written from both the document and its copy match (they may differ from the original)

rsjbailey force-pushed the id_assingment_perf branch from 7c1946c to d752080 Compare April 8, 2024 15:55

rsjbailey merged commit 258059d into master Apr 16, 2024
8 checks passed

rsjbailey deleted the id_assingment_perf branch April 16, 2024 09:42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Id assignment perf #186

Id assignment perf #186

rsjbailey commented Jan 4, 2024 •

edited

Loading

codecov-commenter commented Jan 4, 2024 •

edited

Loading

tomjnixon commented Jan 4, 2024

rsjbailey commented Jan 11, 2024 •

edited

Loading

Id assignment perf #186

Id assignment perf #186

Conversation

rsjbailey commented Jan 4, 2024 • edited Loading

codecov-commenter commented Jan 4, 2024 • edited Loading

Codecov Report

tomjnixon commented Jan 4, 2024

rsjbailey commented Jan 11, 2024 • edited Loading

rsjbailey commented Jan 4, 2024 •

edited

Loading

codecov-commenter commented Jan 4, 2024 •

edited

Loading

rsjbailey commented Jan 11, 2024 •

edited

Loading