consolidated PR with changes from 13-17 #18

bill-torpey · 2020-05-04T20:11:11Z

Hi Frank:

As per your request, here's a consolidated PR with the changes that we're currently building with in development.

Let me know if there's anything else I can do to make this less painful.

Thanks!


fquinner

Added some comments after a quick skim to get this rolling, but I'll need to take a more detailed look. In general it's fine; I'm just not sure about the performance of some of the indexing / preparser stuff.

CMakeLists.txt

src/CMakeLists.txt

src/Field.cpp

fquinner · 2020-05-12T19:48:27Z

src/Payload.h

@@ -356,6 +367,9 @@ class OmnmPayloadImpl {

    // Meta data for the payload
    omnmHeader    mHeader;
+
+    // Offset information for all fields
+    std::vector<fieldHint>   mFieldHints;


I'm skeptical about potential performance issues here and what it's used for. Would need to test.

Please do -- in our testing the difference was orders of magnitude.

fquinner · 2020-05-12T19:52:31Z

src/Payload.cpp

@@ -907,6 +928,39 @@ omnmmsgPayload_unSerialize (const msgPayload    msg,
    // Move tail to end of buffer
    impl->mPayloadBufferTail = bufferLength;

+    impl->mFieldHints.clear();


This whole preparsing section looks expensive and should at least be optional. It iterates over the entire buffer before the app has even seen it and populates an STL object with an index. This will penalize well-optimized OpenMAMA applications and reward poorly optimized ones.

Looks like appearances could be deceiving - this branch certainly does look faster on the first batch of tests - nice.
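For readers following the thread, a minimal sketch of what eager pre-indexing at unserialize time amounts to. It assumes a hypothetical flat wire format (2-byte fid, 4-byte length, then the value, native byte order) and hypothetical names (`FieldHint`, `buildHints`, `findByFid`, `appendField`); the real omnm layout and the PR's `fieldHint` struct are not shown here, so this only illustrates the cost under discussion: every field header is parsed once, before the application sees the message.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for omnm's fieldHint: a field's fid and where it starts.
struct FieldHint {
    uint16_t fid;
    size_t   offset;  // offset of the field header within the buffer
};

// Hypothetical wire format (NOT omnm's real layout): [fid:2][len:4][value:len].
static const size_t kHeaderSize = sizeof(uint16_t) + sizeof(uint32_t);

// Eager pre-indexing: walk the whole buffer once at unserialize time,
// before the application has looked at a single field.
std::vector<FieldHint> buildHints(const unsigned char* buf, size_t len)
{
    std::vector<FieldHint> hints;
    size_t pos = 0;
    while (pos + kHeaderSize <= len) {
        uint16_t fid;
        uint32_t fieldLen;
        std::memcpy(&fid, buf + pos, sizeof fid);
        std::memcpy(&fieldLen, buf + pos + sizeof fid, sizeof fieldLen);
        if (fieldLen > len - pos - kHeaderSize) break;  // truncated field: stop
        hints.push_back(FieldHint{fid, pos});
        pos += kHeaderSize + fieldLen;
    }
    return hints;
}

// Lookup via the hints: still a linear search, but over a compact vector
// instead of re-parsing the buffer.
const FieldHint* findByFid(const std::vector<FieldHint>& hints, uint16_t fid)
{
    for (const FieldHint& h : hints)
        if (h.fid == fid) return &h;
    return nullptr;
}

// Test helper: writes one field in the hypothetical format.
void appendField(std::vector<unsigned char>& out, uint16_t fid, const std::string& value)
{
    uint32_t len = static_cast<uint32_t>(value.size());
    unsigned char hdr[sizeof fid + sizeof len];
    std::memcpy(hdr, &fid, sizeof fid);
    std::memcpy(hdr + sizeof fid, &len, sizeof len);
    out.insert(out.end(), hdr, hdr + sizeof hdr);
    out.insert(out.end(), value.begin(), value.end());
}
```

The up-front loop in `buildHints` is the part that runs whether or not the application ever does a direct lookup.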

fquinner · 2020-05-12T19:57:05Z

src/Payload.cpp

@@ -942,7 +996,10 @@ omnmmsgPayload_createFromByteBuffer (msgPayload*         msg,
 {
    if (NULL == msg || NULL == buffer) return MAMA_STATUS_NULL_ARG;
    if (0 == bufferLength) return MAMA_STATUS_INVALID_ARG;
-    omnmmsgPayload_create (msg);
+
+    OmnmPayloadImpl* impl = new OmnmPayloadImpl(bufferLength);


I assume there's a reason for this, as opposed to just delegating to omnmmsgPayload_create (which does the same thing)?

yeah - not sure of the value of the change

Ahh... in both cases we know the size of the buffer, so we can allocate the proper size right off.
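A small sketch of why knowing the buffer length up front helps, using a hypothetical `PayloadBuffer` (not the real `OmnmPayloadImpl`) that counts how often it has to grow. The default starting capacity of 64 bytes is an assumption standing in for whatever `omnmmsgPayload_create` allocates; it is not taken from the PR.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical payload buffer (not the real OmnmPayloadImpl) that counts
// how many times appending data forced it to grow.
class PayloadBuffer {
public:
    explicit PayloadBuffer(size_t initialCapacity) { mData.reserve(initialCapacity); }

    void append(const unsigned char* bytes, size_t n)
    {
        // If the new size exceeds capacity, insert must reallocate and copy.
        if (mData.size() + n > mData.capacity()) ++mGrowCount;
        mData.insert(mData.end(), bytes, bytes + n);
    }

    size_t growCount() const { return mGrowCount; }
    size_t size() const { return mData.size(); }

private:
    std::vector<unsigned char> mData;
    size_t mGrowCount = 0;
};

// Assumed starting capacity for a payload created without a size hint.
static const size_t kDefaultCapacity = 64;

// Builds a payload from an incoming byte buffer. With presize == true the known
// length sizes the buffer once, mirroring the idea behind the createFromByteBuffer change.
PayloadBuffer fromBytes(const unsigned char* bytes, size_t len, bool presize)
{
    PayloadBuffer buf(presize ? len : kDefaultCapacity);
    buf.append(bytes, len);
    return buf;
}
```

With a 1000-byte input, the default-sized buffer has to reallocate (and copy) once, while the pre-sized one never does.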

fquinner · 2020-06-08T21:36:03Z

I finally put together a little benchmark app to kick the tyres on this.

It's basically as I expected: serialization / deserialization is slower, but direct access updates (e.g. update by field parameters, get by field parameters rather than iteration) are a little faster.

Overall though it looks to be slower.

If I run with your newer code, compiled with

cmake .. -DCMAKE_BUILD_TYPE=RelWithDebInfo -DMAMA_ROOT=/opt/openmama && make -j

I get this:

Benchmark for iterations with updates and writes: 22.664608s
Benchmark for iterations with (mostly) just reads: 13.410863s
Benchmark for iterations with direct field lookup (no iteration): 16.292332s
Benchmark for Serialization / Deserialization tests: 33.144798s
Total for all tests: 85.512749s

Compare that with the old omnm version (i.e. with none of the optimizations, including the strlen-related changes, which I expect would make it faster):

Benchmark for iterations with updates and writes: 29.021072s
Benchmark for iterations with (mostly) just reads: 14.280121s
Benchmark for iterations with direct field lookup (no iteration): 32.298950s
Benchmark for Serialization / Deserialization tests: 1.433974s
Total for all tests: 77.034348s

So the old version actually completes the benchmark faster overall. If anything, the serialization / deserialization results are understated, since during the tests messages were exercised 300M times but serialized only 100M times.
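Putting the two runs above side by side (figures rounded to two decimals; the per-operation line assumes the 100M serialization count stated above):

```latex
\begin{aligned}
\text{saved by the new code} &\approx (29.02 - 22.66) + (14.28 - 13.41) + (32.30 - 16.29) \approx 23.2\,\text{s} \\
\text{extra ser/deser cost} &\approx 33.14 - 1.43 \approx 31.7\,\text{s} \\
\text{net slowdown} &\approx 31.7 - 23.2 \approx 8.5\,\text{s} \quad (\text{totals: } 85.51 - 77.03 = 8.48\,\text{s}) \\
\text{per ser/deser op} &\approx \tfrac{33.14\,\text{s}}{10^{8}} \approx 331\,\text{ns} \quad\text{vs.}\quad \tfrac{1.43\,\text{s}}{10^{8}} \approx 14.3\,\text{ns}
\end{aligned}
```

In other words, the gains on the lookup-heavy tests are real but are outweighed by roughly a 23x increase in serialization / deserialization cost in this particular benchmark.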

I appreciate the test is contrived (though honestly it wasn't designed to favour one over the other), but it supports the suspicion that these changes produce performance gains when using direct access, but performance degradation when iterating (once serialization and deserialization are included in the measurement).

I have contributed my benchmarking code for you to have a look, but as it stands, the above re-emphasizes to me that this pre-indexing should be a configurable option, since it appears slower overall unless you're frequently using the direct field accessors rather than iterating, or you're a publishing application that updates frequently.

There are various situations where your code will be faster and there are various situations where the previous code would be faster.

For example given a message with 100 fields inside, if you discard 99% of messages based on a field which is early in the message, the old way will always be much faster.

If you have a message with 100 fields inside, and you need to iterate multiple times, your way will be much faster.
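A rough cost model for the two cases above, counting field headers visited and ignoring constant factors (an illustrative simplification, not a measurement). Here n is the number of fields, m the number of direct lookups the application makes, k_j the position of the j-th field looked up, and c the cost of one indexed lookup:

```latex
\begin{aligned}
C_{\text{on-demand}} &= \sum_{j=1}^{m} k_j, \qquad C_{\text{pre-index}} = n + m\,c \\
n=100,\ m=1,\ k_1=1: \quad C_{\text{on-demand}} &= 1 \quad\text{vs.}\quad C_{\text{pre-index}} = 100 + c \\
n=100,\ m=100,\ \bar{k}\approx 50: \quad C_{\text{on-demand}} &\approx 5000 \quad\text{vs.}\quad C_{\text{pre-index}} \approx 100 + 100\,c
\end{aligned}
```

The first row is the early-discard case, where scanning on demand wins by roughly a factor of n; the second is the repeated-access case, where the index wins.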

So I would recommend an option to toggle this behaviour such as:

mama.payload.omnmmsg.indexed = true

This could be read in omnmmsgPayload_init.
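One way the suggested toggle could be wired up, as a minimal sketch. The property name is the one proposed above; `parseBoolProperty`, `initIndexingOption`, `indexingEnabled` and the default of `false` (behave like the old, unindexed code unless asked) are all hypothetical. The property lookup is passed in as a function pointer so the sketch compiles on its own; in the bridge it would presumably be OpenMAMA's property API (for example `mama_getProperty`).

```cpp
#include <cctype>
#include <string>

// Property name as proposed in the review comment.
static const char* const kIndexedProperty = "mama.payload.omnmmsg.indexed";

// Interprets a property value as a boolean; unrecognised or missing values keep the default.
bool parseBoolProperty(const char* value, bool defaultValue)
{
    if (value == nullptr) return defaultValue;
    std::string v(value);
    for (char& ch : v) ch = static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
    if (v == "true" || v == "yes" || v == "1") return true;
    if (v == "false" || v == "no" || v == "0") return false;
    return defaultValue;
}

// Stand-in for the bridge's property lookup (hypothetical signature).
using PropertyLookup = const char* (*)(const char* name);

// Process-wide flag, read once at init time (hypothetical).
static bool gIndexingEnabled = false;

// Roughly what omnmmsgPayload_init could do with the option.
void initIndexingOption(PropertyLookup lookup)
{
    gIndexingEnabled = parseBoolProperty(lookup(kIndexedProperty), false);
}

// Checked in unserialize to decide whether to build field hints.
bool indexingEnabled() { return gIndexingEnabled; }
```

Reading the flag once at init keeps the per-message cost to a single branch in unserialize.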

bill-torpey · 2020-06-09T14:13:41Z

We'll take a look at this as soon as we're able. One thought would be to build the index lazily -- that would obviously be ideal, but I'm not sure how much effort would be involved.

marek-teterycz · 2020-06-11T16:10:22Z

Frank,

In our case we use named fields and not just fids. I also expanded on the test a little and added 7 additional fields, and the results are very different. It seems that adding more fields doesn't really change the run time with our changes to omnm, but the original version slows down significantly.

Our version of omnm:

Benchmark for iterations with updates and writes: 19.960340s
Benchmark for iterations with (mostly) just reads: 12.207262s
Benchmark for iterations with direct field lookup (no iteration): 14.350084s
Benchmark for Serialization / Deserialization tests: 25.520180s
Total for all tests: 72.038017s

Original version of omnm:

Benchmark for iterations with updates and writes: 78.090248s
Benchmark for iterations with (mostly) just reads: 38.457447s
Benchmark for iterations with direct field lookup (no iteration): 151.102585s
Benchmark for Serialization / Deserialization tests: 1.552331s
Total for all tests: 269.203278s

Here is the expanded test:

Benchmarker.zip

fquinner · 2020-06-11T16:54:31Z

That just further reinforces to me, though, that it should be configurable, given that the performance varies according to usage. It introduces latency between the middleware bridge and the onmsg that the user cannot control.

bill-torpey · 2020-08-13T14:11:40Z

Hi Frank:

I understand your issues with the field hints (sort of), but making the changes you suggest is a bit of work on our end.

Would it make sense to re-open the individual PRs to get them out of the way in the meantime?

fquinner · 2020-08-17T20:06:53Z

To be honest I suspect making it configurable could be less work overall, but whatever you feel most comfortable with works for me!

bill-torpey · 2020-08-17T20:13:31Z

Personally, I'd rather build the hints lazily -- that way you get the benefit "automagically" if you're querying the message multiple times, but if you don't you don't pay the price.

Of course, that means keeping a "trail of bread crumbs" as the code scans through the message, which might never be used again, but that seems an insignificant cost.

OTOH, you may feel differently, so pls let me know. If you're not OK with the lazy approach, then an option is still needed, and in that case the lazy approach is prob. overkill.
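A minimal sketch of the lazy "bread crumbs" idea. It assumes a hypothetical flat wire format ([fid:2][len:4][value]) and hypothetical names (`LazyIndex`, `Crumb`, `appendField`); lookups by fid are shown, and named-field lookups would record the name alongside the offset in the same way. Nothing is parsed until the application asks for a field, and each header is parsed at most once.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical wire format (NOT omnm's real layout): [fid:2][len:4][value:len].
static const size_t kHdr = sizeof(uint16_t) + sizeof(uint32_t);

// One "bread crumb": where a field lives, recorded the first time it is scanned.
struct Crumb {
    uint16_t fid;
    size_t   offset;  // offset of the field header
    uint32_t length;  // length of the value
};

// Field index built lazily as lookups scan through the message.
class LazyIndex {
public:
    LazyIndex(const unsigned char* buf, size_t len) : mBuf(buf), mLen(len) {}

    // Returns the crumb for fid, scanning only as far as needed.
    // The pointer may be invalidated by a later call to find().
    const Crumb* find(uint16_t fid)
    {
        for (const Crumb& c : mCrumbs)       // fields already seen: no parsing
            if (c.fid == fid) return &c;
        while (const Crumb* c = scanNext())  // resume where the last scan stopped
            if (c->fid == fid) return c;
        return nullptr;                      // not in the message
    }

    size_t fieldsScanned() const { return mCrumbs.size(); }

private:
    // Parses the next field header, records it and advances; nullptr at the end.
    const Crumb* scanNext()
    {
        if (mPos + kHdr > mLen) return nullptr;
        uint16_t fid;
        uint32_t len;
        std::memcpy(&fid, mBuf + mPos, sizeof fid);
        std::memcpy(&len, mBuf + mPos + sizeof fid, sizeof len);
        if (len > mLen - mPos - kHdr) return nullptr;  // truncated field
        mCrumbs.push_back(Crumb{fid, mPos, len});
        mPos += kHdr + len;
        return &mCrumbs.back();
    }

    const unsigned char* mBuf;
    size_t mLen;
    size_t mPos = 0;             // how far the scan has got so far
    std::vector<Crumb> mCrumbs;  // the trail of bread crumbs
};

// Test helper: writes one field in the hypothetical format.
void appendField(std::vector<unsigned char>& out, uint16_t fid, const std::string& value)
{
    uint32_t len = static_cast<uint32_t>(value.size());
    unsigned char hdr[sizeof fid + sizeof len];
    std::memcpy(hdr, &fid, sizeof fid);
    std::memcpy(hdr + sizeof fid, &len, sizeof len);
    out.insert(out.end(), hdr, hdr + sizeof hdr);
    out.insert(out.end(), value.begin(), value.end());
}
```

On a 100-field message, looking up the first field parses one header; a later lookup of a field already passed hits the crumbs without touching the buffer. An application that discards the message after reading an early field never pays for a full index, and nothing is parsed before onmsg.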

fquinner · 2020-08-17T22:28:06Z

Lazy loading makes sense to me if that's something you think would be easier to implement. It's important for people measuring transport latencies: otherwise the payload would get indexed before the onmsg, which could mislead anyone doing latency measurements into attributing the indexing latency to the transport / middleware rather than to the payload.

bill-torpey · 2020-08-18T13:38:10Z

Lazy loading makes sense to me if that's something you think would be easier to implement.

It wouldn't be easier, but it would be better, and would eliminate the need for a flag. The idea being that as much as possible things should "just work".

I'll see about re-opening the other PRs so we can get those out of the way, and then we can tackle this one.


ubsan builds should be done w/o optimizations (to ensure that ub is not optimized away) - see e.g., https://queue.acm.org/detail.cfm?id=3468263

ID=5095f6bb99f24c0caac9e09557e0234f
Error=unsigned integer overflow: 388 + 18446744073709551614 cannot be represented in type 'unsigned long'
    OmnmPayloadImpl::updateField(mamaFieldType_, omnmFieldImpl&, unsigned char*, unsigned long) /home/btorpey/work/OpenMAMA-omnm/master/src/src/Payload.cpp:604:28
    OmnmPayloadImpl::updateField(mamaFieldType_, char const*, unsigned short, unsigned char*, unsigned long) /home/btorpey/work/OpenMAMA-omnm/master/src/src/Payload.cpp:544:12
    omnmmsgPayload_updateString /home/btorpey/work/OpenMAMA-omnm/master/src/src/Payload.cpp:1764:38
    mamaMsg_updateString /home/btorpey/work/OpenMAMA/master/src/mama/c_cpp/src/c/msg.c:2295:12
    Transact::MamaMessageImpl::setValue(char const*, char const*) /home/btorpey/work/transact/master/src/common/Middleware/MamaAdapter/MamaMessageImpl.cpp:1273:25
    TransactX::RecField::updateMsg(Transact::IMessage*) /home/btorpey/work/transact/master/src/common/xla/mx_pubtypes.cpp:854:49
    TransactX::SnapshotServiceHandler::DoSnapshot(TransactX::SnapshotReq*) /home/btorpey/work/transact/master/src/common/snapshot/SnapshotServiceHandler.cpp:565:28
    TransactX::SnapshotServiceHandler::run() /home/btorpey/work/transact/master/src/common/snapshot/SnapshotServiceHandler.cpp:215:21
    MXThread_launcher /home/btorpey/work/transact/master/src/api/src/MXThread.cpp:109:9
    start_thread /build/glibc-eX1tMB/glibc-2.31/nptl/pthread_create.c:477:8
    clone /build/glibc-eX1tMB/glibc-2.31/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95
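In the report, 18446744073709551614 is 2^64 - 2, i.e. a size delta of -2 held in an `unsigned long`. Unsigned arithmetic wraps modulo 2^64 in C and C++, so the sum still evaluates to 386 and this is not undefined behaviour; clang's unsigned-integer-overflow check flags it because such wraparound is usually unintentional. The code at Payload.cpp:604 isn't shown in this PR, so the following is only a sketch, with a hypothetical `adjustOffset`, of one way to apply a size change without forming a "negative" `size_t`.

```cpp
#include <cstddef>

// Moves an offset by the change in a field's size without ever computing
// (newSize - oldSize) in size_t when the field shrinks, which would wrap
// (e.g. to 2^64 - 2 for a 2-byte shrink on LP64).
// Assumes offset lies past the resized field, so it cannot drop below zero.
size_t adjustOffset(size_t offset, size_t oldSize, size_t newSize)
{
    if (newSize >= oldSize)
        return offset + (newSize - oldSize);  // field grew or stayed the same
    return offset - (oldSize - newSize);      // field shrank
}
```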


wtorpey and others added 21 commits March 20, 2020 09:45

dont print fid if 0

1fd04cb

dont print fid if 0

d719737

- fix bug in omnmmsgFieldPayload_getOpaque where return value was not populated - expanded unit tests, as per Marek

14eb503

- more robust serialize/deserialize test

76eca61

- more robust serialize/deserialize test

28a0066

fix bug in OmnmPayloadImpl::updateField when calculating delta in field size

b2352e0

performance improvements

deda74c

- fix comments

5be12fa

print size for opaques, rather than contents

66530c4

minor formatting changes

2e5928b

allow setting fields to zero length

6a21ded

fix static analysis warnings reported by clang-tidy, pvs

37e6b4a

clean up cmake project file

4ee098a

clean up cmake project file

165dc55

definition required to expose UINT32_MAX on some compilers

31eedd1

clean up gtest build settings

c10a690

addl build warnings, sanitizers

2c98712

only build unit tests if GTEST_ROOT is defined

876da2f

- reinstate win32 stuff from upstream

90ddec5

improve alignment/remove gaps

fbe4bc9

clean up sanitizer support

01ed45f

- always do a release build w/sanitizer - ensure '-fno-omit-frame-pointer' is set

This was referenced May 12, 2020

build script changes #17 (Closed)

misc minor fixes #16 (Closed)

performance improvements #15 (Closed)

improve handling of opaque (blob) fields #14 (Closed)

dont print fid if zero #13 (Closed)

fquinner requested changes May 12, 2020


bill-torpey added 2 commits May 13, 2020 08:55

as per cascadium#18

40158f3

remove commented-out code

a766f7b

first cut at Mac support

403f537

bill-torpey added 11 commits August 19, 2020 13:24

add bench for payload lib

9b2471e

ubsan builds should be done w/o optimizations (to ensure that ub is not optimized away) - see e.g., https://queue.acm.org/detail.cfm?id=3468263

592d744

Merge branch 'staging' of https://github.com/nyfix/OpenMAMA-omnm into staging

afebacb

fix incorrect declaration (void**), remove incorrect & now superfluous casts

aa8fde4

- support benchmark, unit tests

d181c83

Merge branch 'ubsan' into staging

bb9ce39

[BUS-1914] - fix compiler warning

c718894

[BUS-1914] - fix warnings re: alignment

3b321d9

[BUS-2389] - fix bug where payload buffer and hints get out of sync

7f91036

allow for zero-length blob's (but not null)

a092be3


bill-torpey commented May 4, 2020

fquinner left a comment

fquinner May 12, 2020

bill-torpey May 12, 2020

fquinner May 12, 2020

fquinner May 12, 2020

fquinner May 12, 2020

bill-torpey May 12, 2020

fquinner May 12, 2020

bill-torpey May 13, 2020

fquinner commented Jun 8, 2020

bill-torpey commented Jun 9, 2020

marek-teterycz commented Jun 11, 2020

fquinner commented Jun 11, 2020

bill-torpey commented Aug 13, 2020

fquinner commented Aug 17, 2020

bill-torpey commented Aug 17, 2020

fquinner commented Aug 17, 2020

bill-torpey commented Aug 18, 2020
