Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

consolidated PR with changes from 13-17 #18

Open
wants to merge 35 commits into
base: master
Choose a base branch
from

Conversation

bill-torpey
Copy link
Contributor

Hi Frank:

As per your request, here's a consolidated PR with the changes that we're currently building with in development.

Let me know if there's anything else I can do to make this less painful.

Thanks!

Copy link
Collaborator

@fquinner fquinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some comments after a quick skim to get this rolling but will need to have a more detailed look. In general it's fine I'm just not sure about the performance of some of the indexing / preparser stuff.

CMakeLists.txt Outdated Show resolved Hide resolved
src/CMakeLists.txt Show resolved Hide resolved
src/Field.cpp Outdated Show resolved Hide resolved
src/Field.cpp Outdated Show resolved Hide resolved
@@ -356,6 +367,9 @@ class OmnmPayloadImpl {

// Meta data for the payload
omnmHeader mHeader;

// Offset information for all fields
std::vector<fieldHint> mFieldHints;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm skeptical about potential performance issues here and what it's used for. Would need to test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do -- in our testing the difference was orders of magnitude.

@@ -907,6 +928,39 @@ omnmmsgPayload_unSerialize (const msgPayload msg,
// Move tail to end of buffer
impl->mPayloadBufferTail = bufferLength;

impl->mFieldHints.clear();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole preparsing sectionlooks expensive and should at least be optional. It's iterating the entire buffer before the app has even seen it and populates an STL object with an index. This will penalize well optimized OpenMAMA applications and reward poorly optimized OpenMAMA applications.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like appearances could be deceiving - this branch certainly does look faster on first batch of tests - nice.

@@ -942,7 +996,10 @@ omnmmsgPayload_createFromByteBuffer (msgPayload* msg,
{
if (NULL == msg || NULL == buffer) return MAMA_STATUS_NULL_ARG;
if (0 == bufferLength) return MAMA_STATUS_INVALID_ARG;
omnmmsgPayload_create (msg);

OmnmPayloadImpl* impl = new OmnmPayloadImpl(bufferLength);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume as opposed to just delegating to omnmmsgPayload_create (which does the same thing)?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah - not sure the value of the change

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh... in both cases we know the size of the buffer so are able to allocate proper size right off.

@fquinner
Copy link
Collaborator

fquinner commented Jun 8, 2020

I finally put together a little benchmark app to kick the tyres on this.

It's basically as I expected - serialization / deserialization is slower, but direct access updates (e.g. update by field parameters, get by field parameters rather than iteration) is a little faster.

Overall though it looks to be slower.

If I run with your newer code, compiled with

cmake .. -DCMAKE_BUILD_TYPE=RelWithDebInfo -DMAMA_ROOT=/opt/openmama && make -j

I get this:

Benchmark for iterations with updates and writes: 22.664608s
Benchmark for iterations with (mostly) just reads: 13.410863s
Benchmark for iterations with direct field lookup (no iteration): 16.292332s
Benchmark for Serialization / Deserialization tests: 33.144798s
Total for all tests: 85.512749s

As opposed to this with the old omnm version (i.e. with none of the optimizations including the strlen related changes which I expect would make it faster):

Benchmark for iterations with updates and writes: 29.021072s
Benchmark for iterations with (mostly) just reads: 14.280121s
Benchmark for iterations with direct field lookup (no iteration): 32.298950s
Benchmark for Serialization / Deserialization tests: 1.433974s
Total for all tests: 77.034348s

So the old version actually completes the benchmark faster overall. The serialization / deserialization tests are also (if anything) understated since during the tests, messages were tested 300M times, but serialization only 100M.

I appreciate the test is contrived (though honestly wasn't designed to favour one over the other) but it highlights the suspicion that these changes produce performance gains during iteration when using direct access but performance degradation when iterating (when serialization and deserialization are included in the measurement).

I have contributed my benchmarking code for you to have a look, but as it stands, the above would re-emphasize to me that this pre-indexing should be a configurable option since it appears slower overall unless you're frequently using the direct field accessors rather than iterating, or are a publishing application which is frequently updating.

There are various situations where your code will be faster and there are various situations where the previous code would be faster.

For example given a message with 100 fields inside, if you discard 99% of messages based on a field which is early in the message, the old way will always be much faster.

If you have a message with 100 fields inside, and you need to iterate multiple times, your way will be much faster.

So I would recommend an option to toggle this behaviour such as:

mama.payload.omnmmsg.indexed = true

Which could be configured in omnmmsgPayload_init

@bill-torpey
Copy link
Contributor Author

We'll take a look at this as soon as we're able. One thought would be to build the index lazily -- that would obviously be ideal, but not sure how much effort would be involved.

@marek-teterycz
Copy link

Frank,

In our case we use named fields and not just fids. I also expanded on the test a little and added 7 additional fields, the results are much different. It seems that adding more fields doesn't really change the run time with our changes to omnm but the original version slows down significantly.

Our version of omnm:

Benchmark for iterations with updates and writes: 19.960340s
Benchmark for iterations with (mostly) just reads: 12.207262s
Benchmark for iterations with direct field lookup (no iteration): 14.350084s
Benchmark for Serialization / Deserialization tests: 25.520180s
Total for all tests: 72.038017s

Original version of omnm:

Benchmark for iterations with updates and writes: 78.090248s
Benchmark for iterations with (mostly) just reads: 38.457447s
Benchmark for iterations with direct field lookup (no iteration): 151.102585s
Benchmark for Serialization / Deserialization tests: 1.552331s
Total for all tests: 269.203278s

Here is the expanded test:

Benchmarker.zip

@fquinner
Copy link
Collaborator

That just further reinforces to me though that it should be configurable if the performance varies according to usage. It introduces latency between the middleware bridge and the onmsg that the user cannot control.

@bill-torpey
Copy link
Contributor Author

Hi Frank:

Understand your issues with the field hints (sort of), but making changes you suggest is a bit of work on our end.

Would it make sense to re-open the individual PR's to get them out of the way in the meantime?

@fquinner
Copy link
Collaborator

To be honest I suspect making it configurable could be less work overall, but whatever you feel most comfortable with works for me!

@bill-torpey
Copy link
Contributor Author

Personally, I'd rather build the hints lazily -- that way you get the benefit "automagically" if you're querying the message multiple times, but if you don't you don't pay the price.

Of course, that means keeping a "trail of bread crumbs" as the code scans through the message, which might never be used again, but that seems an insignificant cost.

OTOH, you may feel differently, so pls let me know. If you're not OK with the lazy approach, then an option is still needed, and in that case the lazy approach is prob. overkill.

@fquinner
Copy link
Collaborator

Lazy loading makes sense to me of that's something you think would be easier to implement. Its important for people measuring transport latencies since otherwise the payload would get indexed before the onmsg which could be misleading for those doing latency measurements and might otherwise attribute indexing latency to the transport / middleware rather than the payload.

@bill-torpey
Copy link
Contributor Author

Lazy loading makes sense to me of that's something you think would be easier to implement.

It wouldn't be easier, but it would be better, and would eliminate the need for a flag. The idea being that as much as possible things should "just work".

I'll see about re-opening the other PRs so we can get those out of the way, and then we can tackle this one.

…ot optimized away)

- see e.g., https://queue.acm.org/detail.cfm?id=3468263

ID=5095f6bb99f24c0caac9e09557e0234f Error=unsigned integer overflow: 388 + 18446744073709551614 cannot be represented in type 'unsigned long'
        OmnmPayloadImpl::updateField(mamaFieldType_, omnmFieldImpl&, unsigned char*, unsigned long) /home/btorpey/work/OpenMAMA-omnm/master/src/src/Payload.cpp:604:28
        OmnmPayloadImpl::updateField(mamaFieldType_, char const*, unsigned short, unsigned char*, unsigned long) /home/btorpey/work/OpenMAMA-omnm/master/src/src/Payload.cpp:544:12
        omnmmsgPayload_updateString /home/btorpey/work/OpenMAMA-omnm/master/src/src/Payload.cpp:1764:38
        mamaMsg_updateString /home/btorpey/work/OpenMAMA/master/src/mama/c_cpp/src/c/msg.c:2295:12
        Transact::MamaMessageImpl::setValue(char const*, char const*) /home/btorpey/work/transact/master/src/common/Middleware/MamaAdapter/MamaMessageImpl.cpp:1273:25
        TransactX::RecField::updateMsg(Transact::IMessage*) /home/btorpey/work/transact/master/src/common/xla/mx_pubtypes.cpp:854:49
        TransactX::SnapshotServiceHandler::DoSnapshot(TransactX::SnapshotReq*) /home/btorpey/work/transact/master/src/common/snapshot/SnapshotServiceHandler.cpp:565:28
        TransactX::SnapshotServiceHandler::run() /home/btorpey/work/transact/master/src/common/snapshot/SnapshotServiceHandler.cpp:215:21
        MXThread_launcher /home/btorpey/work/transact/master/src/api/src/MXThread.cpp:109:9
        start_thread /build/glibc-eX1tMB/glibc-2.31/nptl/pthread_create.c:477:8
        clone /build/glibc-eX1tMB/glibc-2.31/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants