Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problems with fragmented GIOP messages #1201

Open
j18ter opened this issue Aug 18, 2020 · 9 comments
Open

Problems with fragmented GIOP messages #1201

j18ter opened this issue Aug 18, 2020 · 9 comments
Labels

Comments

@j18ter
Copy link
Contributor

j18ter commented Aug 18, 2020

Version

6.5.11/2.5.11

Host machine and operating system

CentOS 7 Linux 3.10.0-1127.13.1.el7.x86_64 on a generic PC with Intel i7 CPU.

Target machine and operating system (if different from host)

Same

Compiler name and version (including patch level)

gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39)

The $ACE_ROOT/ace/config.h file

#define ACE_HAS_IPV6
#include <ace/config-linux.h>

The $ACE_ROOT/include/makeinclude/platform_macros.GNU file

include $(ACE_ROOT)/include/makeinclude/platform_linux.GNU

Contents of $ACE_ROOT/bin/MakeProjectCreator/config/default.features

This file does not seem to exist.

AREA/CLASS/EXAMPLE AFFECTED:

A simple variation of the echo example, which returns a string sequence that is slightly bigger
than 4MB. The server is started with command line parameters -ORBMaxMessageSize 1048576
to cause GIOP fragmentation.

The problem effects:

It affects execution. In this example the client receives incorrect values, in our more complex
application the client encounters CORBA::MARSHAL or CORBA::COMM_FAILURE exceptions.

Synopsis

Returning big sequences from operations can cause client exceptions or incorrect values.

Description

Initially seen with a total returned size of approximately 8 MB, and client got CORBA::MARSHAL, The ``-ORBMaxMessageSize"
parameter was not used in this case. Using it to force fragmentation causes simple examples to fail.

Repeat by

A complete example program with MPC and make files can be cloned from Github with this command:

git clone https://github.com/NetAcquire/tao_fragmentation_bug.git

Note: Run the included run_tests.pl script to see the failure. This passes -ORBMaxMessageSize 1048576
to the server. Without this parameter the test succeeds.

Sample fix/ workaround

Tried to debug, but got lost in the GIOP parsing code.

@jwillemsen
Copy link
Member

Thanks for reporting this issue and making a reproducer. Could you attach a client and server logfile with ORB logging 10?

This is very likely a complex issue and it if very unlikely someone has spare time to work on this for free. When you need someone to look at this in detail consider hiring one of the commercial support companies, like Remedy IT for which I work. See https://www.dre.vanderbilt.edu/~schmidt/commercial-support.html for an overview of the companies who provide support

@jwillemsen
Copy link
Member

Could you make a PR for the new test to place it under https://github.com/DOCGroup/ACE_TAO/tree/master/TAO/tests/GIOP_Fragments? Don't add generated files and generated makefiles

@j18ter
Copy link
Contributor Author

j18ter commented Aug 19, 2020

Client and server log files are attached. I compressed these because the server log file includes hex dumps of the complete
4 MB reply. At the very end of server_log.txt one can see that the second element of the string sequence with the value "Hello World" ends up in a separate GIOP message/fragment.

server_log.txt.gz
client_log.txt.gz

Thanks for reporting this issue and making a reproducer. Could you attach a client and server logfile with ORB logging 10?

This is very likely a complex issue and it if very unlikely someone has spare time to work on this for free. When you need someone to look at this in detail consider hiring one of the commercial support companies, like Remedy IT for which I work. See https://www.dre.vanderbilt.edu/~schmidt/commercial-support.html for an overview of the companies who provide support

@j18ter
Copy link
Contributor Author

j18ter commented Aug 19, 2020

Could you make a PR for the new test to place it under https://github.com/DOCGroup/ACE_TAO/tree/master/TAO/tests/GIOP_Fragments? Don't add generated files and generated makefiles

Done (Pull request #1205). This may need some massaging by somebody who is more familiar with the TAO tree and build system than I am.

@j18ter
Copy link
Contributor Author

j18ter commented Aug 21, 2020

TAO does not seem to break strings across fragments, and in fact, it does not even break up an element of a sequence.
This may be the problem.
For example, the reproducing test configures a maximum message size of 1 MB, but the first element of the sequence, a 4 MB string, is transmitted as a single GIOP message. This is not only defeating the purpose of fragmentation, but worse, it causes the following problems.
Starting with GIOP 1.2, all fragments except the last must be a multiple of 8 bytes in size, padded if necessary. In the test, at the end of the first element of the sequence, there are 7 more bytes to the next 8-byte boundary. According to CDR formatting rules, the 4-byte size field of the next string (second element of the sequence) should now follow. Instead, TAO inserts 7 padding bytes, and starts the next sequence element in a new fragment.
The client has no way to know that the 7 bytes, which follow the first sequence element, are padding bytes. It looks for the 4-byte size field of the second element, suitable aligned at a 4-byte boundary, and finds the length to be zero. As far as the client is concerned, it has reached the end of the reply. Depending on the data type of the sequence elements, the consequences can be more serious. Instead of returned the wrong data, the client may encounter inconsistent formatting and throw CORBA::MARSHAL exceptions.
If my interpretation of the GIOP standard is correct, TAO must fragment more aggressively. It must use as much of the available space at the end of each fragment as possible, otherwise the client cannot unambiguously determine how many padding bytes there are. It must fragment at the boundary of primitive data types. This includes braking-up strings. If it fragments only at higher levels, e.g., leaving sequence elements unbroken, the padding becomes ambiguous to clients.

@jwillemsen
Copy link
Member

As far as I remember sending GIOP fragmentation support was added as part of the sendfile support and at that moment only a higher level fragmentation support was implemented.

@j18ter
Copy link
Contributor Author

j18ter commented Aug 22, 2020

Further investigation suggests, that strings specifically care causing issues. Composite data types generated by tao_idl will recursively use the primitive marshaling routines, and as long as these fragment correctly, the composites will too. I have a fix for simple strings (type char without code set translators that might change the length). This fix breaks-up strings, if necessary, to abide by the configured maximum message size, and more importantly, avoids confusing clients with unexpected padding. For more complicated cases, including wide character strings and when code set translators are engaged, I may be able to fall back to a less ambitious solution, which avoids the padding ambiguity by never terminating a fragment before the string, and otherwise using the old approach of not breaking-up the string itself either. Clients should work fine with this, but it can lead to very big fragments. This will have to wait until next week.

@jwillemsen
Copy link
Member

Please extend the unit test you have with various IDL constructs that you are using for testing.

@j18ter
Copy link
Contributor Author

j18ter commented Sep 12, 2020

Please extend the unit test you have with various IDL constructs that you are using for testing.

Not sure I will have time for this. It turns out that this fragmentation issue was not what caused our
problem. Basically, I'm volunteering a fix for what is clearly a bug, but since it is not affecting our
application I have limited time to spend on this.

@jwillemsen jwillemsen added the bug label Feb 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

No branches or pull requests

2 participants