Skip to content
This repository has been archived by the owner on Dec 20, 2023. It is now read-only.

Do not try to allocate memory on de-serialization when there is no need #493

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Shumaro
Copy link
Contributor

@Shumaro Shumaro commented Feb 13, 2020

This will allow products without dynamic allocation to use de-serialization provided that there are not strings to de-serialize.

@Shumaro Shumaro force-pushed the feature/fix_deserialization branch from d946ccc to a9cb272 Compare February 13, 2020 14:49

else
{
*static_cast<char**>(aStructureData) = "";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems likely to cause problems with existing code that frees the deserialized string. Specifically, any code attempting to free the string would need to be modified to detect zero length strings and not to call mem_free() on them. Otherwise, a crash or memory heap corruption could result.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right.
I updated the patch so now there is no attempt to call mem_free() on empty strings.

@Shumaro Shumaro force-pushed the feature/fix_deserialization branch from a9cb272 to 318c12c Compare February 14, 2020 16:18
@jaylogue jaylogue requested a review from robszewczyk February 15, 2020 16:20
@Shumaro Shumaro force-pushed the feature/fix_deserialization branch from 318c12c to 0cba4f7 Compare February 17, 2020 12:23
@codecov-io
Copy link

codecov-io commented Feb 17, 2020

Codecov Report

Merging #493 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #493      +/-   ##
==========================================
- Coverage   54.05%   54.05%   -0.01%     
==========================================
  Files         344      344              
  Lines       59061    59065       +4     
==========================================
+ Hits        31924    31926       +2     
- Misses      27137    27139       +2
Impacted Files Coverage Δ
src/lib/support/SerializationUtils.cpp 83.49% <100%> (+0.07%) ⬆️
src/lib/support/StatusReportStr.cpp 81.28% <0%> (-0.97%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9708816...586425e. Read the comment docs.

@Shumaro Shumaro force-pushed the feature/fix_deserialization branch 3 times, most recently from c0445dd to 586425e Compare February 17, 2020 16:17
LogReadWrite("%s bytestring allocated %d bytes at %p", "R", byteString.mLen, byteString.mBuf);
*static_cast<SerializedByteString *>(aStructureData) = byteString;
*static_cast<SerializedByteString *>(aStructureData) = byteString;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Help me understand this. If the encoding contains a TLV byte string that is zero bytes in length, what ends up being stored in aStructureData?

I believe the answer cannot be NULL, as the received must have a way to distinguish the absence of a byte string value from a byte string value that is zero length.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I am not mistaken there is no way for a reader of distinguishing between those two scenarios.

When serializing TLV byte string the data is encoded as follow:

Byte0:tag id for the TLV byte string property
Byte1:number of bytes.
Byte2:DataByte1
Byte3:DataByte2
.
.
.
Byten:tag id end.

In the absence of a byte string value as well as for a byte string value with 0 length the enconded TLV looks:

Byte0:tag id for the TLV byte string property.
Byte1:0x00.
Byte2:tag id end.

Anyways, I can't imagine any scenario in which a reader needs to distinguish the absence of a byte string value from a byte string value that is zero length, can you?

Currently, the reader would read
byteString.mLen = 0
byteString.mBuf = NULL or and address which can be free later(it depends on malloc implementation);

With this patch:
byteString.mLen = not changed
byteString.mBuf = not changed

Probably, it would be less confusing if this was passed instead:
byteString.mLen = 0
byteString.mBuf = NULL

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, since for an empty UTF8 string is fine not to change the value passed into the struct, it should be fine to leave it unchanged for TLV byte string too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your second example of the absence of a byte string is incorrect. If a byte string, or any other TLV element is absent, then its encoding is entirely omitted--i.e. no control byte and no tag or value bytes. When parsing, the indication that an element was omitted by the sender is the receiver encountering the end of the container without encountering the element.

The Weave Data Language provides support for optional fields (distinct from nullable) that employ this construct. For example, it is possible to extend a WDL event with a new optional field after the original event was defined. In this case, a receiver of an event may need to distinguish between an event that was encoded by an old sender, which didn’t know about the new field, and a new sender that knows about the field but sent a zero length value. The natural way of representing this in the decoded form of the event is a NULL pointer to a byte array for absent case and a pointer to a zero-length byte array for the present but length 0 case.

Copy link
Contributor Author

@Shumaro Shumaro Feb 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification, I was considering setting nullified the TLV string field as TLV element absent which is not (in this case encoded data seems to be the same as setting TLV string len to 0).

BTW, in the current code, it seems to me that a 0 length buffer might be passed to malloc and it is expected to return something different than NULL which might be an issue since malloc 0 size returns either NULL, or a unique pointer value that can later be successfully passed to free().

            byteString.mBuf = static_cast<uint8_t *>(memMgmt->mem_alloc(byteString.mLen));
            VerifyOrExit(byteString.mBuf != NULL, err = WEAVE_ERROR_NO_MEMORY);

``

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left the code for a byte string unmodified with the exception that if length is 0, it does not exit with error when using mem_alloc. I believe this is a better approach since calling malloc for 0 size is implementation defined and might return NULL.

That also will allow products without dynamic allocation to use de-serialization provided that there are not strings to de-serialize.

Do you have any other concern?

@Shumaro Shumaro force-pushed the feature/fix_deserialization branch from 586425e to cbc27c4 Compare February 21, 2020 13:26
And not try to allocate empty UTF8 strings.
@Shumaro Shumaro force-pushed the feature/fix_deserialization branch from cbc27c4 to c775c6e Compare March 6, 2020 14:47
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants