HPCC-33665: Define a function to open and parse binary event data files #19658
Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-33665
- Declare the IEventVisitor interface to receive parts of a binary event data file as they are parsed.
- Define JLib method readEvents to open and parse one event data file. Each datum is passed to an IEventVisitor instance in the order it appears in the file.
- Define a primitive IEventVisitor implementation for illustration and testing.

Signed-off-by: Tim Klemm <Tim.Klemm@lexisnexisrisk.com>
c0cb0c6 to 632630a
ghalliday
left a comment
@timothyklemm a set of comments. I think most of them are worth addressing or discussing, but I am inclined to merge as-is and then look for follow-up PRs that address them.
    size32_t got = 0;
    if (len)
    {
        const char* s = (const char*)stream.peek(len, got);
Minor: you should be able to use stream.read(len) instead to simplify the code.
    }
    else
    {
        for (;;) {
Later it would make sense for this to be extracted as a separate function.
    }

    //Abstract interface for binary event file traversal.
    interface IEventFile : extends IInterface
Something to discuss - I don't think this is needed/gives you any benefit.
ok. I see you have used it because you are thinking you may want different class implementations for different versions.
I would expect a single EventReader class where the version is passed in (as you have it already). Multiple implementations are very unlikely, so I would simplify the calling code and assume a single reader class.
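To make the single-class suggestion concrete, here is a minimal sketch in plain C++ rather than jlib. The class name `EventReaderSketch`, the version range 1 to 2, and the `hasChecksum` check are all assumptions made up for illustration; nothing here is taken from the PR. The point is only that the version becomes data consulted where the format differs, instead of a selector for a subclass.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical single reader: the file version is stored as a member and
// checked at the points where the format differs between versions.
class EventReaderSketch
{
public:
    explicit EventReaderSketch(uint32_t _version) : version(_version)
    {
        // Reject versions this reader does not understand (range is assumed).
        if (version < minVersion || version > maxVersion)
            throw std::runtime_error("unsupported event file version " + std::to_string(version));
    }

    // Illustrative only: pretend a trailing checksum was added in version 2.
    bool hasChecksum() const { return version >= 2; }

    uint32_t queryVersion() const { return version; }

private:
    static constexpr uint32_t minVersion = 1;
    static constexpr uint32_t maxVersion = 2;
    uint32_t version;
};
```

With this shape the caller reads the version token, constructs the one reader, and never needs a factory or an abstract file interface.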
    for (;;)
    {
        // no more data means no more events
        size32_t got = 0;
There should be a null event terminator, so read could be used?
It would be simpler to exit the loop on EventNone if it was included. For the record, I probably would have passed it to the visitor, like I am passing EvAttrNone.
    readToken(stream, attr, bytesRead);
    if (EvAttrNone == attr)
    {
        if (!finishAttribute(attr, visitor, mute))
Minor: I don't think a none attr should be passed through; it is an implementation detail of the file format.
I was looking at this as an implied End Of Event notification, without adding a leaveEvent (which is what the name would have been, to match leaveFile). Would you want an explicit "this event is complete" method, or would you expect visitors to terminate events in progress before starting the next or leaving the file?
I think I would expect an explicit leave function if that is passed through to the visitor.
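One way the explicit end-of-event notification could look, as a sketch only. The names `IEventVisitorSketch`, `beginEvent`, `endEvent`, `SketchAttr`, and `SketchToken` are hypothetical and are not the PR's interface, and the token vector stands in for the buffered stream. The terminator attribute ends the loop and is reported as `endEvent` instead of being forwarded as an attribute.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical attribute ids; AttrNone stands in for the format's terminator.
enum SketchAttr : uint8_t { AttrNone = 0, AttrFileOffset = 1, AttrNodeKind = 2 };

// Hypothetical visitor with paired begin/end callbacks for each event.
struct IEventVisitorSketch
{
    virtual ~IEventVisitorSketch() = default;
    virtual void beginEvent(uint8_t eventId) = 0;
    virtual void visitAttribute(SketchAttr id, uint64_t value) = 0;
    virtual void endEvent(uint8_t eventId) = 0;
};

// One already-decoded attribute, standing in for data read from the stream.
struct SketchToken { SketchAttr id; uint64_t value; };

// Walk one event's attributes. The terminator stops the loop and is reported
// as endEvent; it is never passed to visitAttribute.
inline void visitEvent(uint8_t eventId, const std::vector<SketchToken>& tokens, IEventVisitorSketch& visitor)
{
    visitor.beginEvent(eventId);
    for (const SketchToken& token : tokens)
    {
        if (token.id == AttrNone)
            break;
        visitor.visitAttribute(token.id, token.value);
    }
    visitor.endEvent(eventId);
}

// Trivial visitor that records the callback sequence as text.
struct TraceVisitor : IEventVisitorSketch
{
    std::string trace;
    void beginEvent(uint8_t) override { trace += "B"; }
    void visitAttribute(SketchAttr, uint64_t) override { trace += "A"; }
    void endEvent(uint8_t) override { trace += "E"; }
};
```

The visitor then sees a begin, zero or more attributes, and an end for every event, so it does not have to infer completion from the next event starting or from the file ending.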
    //Read a strongly typed value from a buffered stream.
    template<typename T>
    static T readToken(IBufferedSerialInputStream& stream, T& token, size32_t& bytesRead)
If these were member functions they would only need a single parameter - so calling code would be simpler/cleaner.
Because I prepared for multiple classes to support multiple versions, I needed to read the version token before creating the class to read everything else. If we limit it to one class for all versions, as you suggested elsewhere, I agree member methods would be cleaner.
I would recommend a single class - I can't think of any examples in the platform where we have used multiple classes for different versions like this.
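As a rough illustration of the member-function form, the sketch below keeps the stream position and byte count inside the reader so that `readToken` takes only the token. It uses a `std::vector` as a stand-in for `IBufferedSerialInputStream`, and `TokenReaderSketch` and `queryBytesRead` are hypothetical names, so treat it as the shape of the idea rather than jlib code.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical reader that owns its input and tracks how much it has consumed.
class TokenReaderSketch
{
public:
    explicit TokenReaderSketch(std::vector<unsigned char> _data) : data(std::move(_data)) {}

    // Read one fixed-size value in host byte order; the caller passes only the token.
    template <typename T>
    void readToken(T& token)
    {
        static_assert(std::is_trivially_copyable<T>::value, "token must be trivially copyable");
        if (data.size() - offset < sizeof(T))
            throw std::runtime_error("unexpected end of event data");
        std::memcpy(&token, data.data() + offset, sizeof(T));
        offset += sizeof(T);
    }

    std::size_t queryBytesRead() const { return offset; }

private:
    std::vector<unsigned char> data;
    std::size_t offset = 0;
};
```

Call sites shrink from `readToken(stream, attr, bytesRead)` to `readToken(attr)`, with the running byte count available from the reader when the file is finished.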
    if (!fileIO)
        throw makeStringExceptionV(-1, "file '%s' not opened for reading", file.queryFilename());
    Owned<ISerialInputStream> baseStream = createSerialInputStream(fileIO);
    if (!baseStream)
Minor: this cannot return null.
    if (!baseStream)
        throw makeStringExceptionV(-1, "file '%s' input stresm not created", file.queryFilename());
    Owned<IBufferedSerialInputStream> bufferedStream = createBufferedInputStream(baseStream, 0x100000, false);
    if (!bufferedStream)
Similarly, this cannot return null.
    EvAttrPath,
    EvAttrConnectId,
    EvAttrEnabled,
    EvAttrSysFileSize,
Naming: avoid the Sys prefix. Some attributes, e.g. FileSize, may well be provided for events/meta as well.
I wanted to differentiate the file being visited from all of the files referenced within that file. The prefix is also meant to distinguish auto-generated values from event data. Would you remove the prefix from all of the attributes, or just those that might be reused by an event?
Personally I would remove sys from all attributes. I might also implement the timestamp differently, but that is a separate discussion.
    virtual Continuation visitAttribute(EventAttr id, uint16_t value) = 0;
    virtual Continuation visitAttribute(EventAttr id, uint32_t value) = 0;
    virtual Continuation visitAttribute(EventAttr id, uint64_t value) = 0;
    virtual void leaveFile(uint32_t bytesRead) = 0;
Not sure I like the name leave. Ideally the verbs should be paired, e.g. begin/end or enter/leave.
I could also be hyper-picky about visit as a prefix, but I don't care enough!
timothyklemm
left a comment
I'm OK with it merging now, with updates to follow. I'm close to having a first pass for evtool dump that depends on this, so merging will make that easier to deal with.