Skip to content

HPCC-33665: Define a function to open and parse binary event data files#19658

Merged
ghalliday merged 1 commit intohpcc-systems:masterfrom
timothyklemm:hpcc-33665-event-reader
Mar 31, 2025
Merged

HPCC-33665: Define a function to open and parse binary event data files#19658
ghalliday merged 1 commit intohpcc-systems:masterfrom
timothyklemm:hpcc-33665-event-reader

Conversation

@timothyklemm
Copy link
Copy Markdown

@timothyklemm timothyklemm commented Mar 25, 2025

  • Declare the IEventVisitor interface to receive parts of a binary event data file as they are parsed.
  • Define JLib method readEvents to open and parse one event data file. Each datum is passed to an IEventVisitor instance in the order it appears in the file.
  • Define a primitive IEventVisitor implementation for illustration and testing.

Type of change:

  • This change is a bug fix (non-breaking change which fixes an issue).
  • This change is a new feature (non-breaking change which adds functionality).
  • This change improves the code (refactor or other change that does not change the functionality)
  • This change fixes warnings (the fix does not alter the functionality or the generated code)
  • This change is a breaking change (fix or feature that will cause existing behavior to change).
  • This change alters the query API (existing queries will have to be recompiled)

Checklist:

  • My code follows the code style of this project.
    • My code does not create any new warnings from compiler, build system, or lint.
  • The commit message is properly formatted and free of typos.
    • The commit message title makes sense in a changelog, by itself.
    • The commit is signed.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly, or...
    • I have created a JIRA ticket to update the documentation.
    • Any new interfaces or exported functions are appropriately commented.
  • I have read the CONTRIBUTORS document.
  • The change has been fully tested:
    • I have added tests to cover my changes.
    • All new and existing tests passed.
    • I have checked that this change does not introduce memory leaks.
    • I have used Valgrind or similar tools to check for potential issues.
  • I have given due consideration to all of the following potential concerns:
    • Scalability
    • Performance
    • Security
    • Thread-safety
    • Cloud-compatibility
    • Premature optimization
    • Existing deployed queries will not be broken
    • This change fixes the problem, not just the symptom
    • The target branch of this pull request is appropriate for such a change.
  • There are no similar instances of the same problem that should be addressed
    • I have addressed them here
    • I have raised JIRA issues to address them separately
  • This is a user interface / front-end modification
    • I have tested my changes in multiple modern browsers
    • The component(s) render as expected

Smoketest:

  • Send notifications about my Pull Request position in Smoketest queue.
  • Test my draft Pull Request.

Testing:

@github-actions
Copy link
Copy Markdown

Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-33665

Jirabot Action Result:
Workflow Transition To: Merge Pending
Updated PR

- Declare the IEventVisitor interface to receive parts of a binary event data
  file as they are parsed.
- Define JLib method readEvents to open and parse one event data file. Each
  datum is passed to an IEventVisitor instance in the order it appears in the
  file.
- Define a primitive IEventVisitor implementation for illustration and testing.

Signed-off-by: Tim Klemm <Tim.Klemm@lexisnexisrisk.com>
@timothyklemm timothyklemm force-pushed the hpcc-33665-event-reader branch from c0cb0c6 to 632630a Compare March 25, 2025 15:12
@timothyklemm timothyklemm requested a review from ghalliday March 25, 2025 15:13
Copy link
Copy Markdown
Member

@ghalliday ghalliday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@timothyklemm a set of comments. I think it would be worth addressing/discussing most of them, but I am inclined to merge as-is, and then look for follow up PRs that address the comments.

Comment thread system/jlib/jevent.cpp
size32_t got = 0;
if (len)
{
const char* s = (const char*)stream.peek(len, got);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor. You should be able to use stream.read(len) instead to simplify the code.

Comment thread system/jlib/jevent.cpp
}
else
{
for (;;) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Later it would make sense for this to be extracted as a separate function.

Comment thread system/jlib/jevent.cpp
}

//Abstract interface for binary event file traversal.
interface IEventFile : extends IInterface
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something to discuss - I don't think this is needed/gives you any benefit.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. I see you have used it because you are thinking you may want different class implementations for different versions.
I would expect a single EventReader class where the version is pased in (as you have it already), but very unlikely to have multiple implementations - so I would simplify the calling code and assume a single reader class.

Comment thread system/jlib/jevent.cpp
for (;;)
{
// no more data means no more events
size32_t got = 0;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be a null event terminator, so read could be used?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be simpler to exit the loop on EventNone if it was included. For the record, I probably would have passed it to the visitor, like I am passing EvAttrNone.

Comment thread system/jlib/jevent.cpp
readToken(stream, attr, bytesRead);
if (EvAttrNone == attr)
{
if (!finishAttribute(attr, visitor, mute))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: I don't think a none attr should be passed through, that is an implementation detail of the file format.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was looking at this as an implied End Of Event notification, without adding a leaveEvent (which is what the name would have been, to match leaveFile). Would you want an explicit "this event is complete" method, or would you expect visitors to terminate events in progress before starting the next or leaving the file?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would expect an explicit leave function if that is passed through to the visitor.

Comment thread system/jlib/jevent.cpp

//Read a strongly typed value from a buffered stream.
template<typename T>
static T readToken(IBufferedSerialInputStream& stream, T& token, size32_t& bytesRead)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If these were member functions they would only need a single parameter - so calling code would be simpler/cleaner.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I prepared for multiple classes to support multiple versions, I needed to read the version token before creating the class to read everything else. If we limit it to one class for all versions, as you suggested elsewhere, I agree member methods would be cleaner.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend a single class - I can't think of any examples in the platform where we have used multiple classes for different versions like this.

Comment thread system/jlib/jevent.cpp
if (!fileIO)
throw makeStringExceptionV(-1, "file '%s' not opened for reading", file.queryFilename());
Owned<ISerialInputStream> baseStream = createSerialInputStream(fileIO);
if (!baseStream)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: this can not return null.

Comment thread system/jlib/jevent.cpp
if (!baseStream)
throw makeStringExceptionV(-1, "file '%s' input stresm not created", file.queryFilename());
Owned<IBufferedSerialInputStream> bufferedStream = createBufferedInputStream(baseStream, 0x100000, false);
if (!bufferedStream)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similarly, this can not return null

Comment thread system/jlib/jevent.hpp
EvAttrPath,
EvAttrConnectId,
EvAttrEnabled,
EvAttrSysFileSize,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naming: Avoid the Sys prefix - some attributes e.g. FileSize may well be provided for events/meta as well.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to differentiate the file being visited from all of the files referenced within that file. The prefix is also meant to distinguish auto-generated values from event data. Would you remove the prefix from all of the attributes, or just those that might be reused by an event?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I would remove sys from all attributes. I might also implement the timestamp differently, but that is a separate discussion.

Comment thread system/jlib/jevent.hpp
virtual Continuation visitAttribute(EventAttr id, uint16_t value) = 0;
virtual Continuation visitAttribute(EventAttr id, uint32_t value) = 0;
virtual Continuation visitAttribute(EventAttr id, uint64_t value) = 0;
virtual void leaveFile(uint32_t bytesRead) = 0;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure I like the name leave - ideally the verbs should be paired e.g. begin/end enter/leave.
I could also be hyper-picky about visit as a prefix, but I don't care enough!

@ghalliday ghalliday requested a review from Copilot March 26, 2025 19:40
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot wasn't able to review any files in this pull request.

Files not reviewed (3)
  • system/jlib/jevent.cpp: Language not supported
  • system/jlib/jevent.hpp: Language not supported
  • testing/unittests/jlibtests2.cpp: Language not supported

Copy link
Copy Markdown
Author

@timothyklemm timothyklemm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK with it merging now, with updates to follow. I'm close to having a first pass for evtool dump that depends on this, so merging will make that easier to deal with.

Comment thread system/jlib/jevent.cpp

//Read a strongly typed value from a buffered stream.
template<typename T>
static T readToken(IBufferedSerialInputStream& stream, T& token, size32_t& bytesRead)
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I prepared for multiple classes to support multiple versions, I needed to read the version token before creating the class to read everything else. If we limit it to one class for all versions, as you suggested elsewhere, I agree member methods would be cleaner.

Comment thread system/jlib/jevent.cpp
for (;;)
{
// no more data means no more events
size32_t got = 0;
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be simpler to exit the loop on EventNone if it was included. For the record, I probably would have passed it to the visitor, like I am passing EvAttrNone.

Comment thread system/jlib/jevent.cpp
readToken(stream, attr, bytesRead);
if (EvAttrNone == attr)
{
if (!finishAttribute(attr, visitor, mute))
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was looking at this as an implied End Of Event notification, without adding a leaveEvent (which is what the name would have been, to match leaveFile). Would you want an explicit "this event is complete" method, or would you expect visitors to terminate events in progress before starting the next or leaving the file?

Comment thread system/jlib/jevent.hpp
EvAttrPath,
EvAttrConnectId,
EvAttrEnabled,
EvAttrSysFileSize,
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to differentiate the file being visited from all of the files referenced within that file. The prefix is also meant to distinguish auto-generated values from event data. Would you remove the prefix from all of the attributes, or just those that might be reused by an event?

@ghalliday ghalliday merged commit b4a142c into hpcc-systems:master Mar 31, 2025
51 of 54 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants