Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ArcticDB reading streaming data #1647

Merged
merged 5 commits into from
Jun 28, 2024
Merged

Fix ArcticDB reading streaming data #1647

merged 5 commits into from
Jun 28, 2024

Conversation

poodlewars
Copy link
Collaborator

@poodlewars poodlewars commented Jun 25, 2024

Reference Issues/PRs

Fixes https://github.com/man-group/arcticdb-man/issues/80

What does this implement or fix?

Allows ArcticDB to read incomplete segments written by arcticc tick collectors.

Any other comments?

arcticc has a different append incompletes logic to ArcticDB. It writes a dummy StreamDescriptor on the TimeseriesDescriptor, and relies entirely on the StreamDescriptor in the segment header. That approach is better than the existing ArcticDB approach, which writes the same StreamDescriptor twice (directly in the header, and in the TimeseriesDescriptor).

ArcticDB was reading this dummy stream descriptor and crashing.

Firstly, this PR changes readers to first check the StreamDescriptor on the header of incompletes, rather than using the one stamped on the TimeseriesDescriptor. It also adds a backwards compat test for the format.

Secondly, and this is not required for the PR to be logically correct, we change writers so that, like arcticc, they do not duplicate the StreamDescriptor. This is done in 99d0870.

Checklist

Checklist for code changes...
  • Have you updated the relevant docstrings, documentation and copyright notice?
  • Is this contribution tested against all ArcticDB's features?
  • Do all exceptions introduced raise appropriate error messages?
  • Are API changes highlighted in the PR description?
  • Is the PR labelled as enhancement or bug so it appears in autogenerated release notes?

@poodlewars poodlewars added the bug Something isn't working label Jun 25, 2024
@poodlewars poodlewars marked this pull request as ready for review June 25, 2024 14:24
@poodlewars poodlewars reopened this Jun 28, 2024
return decode_timeseries_descriptor(hdr, data, begin, end, segment.descriptor());
}

std::optional<TimeseriesDescriptor> decode_timeseries_descriptor_for_incompletes(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use more abstract naming here as it's a bit weird to refer to incompletes. Like decode_timeseries_descriptor_using_header_stream_descriptor. I settled on the current naming as it makes the purpose of the weird special case clearer for people changing the codec layer (it's a good hint about what they might break).

@poodlewars poodlewars merged commit 8cf8279 into master Jun 28, 2024
124 of 125 checks passed
@poodlewars poodlewars deleted the man-read-streaming-2 branch June 28, 2024 15:58
poodlewars added a commit that referenced this pull request Jun 30, 2024
Please see the description on the `master` version of this change,
#1647

The implementation is significantly different to the master version,
because master is written on top of the new binary encoding work.

---------

Co-authored-by: Alex Seaton <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants