Datalake: Translated offset Range #23704

Closed

jcipar
Contributor

@jcipar jcipar commented Oct 9, 2024

This adds a new structure, translated_offset_range, to store a range of Kafka offsets translated into Parquet, as well as the paths to the resulting Parquet files. It modifies record_multiplexer to use this as the return type when consuming a log.
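A rough sketch of what such a structure could look like, for readers skimming the description. Only the name translated_offset_range comes from this PR; the field names, the `offset_t` alias, and the `data_file` type are assumptions made for illustration and may not match the actual definition.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a Kafka offset; the real code likely uses a
// strongly typed offset rather than a bare integer.
using offset_t = std::int64_t;

// Hypothetical description of one Parquet file produced by translation.
struct data_file {
    std::string path;
    std::size_t row_count = 0;
};

// Sketch of a translated range: the Kafka offsets that were translated
// (both bounds inclusive) and the Parquet files that hold the result.
struct translated_offset_range {
    offset_t start_offset = 0;
    offset_t last_offset = -1; // last < start means nothing was translated
    std::vector<data_file> files;

    bool empty() const { return last_offset < start_offset; }

    // Number of offsets covered by the range.
    std::int64_t size() const {
        return empty() ? 0 : last_offset - start_offset + 1;
    }
};
```

Returning one value like this from the multiplexer keeps the offsets and the file locations together, so a caller does not have to correlate them separately.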

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.2.x
  • v24.1.x
  • v23.3.x

Release Notes

  • none

jcipar added 5 commits October 9, 2024 09:31
Add an implementation of data_writer_factory for
batching_parquet_writer. Use this to test the data path from multiplexer
through to writing parquet files.

batching_parquet_writer catches different types of exceptions and
transforms them into data_writer_error error codes. This is a good place
to integrate some error logging.

The data_writer_factory::create method may need to open files or do
other things that may fail. Return a result type so we can correctly
indicate failure.

Previously, a failure to create a data writer was handled through a try/
catch. This changes that to a result type, since that's our preferred
error handling for the higher-level parts of the code. This requires
changing the type for the writer from std::unique_ptr to ss::shared_ptr
so it can be returned in a result (previously it was returned by
reference).

…urn it

This adds a new data structure, translated_offset_range, to store the
range of Kafka offsets translated into Parquet, as well as the locations
of the resulting Parquet files. It modifies the record_multiplexer to
return this new type.
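The error-handling pattern described in the commits above (exceptions translated into error codes inside the writer, and a factory whose create method returns a result instead of throwing) can be sketched as follows. This is a minimal illustration, not the code in this PR: the `result` template is a hand-rolled stand-in for the project's result type, `std::shared_ptr` stands in for `ss::shared_ptr`, and the enumerator names, `throwing_writer`, and `writer_factory` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <variant>

// Hypothetical error codes; the real data_writer_error values may differ.
enum class data_writer_error { ok = 0, file_io_error, parquet_conversion_error };

// Minimal stand-in for a result type: holds either a value or an error code.
template <typename T>
class result {
public:
    result(T value) : _v(std::move(value)) {}
    result(data_writer_error e) : _v(e) {}
    bool has_value() const { return std::holds_alternative<T>(_v); }
    bool has_error() const { return !has_value(); }
    const T& value() const { return std::get<T>(_v); }
    data_writer_error error() const { return std::get<data_writer_error>(_v); }

private:
    std::variant<T, data_writer_error> _v;
};

struct data_writer {
    virtual ~data_writer() = default;
    virtual data_writer_error add_row(const std::string& row) = 0;
};

// Hypothetical writer: catches exceptions at the boundary and maps each
// kind to an error code, which is also a natural place to log.
class throwing_writer : public data_writer {
public:
    data_writer_error add_row(const std::string& row) override {
        try {
            if (row.empty()) {
                throw std::invalid_argument("empty row");
            }
            ++_rows;
            return data_writer_error::ok;
        } catch (const std::invalid_argument&) {
            return data_writer_error::parquet_conversion_error;
        } catch (const std::exception&) {
            return data_writer_error::file_io_error;
        }
    }

private:
    std::size_t _rows = 0;
};

// Hypothetical factory: create() reports failure through the result
// rather than by throwing. A shared pointer is used so the writer can be
// carried inside the result by value.
struct writer_factory {
    bool fail_open = false; // simulates a failure to open the output file

    result<std::shared_ptr<data_writer>> create() const {
        if (fail_open) {
            return data_writer_error::file_io_error;
        }
        return std::shared_ptr<data_writer>(std::make_shared<throwing_writer>());
    }
};
```

With this shape, a caller checks `has_error()` on the value returned by `create()` and propagates the code upward, so no try/catch is needed in the higher-level code.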
@jcipar jcipar force-pushed the jcipar/translated-offset-range branch from 81bdf10 to 7c93d14 Compare October 9, 2024 15:28
@jcipar jcipar marked this pull request as ready for review October 9, 2024 15:28
@jcipar
Contributor Author

jcipar commented Oct 9, 2024

@rockwotj Only the top two commits to this are relevant for review, the rest are from the base branch.

@jcipar
Contributor Author

jcipar commented Oct 9, 2024

Closing this and moving these changes to #23683

@jcipar jcipar closed this Oct 9, 2024

mergify bot commented Oct 9, 2024

⚠️ The sha of the head commit of this PR conflicts with #23683. Mergify cannot evaluate rules on this PR. ⚠️

@vbotbuildovich
Collaborator

the below tests from https://buildkite.com/redpanda/redpanda/builds/56130#019271e6-f6fa-4280-8dbe-1ea92049f1a5 have failed and will be retried

gtest_record_multiplexer_rpunit

@vbotbuildovich
Collaborator

Retry command for Build#56130

please wait until all jobs are finished before running the slash command

/ci-repeat 1
tests/rptest/tests/crl_test.py::CertificateRevocationTest.test_noncogent
