
iceberg: add batching parquet writer factory #23683

Merged: 7 commits, Oct 11, 2024

Conversation

@jcipar (Contributor) commented Oct 8, 2024:

Adds a factory class for the batching_parquet_writer so it can be created by the multiplexer. This also improves error handling and logging in the data path.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.2.x
  • v24.1.x
  • v23.3.x

Release Notes

  • none

Comment on lines 57 to 60
} catch (const std::exception& e) {
    datalake_log.error(
      "Error making output stream for file {}", _output_file_path);
    datalake_log.error(e.what());
Member commented:

} catch (...) {
    datalake_log.error("Error making output stream for file {}: {}",
      _output_file_path, std::current_exception());
}

Member commented:

vlog ?

Comment on lines 194 to 195
std::stringstream filename_stream;
filename_stream << file_uuid << ".parquet";
Member commented:

use fmt::format?
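
A minimal sketch of what the reviewer is suggesting, with file_uuid as in the snippet above (the final revision later in this conversation uses the same pattern):

// sketch: replaces the std::stringstream with a single fmt::format call
std::string filename = fmt::format("{}.parquet", file_uuid);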

Comment on lines 50 to 51
datalake_log.error("Error opening output file {}",_output_file_path);
datalake_log.error(e.what());
Member commented:

please keep this a single log line and wrap it with vlog macro
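
A sketch of the requested form, using the vlog(datalake_log.error, ...) pattern that a later comment in this review calls the usual one:

} catch (const std::exception& e) {
    // one log line, wrapped in the vlog macro
    vlog(
      datalake_log.error,
      "Error opening output file {}: {}",
      _output_file_path,
      e.what());
}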

@jcipar force-pushed the jcipar/batching-writer-factory branch 2 times, most recently from 56d88d8 to 48eddb3 on October 9, 2024 13:48
@jcipar marked this pull request as ready for review October 9, 2024 13:48
@jcipar requested review from mmaslankaprv and dotnwat October 9, 2024 13:48
@jcipar force-pushed the jcipar/batching-writer-factory branch from 48eddb3 to 79062fb on October 9, 2024 15:15
@jcipar requested a review from mmaslankaprv October 9, 2024 15:41
@jcipar mentioned this pull request Oct 9, 2024
@rockwotj self-requested a review October 9, 2024 17:02
@@ -159,4 +162,26 @@ ss::future<> batching_parquet_writer::abort() {
}
}

batching_parquet_writer_factory::batching_parquet_writer_factory(
std::filesystem::path local_directory,
Contributor commented:

Architecturally, I believe it will be simpler to keep these parquet files in memory. AFAIK we're going to load these fully into memory already to send them, and it's fairly straightforward to have a semaphore to make sure we stay within the subsystem's memory budget. Writing to disk, especially with our trends of smaller and smaller disks, is going to come with a number of challenges: cleaning up zombie files, integrating with space management, etc.

Now we can do this after the beta phase, but we should keep this in mind as we're structuring the codepaths.

However

@jcipar (author) replied:

I agree that we do want to stream directly to s3 eventually. I'd rather stick with this for now so we can get an end-to-end test asap.

FWIW, we can stream from disk to S3; we don't need to load it all into memory first. cloud_storage::remote::upload_controller_snapshot is an example of that.
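
As a rough illustration of that point, a Seastar-style sketch of streaming a file in bounded chunks; stream_file_to_sink and its sink parameter are hypothetical stand-ins, not an actual cloud_storage API:

#include <seastar/core/coroutine.hh>
#include <seastar/core/file.hh>
#include <seastar/core/fstream.hh>
#include <seastar/core/iostream.hh>
#include <seastar/core/seastar.hh>

#include <filesystem>

namespace ss = seastar;

// Hypothetical helper: stream `path` into `sink` chunk by chunk instead of
// loading the whole file into memory.
ss::future<> stream_file_to_sink(
  std::filesystem::path path, ss::output_stream<char>& sink) {
    auto f = co_await ss::open_file_dma(path.string(), ss::open_flags::ro);
    auto in = ss::make_file_input_stream(std::move(f));
    // ss::copy moves data buffer-by-buffer, so peak memory stays bounded
    // regardless of file size; the caller flushes and closes the sink.
    co_await ss::copy(in, sink);
    co_await in.close();
}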

batching_parquet_writer_factory::create_writer(iceberg::struct_type schema) {
    auto ret = std::make_unique<batching_parquet_writer>(
      std::move(schema), _row_count_threshold, _byte_count_treshold);
    std::string filename = fmt::format("{}.parquet", uuid_t::create());
Contributor commented:

Let's add some more info here like discussed in slack.

@jcipar (author) replied:

Just talked about this on slack again.

  • Currently adding a file_name_prefix to the factory (see the sketch below)
  • Later PR will add a remote_directory parameter for the uploader
  • PR that adds structured tables will prepend schema_id to the file name
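
A hypothetical sketch of where that naming scheme is headed; _file_name_prefix and the "-" separator are illustrative, not the final format:

// hypothetical: prefix now; a later PR would prepend schema_id as well
std::string filename = fmt::format(
  "{}-{}.parquet", _file_name_prefix, uuid_t::create());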

Comment on lines 181 to 182
// FIXME: This method should return a result and let the multiplexer
// deal with it appropriately
Contributor commented:

We should probably have thrown at this level instead of the next layer up.

@jcipar (author) replied:

I'm not sure what you mean by "this level". Do you mean throw an exception up to the multiplexer?

    _writer_status = co_await writer.add_data_struct(
      std::move(data), estimated_size);
} catch (const std::runtime_error& err) {
    datalake_log.error("Failed to add data to writer");
Contributor commented:

vlog

Also we should set the _writer_status here.
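
A sketch of the catch block with both points applied; a later revision in this PR sets parquet_conversion_error the same way:

} catch (const std::runtime_error& err) {
    vlog(datalake_log.error, "Failed to add data to writer: {}", err.what());
    _writer_status = data_writer_error::parquet_conversion_error;
}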

@jcipar (author) replied:

done

Contributor commented:

nit: I think _writer_status should probably be initialized to a generic_error and be set to ok when it actually finished.
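
A self-contained miniature of that initialization pattern; the ok enumerator name is an assumption (only generic_error and parquet_conversion_error appear in this conversation):

enum class data_writer_error { ok, generic_error, parquet_conversion_error };

struct writer_sketch {
    // pessimistic default: any early exit leaves an error behind
    data_writer_error _writer_status = data_writer_error::generic_error;

    void finish() {
        // ... do the actual write/close work ...
        _writer_status = data_writer_error::ok; // set only on real completion
    }
};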

@@ -75,4 +76,20 @@ class batching_parquet_writer : public data_writer {
data_writer_result _result;
};

class batching_parquet_writer_factory : public data_writer_factory {
Contributor commented:

optional: could have also made this a nested class so it would be batching_parquet_writer::factory
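
A sketch of that optional nested-class layout:

class batching_parquet_writer : public data_writer {
public:
    class factory; // call sites would spell it batching_parquet_writer::factory
    // ...
};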

@jcipar (author) replied:

Given the number of things stacked above this, and how pervasive a change that would be, I'd rather make it a separate PR.

@@ -42,21 +56,25 @@ class record_multiplexer {
explicit record_multiplexer(
std::unique_ptr<data_writer_factory> writer_factory);
ss::future<ss::stop_iteration> operator()(model::record_batch batch);
ss::future<result<chunked_vector<data_writer_result>, data_writer_error>>
// ss::future<result<chunked_vector<data_file_result>, data_writer_error>>
@jcipar (author) commented:

Just noticed that I left this in here. Removing it now.

@jcipar force-pushed the jcipar/batching-writer-factory branch from 7c93d14 to ecdd60e on October 9, 2024 17:43
@vbotbuildovich (Collaborator) commented Oct 9, 2024:

the below tests from https://buildkite.com/redpanda/redpanda/builds/56150#01927262-6c33-4348-8659-1d033fdeab71 have failed and will be retried

gtest_record_multiplexer_rpunit

the below tests from https://buildkite.com/redpanda/redpanda/builds/56172#019272db-7e8b-42ef-828c-da0c04579c2d have failed and will be retried

gtest_record_multiplexer_rpunit

the below tests from https://buildkite.com/redpanda/redpanda/builds/56183#0192739b-ea9b-4c42-9850-dc89d848a76d have failed and will be retried

gtest_record_multiplexer_rpunit

@bharathv (Contributor) left a comment:

none of the comments are blockers, probably good to address in a later change

@@ -46,12 +49,24 @@ batching_parquet_writer::initialize(std::filesystem::path output_file_path) {
      ss::open_flags::create | ss::open_flags::truncate
        | ss::open_flags::wo);
} catch (...) {
    vlogl(
Contributor commented:

nit: vlog(datalake_log.error... is the usual pattern, that doesn't work?

@jcipar force-pushed the jcipar/batching-writer-factory branch from ecdd60e to c3818d1 on October 9, 2024 19:47
@jcipar force-pushed the jcipar/batching-writer-factory branch from c3818d1 to 92ed6e9 on October 9, 2024 19:55
@dotnwat changed the title from "Jcipar/batching writer factory" to "iceberg: add batching parquet writer factory" Oct 9, 2024
@jcipar force-pushed the jcipar/batching-writer-factory branch from 92ed6e9 to 8aa3cbb on October 9, 2024 23:24
@dotnwat (Member) left a comment:

all my feedback was addressed. lgtm

@vbotbuildovich (Collaborator) commented:

Retry command for Build#56250

please wait until all jobs are finished before running the slash command

/ci-repeat 1
tests/rptest/tests/cloud_storage_timing_stress_test.py::CloudStorageTimingStressTest.test_cloud_storage@{"cleanup_policy":"delete"}

Add an implementation of data_writer_factory for batching_parquet_writer. Use this to test the data path from multiplexer through to writing parquet files.

batching_parquet_writer catches different types of exceptions and transforms them into data_writer_error error codes. This is a good place to integrate some error logging.

The data_writer_factory::create method may need to open files or do other things that may fail. Return a result type so we can correctly indicate failure.

Previously, a failure to create a data writer was handled through a try/catch. This changes that to a result type, since that's our preferred error handling for the higher-level parts of the code. This requires changing the type for the writer from std::unique_ptr to ss::shared_ptr so it can be returned in a result (previously it was returned by reference).
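
A sketch of the signature this commit message describes; whether the PR additionally wraps it in ss::future<> is not shown here:

// before (sketch): failure escaped as an exception
// std::unique_ptr<data_writer> create_writer(iceberg::struct_type schema);

// after (sketch): failure is carried in the result type from base/outcome.h,
// and the writer is an ss::shared_ptr so it can be placed inside a result
result<ss::shared_ptr<data_writer>, data_writer_error>
create_writer(iceberg::struct_type schema);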
When reading Parquet files, the Arrow library reads int32s from
unaligned memory. After upgrading to Clang 18 we started getting
warnings about this when testing locally and errors in CI. This is safe
to suppress: the read code path is only used in tests.
@jcipar force-pushed the jcipar/batching-writer-factory branch from b5cfb5c to 06e20c6 on October 10, 2024 21:46
@vbotbuildovich (Collaborator) commented:

Retry command for Build#56307

please wait until all jobs are finished before running the slash command

/ci-repeat 1
tests/rptest/tests/cloud_storage_timing_stress_test.py::CloudStorageTimingStressTest.test_cloud_storage@{"cleanup_policy":"delete"}

@rockwotj (Contributor) commented:

/ci-repeat 1
skip-redpanda-build
tests/rptest/tests/cloud_storage_timing_stress_test.py::CloudStorageTimingStressTest.test_cloud_storage@{"cleanup_policy":"delete"}

@@ -159,4 +161,29 @@ ss::future<> batching_parquet_writer::abort() {
}
}

batching_parquet_writer_factory::batching_parquet_writer_factory(
std::filesystem::path local_directory,
Contributor commented:

local_directory

Todo for the future: as we chatted offline, this is internal to the writer, so it should self-manage this path and its lifecycle.

      std::move(data), estimated_size);
} catch (const std::runtime_error& err) {
    datalake_log.error("Failed to add data to writer");
    _writer_status =data_writer_error::parquet_conversion_error;
Contributor commented:

nit: space after =

@@ -10,6 +10,7 @@
#pragma once

#include "base/outcome.h"
#include "coordinator/data_file.h"
Contributor commented:

datalake/coordinator/data_file.h

@dotnwat (Member) commented Oct 11, 2024:

cloud_storage_timing_stress_test.py::CloudStorageTimingStressTest.test_cloud_storage@{"cleanup_policy":"delete"}

I think we can ignore this; it's failing upstream frequently.

@ivotron (Member) commented Oct 11, 2024:

force-merging given that https://buildkite.com/redpanda/redpanda/builds/56307#019278c1-9fc3-40ed-ad1a-1f9ab3b24793 only shows the known always-failing-in-dev test failure

@ivotron merged commit 7ff0704 into redpanda-data:dev Oct 11, 2024
13 of 17 checks passed