pulsar lakehouse sink connector for GCP to load hudi table #23024
Pavan792reddy asked this question in Q&A
Hi team,
I am trying to load data from a Pulsar topic into a Hudi table on GCS using the Pulsar lakehouse sink connector. We built the NAR file and created the sink with the required parameters, but processing records fails with the error in the log below.
Sink configuration file:
{
  "tenant": "public",
  "namespace": "default",
  "name": "hudi-sink-test1",
  "inputs": [
    "hudi-pulsar-test"
  ],
  "archive": "/home/pavankumar_reddy/pulsar-io-lakehouse-2.11.0-SNAPSHOT.nar",
  "parallelism": 1,
  "processingGuarantees": "EFFECTIVELY_ONCE",
  "configs": {
    "type": "hudi",
    "hoodie.table.name": "hudi-connector-test",
    "hoodie.table.type": "COPY_ON_WRITE",
    "hoodie.base.path": "gs://test-hudi/path_to_hudi",
    "hoodie.datasource.write.recordkey.field": "id",
    "hoodie.datasource.write.partitionpath.field": "id",
    "fs.defaultFS": "gs://test-hudi/",
    "fs.gs.impl": "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem",
    "fs.AbstractFileSystem.gs.impl": "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS",
    "google.cloud.auth.service.account.enable": "true",
    "google.cloud.auth.service.account.keyfile": "/home/pavankumar_reddy/key.json",
    "fs.gs.project.id": "ID"
  }
}
Log:
2024-07-11T09:56:59,639+0000 [lakehouse-io-1-1] ERROR org.apache.pulsar.ecosystem.io.lakehouse.sink.SinkWriter - process record failed.
org.apache.hudi.exception.HoodieIOException: Failed to get instance of org.apache.hadoop.fs.FileSystem
at org.apache.hudi.common.fs.FSUtils.getFs(FSUtils.java:111) ~[hudi-common-0.12.3.jar:0.12.3]
at org.apache.hudi.common.fs.FSUtils.getFs(FSUtils.java:102) ~[hudi-common-0.12.3.jar:0.12.3]
at org.apache.hudi.common.table.HoodieTableMetaClient.initTableAndGetMetaClient(HoodieTableMetaClient.java:455) ~[hudi-common-0.12.3.jar:0.12.3]
at org.apache.hudi.common.table.HoodieTableMetaClient$PropertyBuilder.initTable(HoodieTableMetaClient.java:1115) ~[hudi-common-0.12.3.jar:0.12.3]
at org.apache.pulsar.ecosystem.io.lakehouse.sink.hudi.HoodieWriterProvider.createTable(HoodieWriterProvider.java:57) ~[WJb1X-zE-2YfDhD1u_zlrA/:?]
at org.apache.pulsar.ecosystem.io.lakehouse.sink.hudi.HoodieWriterProvider.<init>(HoodieWriterProvider.java:48) ~[WJb1X-zE-2YfDhD1u_zlrA/:?]
at org.apache.pulsar.ecosystem.io.lakehouse.sink.hudi.HoodieWriter.<init>(HoodieWriter.java:49) ~[WJb1X-zE-2YfDhD1u_zlrA/:?]
at org.apache.pulsar.ecosystem.io.lakehouse.sink.LakehouseWriter.getWriter(LakehouseWriter.java:43) ~[WJb1X-zE-2YfDhD1u_zlrA/:?]
at org.apache.pulsar.ecosystem.io.lakehouse.sink.SinkWriter.getOrCreateWriter(SinkWriter.java:148) ~[WJb1X-zE-2YfDhD1u_zlrA/:?]
at org.apache.pulsar.ecosystem.io.lakehouse.sink.SinkWriter.run(SinkWriter.java:104) [WJb1X-zE-2YfDhD1u_zlrA/:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-common-4.1.77.Final.jar:4.1.77.Final]
at java.lang.Thread.run(Thread.java:840) [?:?]
Caused by: org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "gs"
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3353) ~[hadoop-common-3.2.4.jar:?]
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3373) ~[hadoop-common-3.2.4.jar:?]
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:125) ~[hadoop-common-3.2.4.jar:?]
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3424) ~[hadoop-common-3.2.4.jar:?]
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3392) ~[hadoop-common-3.2.4.jar:?]
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:485) ~[hadoop-common-3.2.4.jar:?]
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365) ~[hadoop-common-3.2.4.jar:?]
at org.apache.hudi.common.fs.FSUtils.getFs(FSUtils.java:109) ~[hudi-common-0.12.3.jar:0.12.3]
... 13 more
2024-07-11T09:57:58,002+0000 [pulsar-timer-15-1] INFO org.apache.pulsar.client.impl.ConsumerStatsRecorderImpl - [hudi-pulsar-test] [public/default/hudi-sink-test1] [ddf78] Prefetched messages: 0 --- Consume throughput received: 0.08 msgs/s --- 0.00 Mbit/s --- Ack sent rate: 0.00 ack/s --- Failed messages: 0 --- batch messages: 0 ---Failed acks: 0
root@pulsar-test-sink:/home/pavankumar_reddy/apache-pulsar-3.3.0/logs/functions/public/default/hudi-sink-test1#
pom.xml properties used to build the NAR:
<jackson.version>2.13.2.1</jackson.version>
<lombok.version>1.18.22</lombok.version>
<pulsar.version>3.3.0.1</pulsar.version>
<log4j2.version>2.17.2</log4j2.version>
<slf4j.version>1.7.25</slf4j.version>
<hadoop.version>3.2.4</hadoop.version>
<iceberg.version>0.13.1</iceberg.version>
<parquet.version>1.12.0</parquet.version>
<hudi.version>0.12.3</hudi.version>
<delta.version>0.3.0</delta.version>
<parquet.avro.version>1.12.2</parquet.avro.version>
<netty.version>4.1.77.Final</netty.version>
<aws.sdk.version>1.12.220</aws.sdk.version>
<gcs.version>hadoop3-2.2.7</gcs.version>
<curator.version>2.12.0</curator.version>
<snappy.java.version>1.1.8.4</snappy.java.version>
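One check that might help narrow this down is whether the class named in "fs.gs.impl" is actually packaged inside the NAR, since "No FileSystem for scheme" usually means Hadoop could not find an implementation for that scheme at runtime. Below is a minimal sketch, not a definitive diagnosis. It assumes the NAR is an ordinary zip archive that carries its dependencies as nested .jar entries (the layout of this particular build has not been verified), and the function name jars_containing_class is hypothetical.

```python
import io
import zipfile

# Class file path derived from the "fs.gs.impl" value in the sink config.
GCS_FS_CLASS = "com/google/cloud/hadoop/fs/gcs/GoogleHadoopFileSystem.class"


def jars_containing_class(nar_path, class_entry=GCS_FS_CLASS):
    """Return the nested jars inside the NAR that contain `class_entry`.

    Assumption: the NAR is a zip archive and its dependencies are stored
    as .jar entries somewhere inside it.
    """
    hits = []
    with zipfile.ZipFile(nar_path) as nar:
        for name in nar.namelist():
            if not name.endswith(".jar"):
                continue
            # Each bundled jar is itself a zip; open it from memory.
            with zipfile.ZipFile(io.BytesIO(nar.read(name))) as jar:
                if class_entry in jar.namelist():
                    hits.append(name)
    return hits
```

Calling jars_containing_class("/home/pavankumar_reddy/pulsar-io-lakehouse-2.11.0-SNAPSHOT.nar") and getting an empty list would suggest the GCS connector was not bundled into the archive. A non-empty list would point toward the configuration or classloading side instead.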