
[Question]: After enabling Kerberos for HDFS, the task application encounters errors when accessing logs on HDFS through DolphinScheduler. #243

Open
QILIANG678 opened this issue Feb 13, 2025 · 0 comments
Labels
question Further information is requested

Comments


Contact Details

[email protected]

What would you like to ask or discuss?

HDFS is functioning normally, and all DataNodes are online, but the task application is still reporting errors:
```
2025-02-12 17:33:53,953 ERROR 6633 [delayed-queue-executor-3] [] : [c.o.c.a.service.impl.LogParserServiceImpl:212] parseError:
java.lang.Exception: failed to read file: hdfs://nameservice1/flume/dolphinscheduler/2025-02-12/20250212/16628488504192_1-32-41.log,
err: Could not obtain block: BP-1830315256-192.168.100.219-1733820781848:blk_1073745518_4694 file=/flume/dolphinscheduler/2025-02-12/20250212/16628488504192_1-32-41.log
No live nodes contain current block
Block locations:
  DatanodeInfoWithStorage[192.168.100.220:1026,DS-282152c0-871e-4254-b69c-730c5f1761ec,DISK]
  DatanodeInfoWithStorage[192.168.100.221:1026,DS-063987aa-021a-4f31-9be2-a6a90191ea2e,DISK]
  DatanodeInfoWithStorage[192.168.100.219:1026,DS-0d5dc533-d6cc-4ede-b70e-dbfc7fb76771,DISK]
Dead nodes:
  DatanodeInfoWithStorage[192.168.100.221:1026,DS-063987aa-021a-4f31-9be2-a6a90191ea2e,DISK]
  DatanodeInfoWithStorage[192.168.100.220:1026,DS-282152c0-871e-4254-b69c-730c5f1761ec,DISK]
  DatanodeInfoWithStorage[192.168.100.219:1026,DS-0d5dc533-d6cc-4ede-b70e-dbfc7fb76771,DISK]
    at com.oppo.cloud.application.util.HDFSUtil.readLines(HDFSUtil.java:135)
    at com.oppo.cloud.application.service.impl.LogParserServiceImpl$LogParser.extract(LogParserServiceImpl.java:384)
    at com.oppo.cloud.application.service.impl.LogParserServiceImpl.handle(LogParserServiceImpl.java:203)
    at com.oppo.cloud.application.task.DelayedTask.handleDelayTask(DelayedTask.java:120)
    at com.oppo.cloud.application.task.DelayedTask.lambda$run$1(DelayedTask.java:103)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
```
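Note that the "Dead nodes" list in this error does not necessarily mean the DataNode processes are down: the HDFS client tries each replica location in turn and moves a node to its own dead-node set after a failed read. A minimal sketch of that retry loop (not Hadoop's actual code; `fetch` is a hypothetical read callback):

```python
def read_block(locations, fetch):
    """Try each replica location; nodes whose reads fail are marked dead.

    "Dead" here only means "this client's read attempt failed", not that
    the DataNode process is offline.
    """
    dead = []
    for node in locations:
        try:
            return fetch(node)  # hypothetical read attempt against one DataNode
        except IOError:
            dead.append(node)
    raise IOError("No live nodes contain current block. Dead nodes: " + ", ".join(dead))
```

So all three replicas appearing both as block locations and as dead nodes is consistent with every read attempt being rejected, which matches the DataNode-side log.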
Checking the HDFS DataNode logs shows:
```
2025-02-12 17:33:53,901 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Failed to read expected SASL data transfer protection handshake from client at /192.168.100.203:56864. Perhaps the client is running an older version of Hadoop which does not support SASL data transfer protection
org.apache.hadoop.hdfs.protocol.datatransfer.sasl.InvalidMagicNumberException: Received 1c5182 instead of deadbeef from client.
    at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.doSaslHandshake(SaslDataTransferServer.java:374)
    at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.getSaslStreams(SaslDataTransferServer.java:308)
    at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.receive(SaslDataTransferServer.java:135)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
    at java.lang.Thread.run(Thread.java:750)
```
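The InvalidMagicNumberException is the key clue: a SASL-aware HDFS client sends the four-byte magic value 0xDEADBEEF before the data transfer handshake, and the DataNode rejects any connection that starts with anything else. "Received 1c5182 instead of deadbeef" therefore means the client opened a plain, non-SASL data transfer connection, which typically happens when the client-side configuration does not set dfs.data.transfer.protection. A small illustration of the check (not Hadoop's actual code):

```python
import struct

SASL_TRANSFER_MAGIC_NUMBER = 0xDEADBEEF  # sent by SASL-aware HDFS clients before the handshake

def is_sasl_handshake(first_four_bytes: bytes) -> bool:
    """Mimics the DataNode's magic-number check on a new data transfer connection."""
    (magic,) = struct.unpack(">I", first_four_bytes)
    return magic == SASL_TRANSFER_MAGIC_NUMBER

# A SASL-aware client passes the check; a plain client's first bytes do not.
assert is_sasl_handshake(struct.pack(">I", 0xDEADBEEF))
assert not is_sasl_handshake(struct.pack(">I", 0x001C5182))
```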

HDFS is configured as follows:

```xml
<property>
  <name>dfs.data.transfer.protection</name>
  <value>authentication</value>
</property>
<property>
  <name>hadoop.rpc.protection</name>
  <value>authentication</value>
</property>
```
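When the DataNodes require SASL on data transfer, the client-side hdfs-site.xml that the task-application loads must carry the same settings; a client without them skips the handshake and triggers exactly this rejection. A hedged example of the client-side fragment to verify (whether this file is present on the task-application host is an assumption to check):

```xml
<!-- client-side hdfs-site.xml; must match the DataNodes' setting -->
<property>
  <name>dfs.data.transfer.protection</name>
  <value>authentication</value>
</property>
<property>
  <name>hadoop.rpc.protection</name>
  <value>authentication</value>
</property>
```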

The Hadoop client version used by the task application matches the server version.
Here is the application-hadoop.yml configuration:
```yaml
hadoop:
  namenodes:
    - nameservices: nameservice1
      namenodesAddr: [ "ddp1", "ddp2" ]
      namenodes: [ "nn1", "nn2" ]
      user: nn
      password:
      port: 8020
      # scheduler platform hdfs log path keyword identification, used by task-application
      matchPathKeys: [ "flume" ]
      # kerberos
      enableKerberos: true
      # /etc/krb5.conf
      krb5Conf: "/data/module/compass-v1.1.2/task-application/conf/krb5.conf"
      # hdfs/@EXAMPLE.COM
      principalPattern: "nn/@HADOOP.COM"
      # admin
      loginUser: "nn/[email protected]"
      # /var/kerberos/krb5kdc/admin.keytab
      keytabPath: "/data/module/compass-v1.1.2/task-application/conf/nn.service.keytab"
```
Please help identify the cause of the issue.

@QILIANG678 QILIANG678 added the question Further information is requested label Feb 13, 2025