
update the Ceph Hadoop plugin to Apache Hadoop/HDFS 2.7x #25

Open
wwang-pivotal opened this issue Mar 1, 2016 · 15 comments
wwang-pivotal commented Mar 1, 2016

Hi guys,
Apache Hadoop/HDFS has been updated to 2.7.x. The configuration changed substantially, which broke the Ceph Hadoop plugin.
Could you rebase the Ceph Hadoop plugin onto Apache Hadoop 2.7.x?

Thanks.


dotnwat commented Mar 1, 2016

Hi @wwang-pivotal, I'll take a look at this this week. If the changes aren't major, it shouldn't take more than a day or two. Patches welcome too :)

dotnwat self-assigned this Mar 1, 2016
@wormwang

Have you had a chance to look at this issue?


m0zes commented Apr 26, 2016

This is certainly one of the changes needed, and even this only gets it partially working with Hadoop 2.6.0. I still can't get it to run YARN jobs.

diff --git a/src/main/java/org/apache/hadoop/fs/ceph/CephFileSystem.java b/src/main/java/org/apache/hadoop/fs/ceph/CephFileSystem.java
index a27384f..6f0df53 100644
--- a/src/main/java/org/apache/hadoop/fs/ceph/CephFileSystem.java
+++ b/src/main/java/org/apache/hadoop/fs/ceph/CephFileSystem.java
@@ -78,6 +78,10 @@ public class CephFileSystem extends FileSystem {
   public CephFileSystem() {
   }

+  protected int getDefaultPort() {
+    return 6789;
+  }
+
   /**
    * Create an absolute path using the working directory.
    */
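
For context on why this override matters: Hadoop's FileSystem.checkPath() compares the scheme and authority of each Path against the filesystem's URI, and uses getDefaultPort() to reconcile URIs that omit an explicit port (the base-class implementation returns 0). Below is a minimal sketch of that reconciliation with the 6789 default from the patch above; it is simplified, since the real checkPath() also handles canonicalization and default-URI fallback.

import java.net.URI;

// Simplified sketch of the port check inside Hadoop's FileSystem.checkPath().
// With getDefaultPort() == 0 (the FileSystem base-class default),
// ceph://hobbit01/path and ceph://hobbit01:6789/path compare as different
// filesystems; returning 6789 makes both forms match.
class DefaultPortSketch {
    static final int DEFAULT_PORT = 6789; // what the patch makes getDefaultPort() return

    static boolean sameAuthority(URI fsUri, URI pathUri) {
        int fsPort = fsUri.getPort() == -1 ? DEFAULT_PORT : fsUri.getPort();
        int pathPort = pathUri.getPort() == -1 ? DEFAULT_PORT : pathUri.getPort();
        return fsUri.getHost().equalsIgnoreCase(pathUri.getHost()) && fsPort == pathPort;
    }

    public static void main(String[] args) {
        System.out.println(sameAuthority(
            URI.create("ceph://hobbit01:6789/"),
            URI.create("ceph://hobbit01/user/mozes"))); // true with the override
    }
}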


dotnwat commented Apr 26, 2016

Thanks @m0zes. I've dropped the ball on 2.7, but I have some updates pending for it. I've only heard of a few problems with 2.6, and some of those were not reproducible. It would be helpful to know what other problems you're seeing with 2.6.


m0zes commented Apr 26, 2016

I'm just trying one of the examples here, though even "debug" logging doesn't seem to give me any idea of what is actually wrong. I believe the problem is at the filesystem level, though.

# hadoop  jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.7.0.jar pi 10 100
16/04/26 12:40:40 DEBUG util.Shell: setsid exited with exit code 0
Number of Maps  = 10
Samples per Map = 100
16/04/26 12:40:40 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, value=[Rate of successful kerberos logins and latency (milliseconds)], about=, always=false, type=DEFAULT, sampleName=Ops)
16/04/26 12:40:40 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, value=[Rate of failed kerberos logins and latency (milliseconds)], about=, always=false, type=DEFAULT, sampleName=Ops)
16/04/26 12:40:40 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, value=[GetGroups], about=, always=false, type=DEFAULT, sampleName=Ops)
16/04/26 12:40:40 DEBUG impl.MetricsSystemImpl: UgiMetrics, User and group related metrics
16/04/26 12:40:40 DEBUG util.KerberosName: Kerberos krb5 configuration not found, setting default realm to empty
16/04/26 12:40:40 DEBUG security.Groups:  Creating new Groups object
16/04/26 12:40:40 DEBUG util.NativeCodeLoader: Trying to load the custom-built native-hadoop library...
16/04/26 12:40:40 DEBUG util.NativeCodeLoader: Loaded the native-hadoop library
16/04/26 12:40:40 DEBUG security.JniBasedUnixGroupsMapping: Using JniBasedUnixGroupsMapping for Group resolution
16/04/26 12:40:40 DEBUG security.JniBasedUnixGroupsMappingWithFallback: Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMapping
16/04/26 12:40:40 DEBUG security.Groups: Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; cacheTimeout=300000; warningDeltaMs=5000
16/04/26 12:40:40 DEBUG security.UserGroupInformation: hadoop login
16/04/26 12:40:40 DEBUG security.UserGroupInformation: hadoop login commit
16/04/26 12:40:40 DEBUG security.UserGroupInformation: using local user:UnixPrincipal: mozes
16/04/26 12:40:40 DEBUG security.UserGroupInformation: Using user: "UnixPrincipal: mozes" with name mozes
16/04/26 12:40:40 DEBUG security.UserGroupInformation: User entry: "mozes"
16/04/26 12:40:40 DEBUG security.UserGroupInformation: UGI loginUser:mozes (auth:SIMPLE)
16/04/26 12:40:40 DEBUG core.Tracer: sampler.classes = ; loaded no samplers
16/04/26 12:40:40 TRACE core.TracerId: ProcessID(fmt=%{tname}/%{ip}): computed process ID of "FSClient/10.5.3.30"
16/04/26 12:40:40 TRACE core.TracerPool: TracerPool(Global): adding tracer Tracer(FSClient/10.5.3.30)
16/04/26 12:40:40 DEBUG core.Tracer: span.receiver.classes = ; loaded no span receivers
16/04/26 12:40:40 TRACE core.Tracer: Created Tracer(FSClient/10.5.3.30) for FSClient
Loading libcephfs-jni from default path: /usr/lib/hadoop/lib/native
Loading libcephfs-jni: /usr/lib64/libcephfs_jni.so
Loading libcephfs-jni: /usr/lib/jni/libcephfs_jni.so
Loading libcephfs-jni: Success!
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
16/04/26 12:40:42 DEBUG security.UserGroupInformation: PrivilegedAction as:mozes (auth:SIMPLE) from:org.apache.hadoop.mapreduce.Job.connect(Job.java:1272)
16/04/26 12:40:42 DEBUG mapreduce.Cluster: Trying ClientProtocolProvider : org.apache.hadoop.mapred.LocalClientProtocolProvider
16/04/26 12:40:42 DEBUG mapreduce.Cluster: Cannot pick org.apache.hadoop.mapred.LocalClientProtocolProvider as the ClientProtocolProvider - returned null protocol
16/04/26 12:40:42 DEBUG mapreduce.Cluster: Trying ClientProtocolProvider : org.apache.hadoop.mapred.YarnClientProtocolProvider
16/04/26 12:40:42 DEBUG service.AbstractService: Service: org.apache.hadoop.mapred.ResourceMgrDelegate entered state INITED
16/04/26 12:40:42 DEBUG service.AbstractService: Service: org.apache.hadoop.yarn.client.api.impl.YarnClientImpl entered state INITED
16/04/26 12:40:42 DEBUG azure.NativeAzureFileSystem: finalize() called.
16/04/26 12:40:42 DEBUG azure.NativeAzureFileSystem: finalize() called.
16/04/26 12:40:42 INFO client.RMProxy: Connecting to ResourceManager at gremlin00.beocat.ksu.edu/10.5.3.30:8032
16/04/26 12:40:42 DEBUG security.UserGroupInformation: PrivilegedAction as:mozes (auth:SIMPLE) from:org.apache.hadoop.yarn.client.RMProxy.getProxy(RMProxy.java:136)
16/04/26 12:40:42 DEBUG ipc.YarnRPC: Creating YarnRPC for org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC
16/04/26 12:40:42 DEBUG ipc.HadoopYarnProtoRPC: Creating a HadoopYarnProtoRpc proxy for protocol interface org.apache.hadoop.yarn.api.ApplicationClientProtocol
16/04/26 12:40:42 DEBUG ipc.Server: rpcKind=RPC_PROTOCOL_BUFFER, rpcRequestWrapperClass=class org.apache.hadoop.ipc.ProtobufRpcEngine$RpcRequestWrapper, rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@3c86c285
16/04/26 12:40:42 DEBUG ipc.Client: getting client out of cache: org.apache.hadoop.ipc.Client@74107a99
16/04/26 12:40:42 DEBUG service.AbstractService: Service org.apache.hadoop.yarn.client.api.impl.YarnClientImpl is started
16/04/26 12:40:42 DEBUG service.AbstractService: Service org.apache.hadoop.mapred.ResourceMgrDelegate is started
16/04/26 12:40:42 DEBUG security.UserGroupInformation: PrivilegedAction as:mozes (auth:SIMPLE) from:org.apache.hadoop.fs.FileContext.getAbstractFileSystem(FileContext.java:334)
16/04/26 12:40:42 DEBUG mapreduce.Cluster: Picked org.apache.hadoop.mapred.YarnClientProtocolProvider as the ClientProtocolProvider
16/04/26 12:40:42 DEBUG security.UserGroupInformation: PrivilegedAction as:mozes (auth:SIMPLE) from:org.apache.hadoop.mapreduce.Cluster.getFileSystem(Cluster.java:161)
16/04/26 12:40:42 DEBUG security.UserGroupInformation: PrivilegedAction as:mozes (auth:SIMPLE) from:org.apache.hadoop.mapreduce.Job.submit(Job.java:1304)
16/04/26 12:40:42 DEBUG mapred.ResourceMgrDelegate: getStagingAreaDir: dir=/staging/mozes/.staging
16/04/26 12:40:42 TRACE ipc.ProtobufRpcEngine: 1: Call -> gremlin00.beocat.ksu.edu/10.5.3.30:8032: getNewApplication {}
16/04/26 12:40:42 DEBUG ipc.Client: The ping interval is 60000 ms.
16/04/26 12:40:42 DEBUG ipc.Client: Connecting to gremlin00.beocat.ksu.edu/10.5.3.30:8032
16/04/26 12:40:42 DEBUG ipc.Client: IPC Client (1597504843) connection to gremlin00.beocat.ksu.edu/10.5.3.30:8032 from mozes: starting, having connections 1
16/04/26 12:40:42 DEBUG ipc.Client: IPC Client (1597504843) connection to gremlin00.beocat.ksu.edu/10.5.3.30:8032 from mozes sending #0
16/04/26 12:40:42 DEBUG ipc.Client: IPC Client (1597504843) connection to gremlin00.beocat.ksu.edu/10.5.3.30:8032 from mozes got value #0
16/04/26 12:40:42 DEBUG ipc.ProtobufRpcEngine: Call: getNewApplication took 161ms
16/04/26 12:40:42 TRACE ipc.ProtobufRpcEngine: 1: Response <- gremlin00.beocat.ksu.edu/10.5.3.30:8032: getNewApplication {application_id { id: 12 cluster_timestamp: 1461615899163 } maximumCapability { memory: 8192 virtual_cores: 4 }}
16/04/26 12:40:42 DEBUG mapreduce.JobSubmitter: Configuring job job_1461615899163_0012 with /staging/mozes/.staging/job_1461615899163_0012 as the submit dir
16/04/26 12:40:42 DEBUG mapreduce.JobSubmitter: adding the following namenodes' delegation tokens:[ceph://hobbit01:6789/]
16/04/26 12:40:42 DEBUG mapreduce.JobResourceUploader: default FileSystem: ceph://hobbit01:6789
16/04/26 12:40:42 DEBUG mapreduce.JobSubmitter: Creating splits at ceph://hobbit01:6789/staging/mozes/.staging/job_1461615899163_0012
16/04/26 12:40:42 DEBUG input.FileInputFormat: Time taken to get FileStatuses: 32
16/04/26 12:40:42 INFO input.FileInputFormat: Total input paths to process : 10
16/04/26 12:40:42 DEBUG input.FileInputFormat: Total # of splits generated by getSplits: 10, TimeTaken: 35
16/04/26 12:40:43 INFO mapreduce.JobSubmitter: Cleaning up the staging area /staging/mozes/.staging/job_1461615899163_0012
java.lang.NullPointerException
        at org.apache.hadoop.io.Text.encode(Text.java:450)
        at org.apache.hadoop.io.Text.encode(Text.java:431)
        at org.apache.hadoop.io.Text.writeString(Text.java:480)
        at org.apache.hadoop.mapreduce.split.JobSplit$SplitMetaInfo.write(JobSplit.java:125)
        at org.apache.hadoop.mapreduce.split.JobSplitWriter.writeJobSplitMetaInfo(JobSplitWriter.java:193)
        at org.apache.hadoop.mapreduce.split.JobSplitWriter.createSplitFiles(JobSplitWriter.java:81)
        at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:311)
        at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:322)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:200)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1304)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1304)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1325)
        at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:306)
        at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
        at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
        at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
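
For reference, the top frames of that trace are the split-metadata write path: JobSplit$SplitMetaInfo.write() serializes each split location with Text.writeString(), which calls Text.encode() and throws an NPE on a null string. Below is a minimal reproduction of just that failure mode, assuming (not confirmed) that the Ceph bindings can hand back a split location containing a null host.

import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import org.apache.hadoop.io.Text;

// Reproduces the Text.encode NullPointerException from the trace above
// in isolation, without running a job.
public class SplitWriteNpe {
    public static void main(String[] args) throws Exception {
        String[] locations = { "node1", null }; // a null split host (the assumption)
        DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream());
        for (String loc : locations) {
            // NPE when loc == null, matching Text.encode(Text.java:450)
            Text.writeString(out, loc);
        }
    }
}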


dotnwat commented Apr 26, 2016

Wow, nothing there looks suspicious at first glance. The usual suspect is a mismatch between our bindings and what Hadoop expects, which seems to diverge occasionally. What version of Ceph are you running?


m0zes commented Apr 26, 2016

I built cephfs-hadoop with the 9.2.1 libcephfs jar, the 9.2.1 libcephfs_jni, and Hadoop 2.6.0-cdh5.7.0, on Ubuntu Trusty.

The cluster I'm connecting to is also 9.2.1.


m0zes commented Apr 27, 2016

For the life of me I can't see anything wrong with my configuration, but perhaps there is something else wrong. I know I can list, add, delete, and move files with the hdfs dfs suite of tools. Here is my configuration for reference. https://gist.github.com/m0zes/e6eb5ca39153989f7a37947a469e0b98
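
For anyone comparing notes, the core-site.xml wiring for the plugin generally looks like the sketch below. The property names are the ones documented for cephfs-hadoop; the monitor address, keyring path, and other values here are illustrative only and are not the contents of the gist above.

<configuration>
  <!-- Route the default filesystem at a Ceph monitor (illustrative address). -->
  <property>
    <name>fs.defaultFS</name>
    <value>ceph://mon-host:6789/</value>
  </property>
  <!-- Bind the ceph:// scheme to the plugin's FileSystem implementation. -->
  <property>
    <name>fs.ceph.impl</name>
    <value>org.apache.hadoop.fs.ceph.CephFileSystem</value>
  </property>
  <!-- Cluster config and credentials for libcephfs (illustrative paths). -->
  <property>
    <name>ceph.conf.file</name>
    <value>/etc/ceph/ceph.conf</value>
  </property>
  <property>
    <name>ceph.auth.keyring</name>
    <value>/etc/ceph/ceph.client.admin.keyring</value>
  </property>
</configuration>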


dbseraf commented Oct 27, 2016

Has there been any progress on this lately? Anyone know whether ceph 10.2 works any better?

@wormwang

Has there been any progress on this lately, now in 2017? Does anyone know whether Ceph 10.2 or 11.2 works any better?


dotnwat commented Jan 23, 2017

There hasn't been much work on this. I don't have a lot of time for it in the short term, but I'd be happy to offer basic support. Have you tried deploying the bindings?

@zphj1987

@m0zes I'm getting the same error as the one you pasted. Did you ever resolve it?

data:2 wanted=3
17/02/28 14:26:17 DEBUG mapreduce.JobSubmitter: Creating splits at ceph://10.168.10.1:6789/tmp/hadoop-yarn/staging/root/.staging/job_1488254605886_0020
17/02/28 14:26:17 DEBUG input.FileInputFormat: Time taken to get FileStatuses: 5
17/02/28 14:26:17 INFO input.FileInputFormat: Total input paths to process : 1
17/02/28 14:26:17 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/root/.staging/job_1488254605886_0020
java.lang.NullPointerException
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getBlockIndex(FileInputFormat.java:444)
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:405)
	at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
	at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
	at org.apache.hadoop.examples.Grep.run(Grep.java:78)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.hadoop.examples.Grep.main(Grep.java:103)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
	at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
	at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
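
The NPE here is one step earlier than in the trace above: FileInputFormat.getSplits() fetches the file's block locations from the FileSystem, and getBlockIndex() indexes into that array, so a null return from getFileBlockLocations() (an assumption, but consistent with the failing frame) blows up exactly there. A simplified sketch of that path:

import org.apache.hadoop.fs.BlockLocation;

// Simplified from FileInputFormat.getBlockIndex(BlockLocation[], long).
public class BlockIndexNpe {
    static int getBlockIndex(BlockLocation[] blkLocations, long offset) {
        // NPE on blkLocations.length if the filesystem returned null
        for (int i = 0; i < blkLocations.length; i++) {
            if (blkLocations[i].getOffset() <= offset
                    && offset < blkLocations[i].getOffset() + blkLocations[i].getLength()) {
                return i;
            }
        }
        throw new IllegalArgumentException("Offset " + offset + " is outside of file");
    }

    public static void main(String[] args) {
        getBlockIndex(null, 0); // what a broken binding would trigger
    }
}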


m0zes commented Feb 28, 2017

No. I ended up creating individual rbd pools for each Hadoop node, with no replication, then created six RBDs per node for parallelism and put HDFS on top of those RBDs with forced 3x replication. Not an ideal setup, but I couldn't waste any more time going down the cephfs-hadoop route.


zphj1987 commented Feb 28, 2017 via email


zphj1987 commented Mar 1, 2017

@m0zes And I downgraded my Hadoop version to 2.7.1.
