Configuration
- Operating system: Ubuntu 16.04.6 LTS
- Kernel: 4.4.0-135-generic
- UCX: UCX Release v1.9.0 configured with
./contrib/configure-release --with-java
- Java: Oracle JDK 11.0.8
- Spark: Apache Spark 3.0.1
Spark launch command line
spark-shell --master yarn --name ExploreSparkUCX --deploy-mode client \
  --num-executors 32 --conf spark.dynamicAllocation.maxExecutors=32 \
  --executor-cores 7 --executor-memory 22g --driver-memory 22g \
  --conf spark.eventLog.enabled='true' \
  --conf spark.eventLog.dir='/user/spark/applicationHistory' \
  --conf spark.serializer='org.apache.spark.serializer.KryoSerializer' \
  --conf spark.driver.extraClassPath='~/Software/ucx-1.9.0-java/lib:~/Software/ucx-1.9.0-java/lib/jucx-1.9.0.jar:~/sparkucx/target/spark-ucx-1.0-for-spark-3.0.jar' \
  --conf spark.executor.extraClassPath='~/Software/ucx-1.9.0-java/lib:~/Software/ucx-1.9.0-java/lib/jucx-1.9.0.jar:~/sparkucx/target/spark-ucx-1.0-for-spark-3.0.jar' \
  --conf spark.shuffle.manager='org.apache.spark.shuffle.UcxShuffleManager' \
  --conf spark.shuffle.sort.io.plugin.class='org.apache.spark.shuffle.compat.spark_3_0.UcxLocalDiskShuffleDataIO'
Scala application:
sc.textFile("Dataset/some-44gb-text-file").flatMap(_.split(' ')).map(x => (x, 1L)).reduceByKey(_+_, 224).count
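For readers less familiar with the RDD API: the job is a word count followed by a count of the distinct words, and `reduceByKey(_+_, 224)` is the step that introduces a shuffle into 224 partitions. The logic can be sketched on plain Scala collections as below. This is only an illustration of what is computed, not a reproduction of the crash; the object name `WordCountSketch`, the helper `distinctWordCount`, and the tiny in-memory input are hypothetical stand-ins for the 44 GB file.

```scala
object WordCountSketch {
  // Same steps as the Spark job, on local collections instead of an RDD:
  // split each line on single spaces, pair every token with 1L,
  // sum the counts per token, then count the distinct tokens.
  def distinctWordCount(input: Seq[String]): Long = {
    val counts: Map[String, Long] = input
      .flatMap(_.split(' '))
      .map(x => (x, 1L))
      .groupBy(_._1)
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }
    counts.size.toLong
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample input; the report reads a 44 GB text file instead.
    val lines = Seq("a b a", "b c", "a")
    println(distinctWordCount(lines)) // prints 3 (the tokens a, b, c)
  }
}
```

On a cluster, the per-key summation is where map outputs are exchanged between executors, so that is the stage in which the configured shuffle manager takes part.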
Phenomena
In the first stage, 447 of the 448 tasks finish. The Java runtime is then terminated by SIGSEGV, as follows:
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f551e278b50, pid=3253764, tid=3257270
#
# JRE version: Java(TM) SE Runtime Environment 18.9 (11.0.8+10) (build 11.0.8+10-LTS)
# Java VM: Java HotSpot(TM) 64-Bit Server VM 18.9 (11.0.8+10-LTS, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# V [libjvm.so+0xd06b50] ResolvedMethodTable::lookup(int, unsigned int, Method*)+0x30
[thread 3254166 also had an error]
[thread 3254607 also had an error]
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to ~/core.3253764)
#
# An error report file with more information is saved as:
# ~/hs_err_pid3253764.log
#
# If you would like to submit a bug report, please visit:
# https://bugreport.java.com/bugreport/crash.jsp
#
The hs_err_pid3253764.log file contains:
Current thread (0x00007f51f0095000): JavaThread "task-result-getter-2" daemon [_thread_in_vm, id=3257270, stack(0x00007f51a5cfb000,0x00007f51a5dfc000)]
Stack: [0x00007f51a5cfb000,0x00007f51a5dfc000], sp=0x00007f51a5df8fd0, free space=1015k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0xd06b50] ResolvedMethodTable::lookup(int, unsigned int, Method*)+0x30
V [libjvm.so+0x891c7d] java_lang_invoke_ResolvedMethodName::find_resolved_method(methodHandle const&, Thread*)+0x1d
V [libjvm.so+0xaacdec] CallInfo::set_resolved_method_name(Thread*)+0x6c
V [libjvm.so+0xbe4062] MethodHandles::resolve_MemberName(Handle, Klass*, bool, Thread*)+0x802
V [libjvm.so+0xbe41ea] MHN_resolve_Mem+0x12a
J 772 java.lang.invoke.MethodHandleNatives.resolve(Ljava/lang/invoke/MemberName;Ljava/lang/Class;Z)Ljava/lang/invoke/MemberName; java.base@11.0.8 (0 bytes) @ 0x00007f5502b961af [0x00007f5502b960c0+0x00000000000000ef]
J 9382 c1 java.lang.invoke.MemberName$Factory.resolve(BLjava/lang/invoke/MemberName;Ljava/lang/Class;Z)Ljava/lang/invoke/MemberName; java.base@11.0.8 (157 bytes) @ 0x00007f54fbb5bab4 [0x00007f54fbb5b8c0+0x00000000000001f4]
J 16342 c1 java.lang.invoke.MemberName$Factory.resolveOrFail(BLjava/lang/invoke/MemberName;Ljava/lang/Class;Ljava/lang/Class;)Ljava/lang/invoke/MemberName; java.base@11.0.8 (53 bytes) @ 0x00007f54fc4efe9c [0x00007f54fc4efe20+0x000000000000007c]
J 2735 c1 java.lang.invoke.MethodHandles$Lookup.resolveOrFail(BLjava/lang/Class;Ljava/lang/String;Ljava/lang/invoke/MethodType;)Ljava/lang/invoke/MemberName; java.base@11.0.8 (48 bytes) @ 0x00007f54fbcffab4 [0x00007f54fbcff6e0+0x00000000000003d4]
J 2058 c1 java.lang.invoke.InnerClassLambdaMetafactory.buildCallSite()Ljava/lang/invoke/CallSite; java.base@11.0.8 (168 bytes) @ 0x00007f54fbba916c [0x00007f54fbba82e0+0x0000000000000e8c]
J 2357 c1 java.lang.invoke.LambdaMetafactory.altMetafactory(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;[Ljava/lang/Object;)Ljava/lang/invoke/CallSite; java.base@11.0.8 (287 bytes) @ 0x00007f54fbc43314 [0x00007f54fbc41f60+0x00000000000013b4]
J 15481 c2 java.lang.invoke.DirectMethodHandle$Holder.invokeStatic(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; java.base@11.0.8 (20 bytes) @ 0x00007f550334bcd8 [0x00007f550334bca0+0x0000000000000038]
J 2356 c1 java.lang.invoke.DelegatingMethodHandle$Holder.delegate(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; java.base@11.0.8 (23 bytes) @ 0x00007f54fbc41644 [0x00007f54fbc411e0+0x0000000000000464]
J 1876 c1 java.lang.invoke.BootstrapMethodInvoker.invoke(Ljava/lang/Class;Ljava/lang/invoke/MethodHandle;Ljava/lang/String;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Class;)Ljava/lang/Object; java.base@11.0.8 (688 bytes) @ 0x00007f54fbb3d58c [0x00007f54fbb3a020+0x000000000000356c]
J 1875 c1 java.lang.invoke.CallSite.makeSite(Ljava/lang/invoke/MethodHandle;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/Object;Ljava/lang/Class;)Ljava/lang/invoke/CallSite; java.base@11.0.8 (91 bytes) @ 0x00007f54fbb34244 [0x00007f54fbb341c0+0x0000000000000084]
J 1874 c1 java.lang.invoke.MethodHandleNatives.linkCallSiteImpl(Ljava/lang/Class;Ljava/lang/invoke/MethodHandle;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/invoke/MemberName; java.base@11.0.8 (44 bytes) @ 0x00007f54fbb2fa2c [0x00007f54fbb2f9c0+0x000000000000006c]
J 1873 c1 java.lang.invoke.MethodHandleNatives.linkCallSite(Ljava/lang/Object;ILjava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/invoke/MemberName; java.base@11.0.8 (66 bytes) @ 0x00007f54fbb2f454 [0x00007f54fbb2f000+0x0000000000000454]
v ~StubRoutines::call_stub
V [libjvm.so+0x889559] JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, Thread*)+0x3b9
V [libjvm.so+0x888285] JavaCalls::call_static(JavaValue*, Klass*, Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x115
V [libjvm.so+0xdd1009] SystemDictionary::find_dynamic_call_site_invoker(Klass*, int, Handle, Symbol*, Symbol*, Handle*, Handle*, Thread*)+0x459
V [libjvm.so+0xab507f] LinkResolver::resolve_dynamic_call(CallInfo&, int, Handle, Symbol*, Symbol*, Klass*, Thread*)+0x4f
V [libjvm.so+0xab5434] LinkResolver::resolve_invokedynamic(CallInfo&, constantPoolHandle const&, int, Thread*)+0x2c4
V [libjvm.so+0xab93d6] LinkResolver::resolve_invoke(CallInfo&, Handle, constantPoolHandle const&, int, Bytecodes::Code, Thread*)+0x3c6
V [libjvm.so+0x87f698] InterpreterRuntime::resolve_invokedynamic(JavaThread*)+0x168
V [libjvm.so+0x87f9dd] InterpreterRuntime::resolve_from_cache(JavaThread*, Bytecodes::Code)+0x15d
j org.apache.spark.scheduler.TaskSetManager.maybeFinishTaskSet()V+39
j org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(JLorg/apache/spark/scheduler/DirectTaskResult;)V+341
J 19438 c1 org.apache.spark.scheduler.TaskResultGetter$$anon$3.$anonfun$run$1(Lorg/apache/spark/scheduler/TaskResultGetter$$anon$3;Ljava/lang/Object;)V (810 bytes) @ 0x00007f54fdf1bd9c [0x00007f54fdf17820+0x000000000000457c]
J 19437 c1 org.apache.spark.scheduler.TaskResultGetter$$anon$3$$Lambda$2884.apply$mcV$sp()V (12 bytes) @ 0x00007f54fdf018c4 [0x00007f54fdf01840+0x0000000000000084]
J 18878 c2 scala.runtime.java8.JFunction0$mcV$sp.apply()Ljava/lang/Object; (10 bytes) @ 0x00007f550355e5dc [0x00007f550355e5a0+0x000000000000003c]
J 19257 c1 org.apache.spark.util.Utils$.logUncaughtExceptions(Lscala/Function0;)Ljava/lang/Object; (66 bytes) @ 0x00007f54fde87ab4 [0x00007f54fde879a0+0x0000000000000114]
J 19290 c1 org.apache.spark.scheduler.TaskResultGetter$$anon$3.run()V (47 bytes) @ 0x00007f54fde961ac [0x00007f54fde95d40+0x000000000000046c]
j java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V+92 java.base@11.0.8
j java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5 java.base@11.0.8
j java.lang.Thread.run()V+11 java.base@11.0.8
v ~StubRoutines::call_stub
V [libjvm.so+0x889559] JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, Thread*)+0x3b9
V [libjvm.so+0x88750d] JavaCalls::call_virtual(JavaValue*, Handle, Klass*, Symbol*, Symbol*, Thread*)+0x1ed
V [libjvm.so+0x9335ec] thread_entry(JavaThread*, Thread*)+0x6c
V [libjvm.so+0xe0f0aa] JavaThread::thread_main_inner()+0x1fa
V [libjvm.so+0xe0f411] JavaThread::run()+0x351
V [libjvm.so+0xe0acaa] Thread::call_run()+0x13a
V [libjvm.so+0xc5293e] thread_native_entry(Thread*)+0xee
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
J 772 java.lang.invoke.MethodHandleNatives.resolve(Ljava/lang/invoke/MemberName;Ljava/lang/Class;Z)Ljava/lang/invoke/MemberName; java.base@11.0.8 (0 bytes) @ 0x00007f5502b96136 [0x00007f5502b960c0+0x0000000000000076]
J 9382 c1 java.lang.invoke.MemberName$Factory.resolve(BLjava/lang/invoke/MemberName;Ljava/lang/Class;Z)Ljava/lang/invoke/MemberName; java.base@11.0.8 (157 bytes) @ 0x00007f54fbb5bab4 [0x00007f54fbb5b8c0+0x00000000000001f4]
J 16342 c1 java.lang.invoke.MemberName$Factory.resolveOrFail(BLjava/lang/invoke/MemberName;Ljava/lang/Class;Ljava/lang/Class;)Ljava/lang/invoke/MemberName; java.base@11.0.8 (53 bytes) @ 0x00007f54fc4efe9c [0x00007f54fc4efe20+0x000000000000007c]
J 2735 c1 java.lang.invoke.MethodHandles$Lookup.resolveOrFail(BLjava/lang/Class;Ljava/lang/String;Ljava/lang/invoke/MethodType;)Ljava/lang/invoke/MemberName; java.base@11.0.8 (48 bytes) @ 0x00007f54fbcffab4 [0x00007f54fbcff6e0+0x00000000000003d4]
J 2058 c1 java.lang.invoke.InnerClassLambdaMetafactory.buildCallSite()Ljava/lang/invoke/CallSite; java.base@11.0.8 (168 bytes) @ 0x00007f54fbba916c [0x00007f54fbba82e0+0x0000000000000e8c]
J 2357 c1 java.lang.invoke.LambdaMetafactory.altMetafactory(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;[Ljava/lang/Object;)Ljava/lang/invoke/CallSite; java.base@11.0.8 (287 bytes) @ 0x00007f54fbc43314 [0x00007f54fbc41f60+0x00000000000013b4]
J 15481 c2 java.lang.invoke.DirectMethodHandle$Holder.invokeStatic(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; java.base@11.0.8 (20 bytes) @ 0x00007f550334bcd8 [0x00007f550334bca0+0x0000000000000038]
J 2356 c1 java.lang.invoke.DelegatingMethodHandle$Holder.delegate(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; java.base@11.0.8 (23 bytes) @ 0x00007f54fbc41644 [0x00007f54fbc411e0+0x0000000000000464]
J 1876 c1 java.lang.invoke.BootstrapMethodInvoker.invoke(Ljava/lang/Class;Ljava/lang/invoke/MethodHandle;Ljava/lang/String;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Class;)Ljava/lang/Object; java.base@11.0.8 (688 bytes) @ 0x00007f54fbb3d58c [0x00007f54fbb3a020+0x000000000000356c]
J 1875 c1 java.lang.invoke.CallSite.makeSite(Ljava/lang/invoke/MethodHandle;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/Object;Ljava/lang/Class;)Ljava/lang/invoke/CallSite; java.base@11.0.8 (91 bytes) @ 0x00007f54fbb34244 [0x00007f54fbb341c0+0x0000000000000084]
J 1874 c1 java.lang.invoke.MethodHandleNatives.linkCallSiteImpl(Ljava/lang/Class;Ljava/lang/invoke/MethodHandle;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/invoke/MemberName; java.base@11.0.8 (44 bytes) @ 0x00007f54fbb2fa2c [0x00007f54fbb2f9c0+0x000000000000006c]
J 1873 c1 java.lang.invoke.MethodHandleNatives.linkCallSite(Ljava/lang/Object;ILjava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/invoke/MemberName; java.base@11.0.8 (66 bytes) @ 0x00007f54fbb2f454 [0x00007f54fbb2f000+0x0000000000000454]
v ~StubRoutines::call_stub
j org.apache.spark.scheduler.TaskSetManager.maybeFinishTaskSet()V+39
j org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(JLorg/apache/spark/scheduler/DirectTaskResult;)V+341
J 19438 c1 org.apache.spark.scheduler.TaskResultGetter$$anon$3.$anonfun$run$1(Lorg/apache/spark/scheduler/TaskResultGetter$$anon$3;Ljava/lang/Object;)V (810 bytes) @ 0x00007f54fdf1bd9c [0x00007f54fdf17820+0x000000000000457c]
J 19437 c1 org.apache.spark.scheduler.TaskResultGetter$$anon$3$$Lambda$2884.apply$mcV$sp()V (12 bytes) @ 0x00007f54fdf018c4 [0x00007f54fdf01840+0x0000000000000084]
J 18878 c2 scala.runtime.java8.JFunction0$mcV$sp.apply()Ljava/lang/Object; (10 bytes) @ 0x00007f550355e5dc [0x00007f550355e5a0+0x000000000000003c]
J 19257 c1 org.apache.spark.util.Utils$.logUncaughtExceptions(Lscala/Function0;)Ljava/lang/Object; (66 bytes) @ 0x00007f54fde87ab4 [0x00007f54fde879a0+0x0000000000000114]
J 19290 c1 org.apache.spark.scheduler.TaskResultGetter$$anon$3.run()V (47 bytes) @ 0x00007f54fde961ac [0x00007f54fde95d40+0x000000000000046c]
j java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V+92 java.base@11.0.8
j java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5 java.base@11.0.8
j java.lang.Thread.run()V+11 java.base@11.0.8
v ~StubRoutines::call_stub
siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000
Register to memory mapping:
RAX=0x00007f55184749b8 points into unknown readable memory: 70 64 06 f0 51 7f 00 00
RBX={method} {0x00007f515c5e7840} 'get$Lambda' '(Lorg/apache/spark/scheduler/TaskSetManager;)Lscala/Function1;' in 'org/apache/spark/scheduler/TaskSetManager$$Lambda$2962'
RCX=0x7fa5f0a12000d16c is an unknown value
RDX=0x0000000000000005 is an unknown value
RSP=0x00007f51a5df8fd0 is pointing into the stack for thread: 0x00007f51f0095000
RBP=0x00007f51a5df9020 is pointing into the stack for thread: 0x00007f51f0095000
RSI=0x0000000000000005 is an unknown value
RDI=0x00007f5518474950 points into unknown readable memory: ef 03 00 00 00 00 00 00
R8 ={method} {0x00007f515c5e7840} 'get$Lambda' '(Lorg/apache/spark/scheduler/TaskSetManager;)Lscala/Function1;' in 'org/apache/spark/scheduler/TaskSetManager$$Lambda$2962'
R9 =0x0000000000000005 is an unknown value
R10=0x0000000000000065 is an unknown value
R11=0x000001fd47c00cb2 is an unknown value
R12=0x0000000000000005 is an unknown value
R13=0x00000000606ce22e is an unknown value
R14={method} {0x00007f515c5e7840} 'get$Lambda' '(Lorg/apache/spark/scheduler/TaskSetManager;)Lscala/Function1;' in 'org/apache/spark/scheduler/TaskSetManager$$Lambda$2962'
R15=0x7fa5f0a12000d16c is an unknown value