
SIGSEGV in JVM Runtime #30

@stevenybw

Description


Configuration

  • Operating system: Ubuntu 16.04.6 LTS
  • Kernel: 4.4.0-135-generic
  • UCX: UCX Release v1.9.0 configured with ./contrib/configure-release --with-java
  • Java: Oracle JDK 11.0.8
  • Spark: Apache Spark 3.0.1

Spark launch command line

spark-shell --master yarn --name ExploreSparkUCX --deploy-mode client \
  --num-executors 32 --conf spark.dynamicAllocation.maxExecutors=32 \
  --executor-cores 7 --executor-memory 22g --driver-memory 22g \
  --conf spark.eventLog.enabled='true' \
  --conf spark.eventLog.dir='/user/spark/applicationHistory' \
  --conf spark.serializer='org.apache.spark.serializer.KryoSerializer' \
  --conf spark.driver.extraClassPath='~/Software/ucx-1.9.0-java/lib:~/Software/ucx-1.9.0-java/lib/jucx-1.9.0.jar:~/sparkucx/target/spark-ucx-1.0-for-spark-3.0.jar' \
  --conf spark.executor.extraClassPath='~/Software/ucx-1.9.0-java/lib:~/Software/ucx-1.9.0-java/lib/jucx-1.9.0.jar:~/sparkucx/target/spark-ucx-1.0-for-spark-3.0.jar' \
  --conf spark.shuffle.manager='org.apache.spark.shuffle.UcxShuffleManager' \
  --conf spark.shuffle.sort.io.plugin.class='org.apache.spark.shuffle.compat.spark_3_0.UcxLocalDiskShuffleDataIO'

Scala application:

sc.textFile("Dataset/some-44gb-text-file").flatMap(_.split(' ')).map(x => (x, 1L)).reduceByKey(_+_, 224).count
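For reference, the job splits every line on single spaces, pairs each token with 1L, sums the pairs per token with reduceByKey into 224 partitions, and then counts the resulting keys, so the final number is the count of distinct tokens rather than the total word count. The sketch below reproduces that logic on plain Scala collections, assuming a small in-memory input; there is no Spark, partitioning, or shuffle involved, and the names WordCountSketch, wordCounts and distinctTokenCount are hypothetical helpers for illustration only.

```scala
object WordCountSketch {
  // Hypothetical helper: mirrors flatMap(_.split(' ')).map(x => (x, 1L)).reduceByKey(_ + _)
  // on an in-memory Seq instead of an RDD (no partitions, no shuffle).
  def wordCounts(lines: Seq[String]): Map[String, Long] =
    lines
      .flatMap(_.split(' '))     // split on single spaces; "b  c" also yields an empty token ""
      .map(token => (token, 1L)) // pair each token with a count of 1
      .groupBy(_._1)             // group pairs by token (stands in for the shuffle)
      .map { case (token, pairs) => token -> pairs.map(_._2).sum } // sum the 1L values per token

  // Hypothetical helper: .count on the reduced RDD counts keys, i.e. distinct tokens.
  def distinctTokenCount(lines: Seq[String]): Long =
    wordCounts(lines).size.toLong

  def main(args: Array[String]): Unit = {
    val lines = Seq("a b a", "b  c")
    println(wordCounts(lines))         // per-token sums; map iteration order is unspecified
    println(distinctTokenCount(lines)) // 4: "a", "b", "c" and the empty token from the double space
  }
}
```

Using groupBy plus a map over the groups, rather than groupMapReduce, keeps the sketch compatible with both Scala 2.12 (which Spark 3.0.1 is built against) and 2.13.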

Observed behavior

In the first stage, 447 of its 448 tasks finished. After that, the Java runtime was terminated by a SIGSEGV, as follows:

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f551e278b50, pid=3253764, tid=3257270
#
# JRE version: Java(TM) SE Runtime Environment 18.9 (11.0.8+10) (build 11.0.8+10-LTS)
# Java VM: Java HotSpot(TM) 64-Bit Server VM 18.9 (11.0.8+10-LTS, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0xd06b50]  ResolvedMethodTable::lookup(int, unsigned int, Method*)+0x30
[thread 3254166 also had an error]
[thread 3254607 also had an error]
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to ~/core.3253764)
#
# An error report file with more information is saved as:
# ~/hs_err_pid3253764.log
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#

The corresponding excerpt from hs_err_pid3253764.log:

Current thread (0x00007f51f0095000):  JavaThread "task-result-getter-2" daemon [_thread_in_vm, id=3257270, stack(0x00007f51a5cfb000,0x00007f51a5dfc000)]

Stack: [0x00007f51a5cfb000,0x00007f51a5dfc000],  sp=0x00007f51a5df8fd0,  free space=1015k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0xd06b50]  ResolvedMethodTable::lookup(int, unsigned int, Method*)+0x30
V  [libjvm.so+0x891c7d]  java_lang_invoke_ResolvedMethodName::find_resolved_method(methodHandle const&, Thread*)+0x1d
V  [libjvm.so+0xaacdec]  CallInfo::set_resolved_method_name(Thread*)+0x6c
V  [libjvm.so+0xbe4062]  MethodHandles::resolve_MemberName(Handle, Klass*, bool, Thread*)+0x802
V  [libjvm.so+0xbe41ea]  MHN_resolve_Mem+0x12a
J 772  java.lang.invoke.MethodHandleNatives.resolve(Ljava/lang/invoke/MemberName;Ljava/lang/Class;Z)Ljava/lang/invoke/MemberName; java.base@11.0.8 (0 bytes) @ 0x00007f5502b961af [0x00007f5502b960c0+0x00000000000000ef]
J 9382 c1 java.lang.invoke.MemberName$Factory.resolve(BLjava/lang/invoke/MemberName;Ljava/lang/Class;Z)Ljava/lang/invoke/MemberName; java.base@11.0.8 (157 bytes) @ 0x00007f54fbb5bab4 [0x00007f54fbb5b8c0+0x00000000000001f4]
J 16342 c1 java.lang.invoke.MemberName$Factory.resolveOrFail(BLjava/lang/invoke/MemberName;Ljava/lang/Class;Ljava/lang/Class;)Ljava/lang/invoke/MemberName; java.base@11.0.8 (53 bytes) @ 0x00007f54fc4efe9c [0x00007f54fc4efe20+0x000000000000007c]
J 2735 c1 java.lang.invoke.MethodHandles$Lookup.resolveOrFail(BLjava/lang/Class;Ljava/lang/String;Ljava/lang/invoke/MethodType;)Ljava/lang/invoke/MemberName; java.base@11.0.8 (48 bytes) @ 0x00007f54fbcffab4 [0x00007f54fbcff6e0+0x00000000000003d4]
J 2058 c1 java.lang.invoke.InnerClassLambdaMetafactory.buildCallSite()Ljava/lang/invoke/CallSite; java.base@11.0.8 (168 bytes) @ 0x00007f54fbba916c [0x00007f54fbba82e0+0x0000000000000e8c]
J 2357 c1 java.lang.invoke.LambdaMetafactory.altMetafactory(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;[Ljava/lang/Object;)Ljava/lang/invoke/CallSite; java.base@11.0.8 (287 bytes) @ 0x00007f54fbc43314 [0x00007f54fbc41f60+0x00000000000013b4]
J 15481 c2 java.lang.invoke.DirectMethodHandle$Holder.invokeStatic(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; java.base@11.0.8 (20 bytes) @ 0x00007f550334bcd8 [0x00007f550334bca0+0x0000000000000038]
J 2356 c1 java.lang.invoke.DelegatingMethodHandle$Holder.delegate(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; java.base@11.0.8 (23 bytes) @ 0x00007f54fbc41644 [0x00007f54fbc411e0+0x0000000000000464]
J 1876 c1 java.lang.invoke.BootstrapMethodInvoker.invoke(Ljava/lang/Class;Ljava/lang/invoke/MethodHandle;Ljava/lang/String;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Class;)Ljava/lang/Object; java.base@11.0.8 (688 bytes) @ 0x00007f54fbb3d58c [0x00007f54fbb3a020+0x000000000000356c]
J 1875 c1 java.lang.invoke.CallSite.makeSite(Ljava/lang/invoke/MethodHandle;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/Object;Ljava/lang/Class;)Ljava/lang/invoke/CallSite; java.base@11.0.8 (91 bytes) @ 0x00007f54fbb34244 [0x00007f54fbb341c0+0x0000000000000084]
J 1874 c1 java.lang.invoke.MethodHandleNatives.linkCallSiteImpl(Ljava/lang/Class;Ljava/lang/invoke/MethodHandle;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/invoke/MemberName; java.base@11.0.8 (44 bytes) @ 0x00007f54fbb2fa2c [0x00007f54fbb2f9c0+0x000000000000006c]
J 1873 c1 java.lang.invoke.MethodHandleNatives.linkCallSite(Ljava/lang/Object;ILjava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/invoke/MemberName; java.base@11.0.8 (66 bytes) @ 0x00007f54fbb2f454 [0x00007f54fbb2f000+0x0000000000000454]
v  ~StubRoutines::call_stub
V  [libjvm.so+0x889559]  JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, Thread*)+0x3b9
V  [libjvm.so+0x888285]  JavaCalls::call_static(JavaValue*, Klass*, Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x115
V  [libjvm.so+0xdd1009]  SystemDictionary::find_dynamic_call_site_invoker(Klass*, int, Handle, Symbol*, Symbol*, Handle*, Handle*, Thread*)+0x459
V  [libjvm.so+0xab507f]  LinkResolver::resolve_dynamic_call(CallInfo&, int, Handle, Symbol*, Symbol*, Klass*, Thread*)+0x4f
V  [libjvm.so+0xab5434]  LinkResolver::resolve_invokedynamic(CallInfo&, constantPoolHandle const&, int, Thread*)+0x2c4
V  [libjvm.so+0xab93d6]  LinkResolver::resolve_invoke(CallInfo&, Handle, constantPoolHandle const&, int, Bytecodes::Code, Thread*)+0x3c6
V  [libjvm.so+0x87f698]  InterpreterRuntime::resolve_invokedynamic(JavaThread*)+0x168
V  [libjvm.so+0x87f9dd]  InterpreterRuntime::resolve_from_cache(JavaThread*, Bytecodes::Code)+0x15d
j  org.apache.spark.scheduler.TaskSetManager.maybeFinishTaskSet()V+39
j  org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(JLorg/apache/spark/scheduler/DirectTaskResult;)V+341
J 19438 c1 org.apache.spark.scheduler.TaskResultGetter$$anon$3.$anonfun$run$1(Lorg/apache/spark/scheduler/TaskResultGetter$$anon$3;Ljava/lang/Object;)V (810 bytes) @ 0x00007f54fdf1bd9c [0x00007f54fdf17820+0x000000000000457c]
J 19437 c1 org.apache.spark.scheduler.TaskResultGetter$$anon$3$$Lambda$2884.apply$mcV$sp()V (12 bytes) @ 0x00007f54fdf018c4 [0x00007f54fdf01840+0x0000000000000084]
J 18878 c2 scala.runtime.java8.JFunction0$mcV$sp.apply()Ljava/lang/Object; (10 bytes) @ 0x00007f550355e5dc [0x00007f550355e5a0+0x000000000000003c]
J 19257 c1 org.apache.spark.util.Utils$.logUncaughtExceptions(Lscala/Function0;)Ljava/lang/Object; (66 bytes) @ 0x00007f54fde87ab4 [0x00007f54fde879a0+0x0000000000000114]
J 19290 c1 org.apache.spark.scheduler.TaskResultGetter$$anon$3.run()V (47 bytes) @ 0x00007f54fde961ac [0x00007f54fde95d40+0x000000000000046c]
j  java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V+92 java.base@11.0.8
j  java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5 java.base@11.0.8
j  java.lang.Thread.run()V+11 java.base@11.0.8
v  ~StubRoutines::call_stub
V  [libjvm.so+0x889559]  JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, Thread*)+0x3b9
V  [libjvm.so+0x88750d]  JavaCalls::call_virtual(JavaValue*, Handle, Klass*, Symbol*, Symbol*, Thread*)+0x1ed
V  [libjvm.so+0x9335ec]  thread_entry(JavaThread*, Thread*)+0x6c
V  [libjvm.so+0xe0f0aa]  JavaThread::thread_main_inner()+0x1fa
V  [libjvm.so+0xe0f411]  JavaThread::run()+0x351
V  [libjvm.so+0xe0acaa]  Thread::call_run()+0x13a
V  [libjvm.so+0xc5293e]  thread_native_entry(Thread*)+0xee

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
J 772  java.lang.invoke.MethodHandleNatives.resolve(Ljava/lang/invoke/MemberName;Ljava/lang/Class;Z)Ljava/lang/invoke/MemberName; java.base@11.0.8 (0 bytes) @ 0x00007f5502b96136 [0x00007f5502b960c0+0x0000000000000076]
J 9382 c1 java.lang.invoke.MemberName$Factory.resolve(BLjava/lang/invoke/MemberName;Ljava/lang/Class;Z)Ljava/lang/invoke/MemberName; java.base@11.0.8 (157 bytes) @ 0x00007f54fbb5bab4 [0x00007f54fbb5b8c0+0x00000000000001f4]
J 16342 c1 java.lang.invoke.MemberName$Factory.resolveOrFail(BLjava/lang/invoke/MemberName;Ljava/lang/Class;Ljava/lang/Class;)Ljava/lang/invoke/MemberName; java.base@11.0.8 (53 bytes) @ 0x00007f54fc4efe9c [0x00007f54fc4efe20+0x000000000000007c]
J 2735 c1 java.lang.invoke.MethodHandles$Lookup.resolveOrFail(BLjava/lang/Class;Ljava/lang/String;Ljava/lang/invoke/MethodType;)Ljava/lang/invoke/MemberName; java.base@11.0.8 (48 bytes) @ 0x00007f54fbcffab4 [0x00007f54fbcff6e0+0x00000000000003d4]
J 2058 c1 java.lang.invoke.InnerClassLambdaMetafactory.buildCallSite()Ljava/lang/invoke/CallSite; java.base@11.0.8 (168 bytes) @ 0x00007f54fbba916c [0x00007f54fbba82e0+0x0000000000000e8c]
J 2357 c1 java.lang.invoke.LambdaMetafactory.altMetafactory(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;[Ljava/lang/Object;)Ljava/lang/invoke/CallSite; java.base@11.0.8 (287 bytes) @ 0x00007f54fbc43314 [0x00007f54fbc41f60+0x00000000000013b4]
J 15481 c2 java.lang.invoke.DirectMethodHandle$Holder.invokeStatic(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; java.base@11.0.8 (20 bytes) @ 0x00007f550334bcd8 [0x00007f550334bca0+0x0000000000000038]
J 2356 c1 java.lang.invoke.DelegatingMethodHandle$Holder.delegate(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; java.base@11.0.8 (23 bytes) @ 0x00007f54fbc41644 [0x00007f54fbc411e0+0x0000000000000464]
J 1876 c1 java.lang.invoke.BootstrapMethodInvoker.invoke(Ljava/lang/Class;Ljava/lang/invoke/MethodHandle;Ljava/lang/String;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Class;)Ljava/lang/Object; java.base@11.0.8 (688 bytes) @ 0x00007f54fbb3d58c [0x00007f54fbb3a020+0x000000000000356c]
J 1875 c1 java.lang.invoke.CallSite.makeSite(Ljava/lang/invoke/MethodHandle;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/Object;Ljava/lang/Class;)Ljava/lang/invoke/CallSite; java.base@11.0.8 (91 bytes) @ 0x00007f54fbb34244 [0x00007f54fbb341c0+0x0000000000000084]
J 1874 c1 java.lang.invoke.MethodHandleNatives.linkCallSiteImpl(Ljava/lang/Class;Ljava/lang/invoke/MethodHandle;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/invoke/MemberName; java.base@11.0.8 (44 bytes) @ 0x00007f54fbb2fa2c [0x00007f54fbb2f9c0+0x000000000000006c]
J 1873 c1 java.lang.invoke.MethodHandleNatives.linkCallSite(Ljava/lang/Object;ILjava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/invoke/MemberName; java.base@11.0.8 (66 bytes) @ 0x00007f54fbb2f454 [0x00007f54fbb2f000+0x0000000000000454]
v  ~StubRoutines::call_stub
j  org.apache.spark.scheduler.TaskSetManager.maybeFinishTaskSet()V+39
j  org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(JLorg/apache/spark/scheduler/DirectTaskResult;)V+341
J 19438 c1 org.apache.spark.scheduler.TaskResultGetter$$anon$3.$anonfun$run$1(Lorg/apache/spark/scheduler/TaskResultGetter$$anon$3;Ljava/lang/Object;)V (810 bytes) @ 0x00007f54fdf1bd9c [0x00007f54fdf17820+0x000000000000457c]
J 19437 c1 org.apache.spark.scheduler.TaskResultGetter$$anon$3$$Lambda$2884.apply$mcV$sp()V (12 bytes) @ 0x00007f54fdf018c4 [0x00007f54fdf01840+0x0000000000000084]
J 18878 c2 scala.runtime.java8.JFunction0$mcV$sp.apply()Ljava/lang/Object; (10 bytes) @ 0x00007f550355e5dc [0x00007f550355e5a0+0x000000000000003c]
J 19257 c1 org.apache.spark.util.Utils$.logUncaughtExceptions(Lscala/Function0;)Ljava/lang/Object; (66 bytes) @ 0x00007f54fde87ab4 [0x00007f54fde879a0+0x0000000000000114]
J 19290 c1 org.apache.spark.scheduler.TaskResultGetter$$anon$3.run()V (47 bytes) @ 0x00007f54fde961ac [0x00007f54fde95d40+0x000000000000046c]
j  java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V+92 java.base@11.0.8
j  java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5 java.base@11.0.8
j  java.lang.Thread.run()V+11 java.base@11.0.8
v  ~StubRoutines::call_stub

siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000

Register to memory mapping:

RAX=0x00007f55184749b8 points into unknown readable memory: 70 64 06 f0 51 7f 00 00
RBX={method} {0x00007f515c5e7840} 'get$Lambda' '(Lorg/apache/spark/scheduler/TaskSetManager;)Lscala/Function1;' in 'org/apache/spark/scheduler/TaskSetManager$$Lambda$2962'
RCX=0x7fa5f0a12000d16c is an unknown value
RDX=0x0000000000000005 is an unknown value
RSP=0x00007f51a5df8fd0 is pointing into the stack for thread: 0x00007f51f0095000
RBP=0x00007f51a5df9020 is pointing into the stack for thread: 0x00007f51f0095000
RSI=0x0000000000000005 is an unknown value
RDI=0x00007f5518474950 points into unknown readable memory: ef 03 00 00 00 00 00 00
R8 ={method} {0x00007f515c5e7840} 'get$Lambda' '(Lorg/apache/spark/scheduler/TaskSetManager;)Lscala/Function1;' in 'org/apache/spark/scheduler/TaskSetManager$$Lambda$2962'
R9 =0x0000000000000005 is an unknown value
R10=0x0000000000000065 is an unknown value
R11=0x000001fd47c00cb2 is an unknown value
R12=0x0000000000000005 is an unknown value
R13=0x00000000606ce22e is an unknown value
R14={method} {0x00007f515c5e7840} 'get$Lambda' '(Lorg/apache/spark/scheduler/TaskSetManager;)Lscala/Function1;' in 'org/apache/spark/scheduler/TaskSetManager$$Lambda$2962'
R15=0x7fa5f0a12000d16c is an unknown value
