
gfx906 (AMD MI60) is failing on run_and_save_benchmarks.sh and llama.cpp #180

Open
Said-Akbar opened this issue Nov 22, 2024 · 48 comments

Comments

@Said-Akbar

Said-Akbar commented Nov 22, 2024

Hi @lamikr,

I built rocm_sdk_builder on a freshly installed Ubuntu 24.04.1. The build took 5 hours and 120 GB of storage, plus many hours of fixing small issues along the way (reference: #175).
I selected gfx906 as the target in ./babs.sh -c.

When I ran ./run_and_save_benchmarks.sh, I got this output:

./run_and_save_benchmarks.sh
Timestamp for benchmark results: 20241121_190404
Saving to file: 20241121_190404_cpu_vs_gpu_simple.txt
Benchmarking CPU and GPUs
Pytorch version: 2.4.1
ROCM HIP version: 6.1.40093-61a06a2f8
       Device:  AMD Ryzen 9 5950X 16-Core Processor
    'CPU time: 26.503 sec
       Device: AMD Radeon Graphics
    'GPU time: 0.399 sec
       Device: AMD Radeon Graphics
    'GPU time: 0.353 sec
Benchmark ready

Saving to file: 20241121_190404_pytorch_dot_products.txt
Pytorch version: 2.4.1
dot product calculation test
tensor([[[ 0.2042, -0.5683,  0.5711,  1.5666, -0.8859, -0.4255, -0.6103,
          -0.5932],
         [-0.1816, -1.0552,  0.3676,  2.1399, -0.8622,  0.1185, -0.4614,
          -0.4577],
         [ 0.2491, -0.5238,  0.5873,  1.5027, -0.8808, -0.4906, -0.6309,
          -0.6083]],

        [[-0.0812,  0.5027, -0.0134, -0.1771, -1.6389,  0.0154, -1.1964,
          -0.3948],
         [-0.3459, -0.4265,  0.0969,  0.0608, -0.9923, -0.4199, -0.7190,
          -0.0208],
         [-0.2615, -0.6958,  0.1066, -0.1948, -1.2152, -0.1223, -0.6278,
           0.1627]]], device='cuda:0')

Benchmarking cuda and cpu with Default, Math, Flash Attention amd Memory pytorch backends
Device: AMD Radeon Graphics / cuda:0
    Default benchmark:
:0:/home/saidp/Downloads/rocm_sdk_builder/src_projects/clr/hipamd/src/hip_global.cpp:114 : 8471950880 us: [pid:454884 tid:0x7ad2a9db0b80] Cannot find Symbol with name: Cijk_Alik_Bljk_HHS_BH_MT128x64x16_SE_APM1_AF0EM2_AF1EM1_AMAS3_ASAE01_ASCE01_ASEM2_BL1_BS1_DTLA0_DTLB0_EPS1_FL1_GLVWA4_GLVWB4_GRVW4_GSU1_GSUASB_ISA906_IU1_K1_KLA_LPA0_LPB0_LDL1_LRVW4_MDA1_MMFGLC_NLCA1_NLCB1_ONLL1_PK0_PGR1_PLR1_SIA1_SU32_SUM0_SUS256_SVW4_SNLL0_TT8_4_USFGROn1_VAW2_VSn1_VW4_VWB4_WG16_16_1_WGM1

Note the error at the bottom: 'Cannot find Symbol with name'. I assumed this would not affect llama.cpp.
However, I got a similar error in llama.cpp as well (I built it using ./babs.sh -b binfo/extra/ai_tools.blist).

source /opt/rocm_sdk_612/bin/env_rocm.sh
llama-server -m /media/saidp/datasets/text_generation/models/Qwen2.5-7B-Instruct-Q8_0/Qwen2.5-7B-Instruct-Q8_0.gguf -c 2048 -ngl 99 --metrics
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 ROCm devices:
  Device 0: AMD Radeon Graphics, compute capability 9.0, VMM: no
  Device 1: AMD Radeon Graphics, compute capability 9.0, VMM: no
build: 3901 (49f4671b) with cc (Ubuntu 13.2.0-23ubuntu4) 13.2.0 for x86_64-linux-gnu
system info: n_threads = 16, n_threads_batch = 16, total_threads = 32

system_info: n_threads = 16 (n_threads_batch = 16) / 32 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | RISCV_VECT = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | 

main: HTTP server is listening, hostname: 127.0.0.1, port: 8080, http threads: 31
main: loading model
llama_model_loader: loaded meta data with 38 key-value pairs and 339 tensors from /media/saidp/datasets/text_generation/models/Qwen2.5-7B-Instruct-Q8_0/Qwen2.5-7B-Instruct-Q8_0.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = qwen2
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Qwen2.5 7B Instruct
llama_model_loader: - kv   3:                           general.finetune str              = Instruct
llama_model_loader: - kv   4:                           general.basename str              = Qwen2.5
llama_model_loader: - kv   5:                         general.size_label str              = 7B
llama_model_loader: - kv   6:                            general.license str              = apache-2.0
llama_model_loader: - kv   7:                       general.license.link str              = https://huggingface.co/Qwen/Qwen2.5-7...
llama_model_loader: - kv   8:                   general.base_model.count u32              = 1
llama_model_loader: - kv   9:                  general.base_model.0.name str              = Qwen2.5 7B
llama_model_loader: - kv  10:          general.base_model.0.organization str              = Qwen
llama_model_loader: - kv  11:              general.base_model.0.repo_url str              = https://huggingface.co/Qwen/Qwen2.5-7B
llama_model_loader: - kv  12:                               general.tags arr[str,2]       = ["chat", "text-generation"]
llama_model_loader: - kv  13:                          general.languages arr[str,1]       = ["en"]
llama_model_loader: - kv  14:                          qwen2.block_count u32              = 28
llama_model_loader: - kv  15:                       qwen2.context_length u32              = 32768
llama_model_loader: - kv  16:                     qwen2.embedding_length u32              = 3584
llama_model_loader: - kv  17:                  qwen2.feed_forward_length u32              = 18944
llama_model_loader: - kv  18:                 qwen2.attention.head_count u32              = 28
llama_model_loader: - kv  19:              qwen2.attention.head_count_kv u32              = 4
llama_model_loader: - kv  20:                       qwen2.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  21:     qwen2.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  22:                          general.file_type u32              = 7
llama_model_loader: - kv  23:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  24:                         tokenizer.ggml.pre str              = qwen2
llama_model_loader: - kv  25:                      tokenizer.ggml.tokens arr[str,152064]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  26:                  tokenizer.ggml.token_type arr[i32,152064]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  27:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv  28:                tokenizer.ggml.eos_token_id u32              = 151645
llama_model_loader: - kv  29:            tokenizer.ggml.padding_token_id u32              = 151643
llama_model_loader: - kv  30:                tokenizer.ggml.bos_token_id u32              = 151643
llama_model_loader: - kv  31:               tokenizer.ggml.add_bos_token bool             = false
llama_model_loader: - kv  32:                    tokenizer.chat_template str              = {%- if tools %}\n    {{- '<|im_start|>...
llama_model_loader: - kv  33:               general.quantization_version u32              = 2
llama_model_loader: - kv  34:                      quantize.imatrix.file str              = /models_out/Qwen2.5-7B-Instruct-GGUF/...
llama_model_loader: - kv  35:                   quantize.imatrix.dataset str              = /training_dir/calibration_datav3.txt
llama_model_loader: - kv  36:             quantize.imatrix.entries_count i32              = 196
llama_model_loader: - kv  37:              quantize.imatrix.chunks_count i32              = 128
llama_model_loader: - type  f32:  141 tensors
llama_model_loader: - type q8_0:  198 tensors
llm_load_vocab: special tokens cache size = 22
llm_load_vocab: token to piece cache size = 0.9310 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = qwen2
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 152064
llm_load_print_meta: n_merges         = 151387
llm_load_print_meta: vocab_only       = 0
llm_load_print_meta: n_ctx_train      = 32768
llm_load_print_meta: n_embd           = 3584
llm_load_print_meta: n_layer          = 28
llm_load_print_meta: n_head           = 28
llm_load_print_meta: n_head_kv        = 4
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_swa            = 0
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 7
llm_load_print_meta: n_embd_k_gqa     = 512
llm_load_print_meta: n_embd_v_gqa     = 512
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-06
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 18944
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 2
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 1000000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn  = 32768
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: ssm_dt_b_c_rms   = 0
llm_load_print_meta: model type       = ?B
llm_load_print_meta: model ftype      = Q8_0
llm_load_print_meta: model params     = 7.62 B
llm_load_print_meta: model size       = 7.54 GiB (8.50 BPW) 
llm_load_print_meta: general.name     = Qwen2.5 7B Instruct
llm_load_print_meta: BOS token        = 151643 '<|endoftext|>'
llm_load_print_meta: EOS token        = 151645 '<|im_end|>'
llm_load_print_meta: PAD token        = 151643 '<|endoftext|>'
llm_load_print_meta: LF token         = 148848 'ÄĬ'
llm_load_print_meta: EOT token        = 151645 '<|im_end|>'
llm_load_print_meta: EOG token        = 151643 '<|endoftext|>'
llm_load_print_meta: EOG token        = 151645 '<|im_end|>'
llm_load_print_meta: max token length = 256
llm_load_tensors: ggml ctx size =    0.45 MiB
llm_load_tensors: offloading 28 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 29/29 layers to GPU
llm_load_tensors:      ROCm0 buffer size =  3542.78 MiB
llm_load_tensors:      ROCm1 buffer size =  3622.66 MiB
llm_load_tensors:        CPU buffer size =   552.23 MiB
......................................................................................
llama_new_context_with_model: n_ctx      = 2048
llama_new_context_with_model: n_batch    = 2048
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base  = 1000000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:      ROCm0 KV buffer size =    60.00 MiB
llama_kv_cache_init:      ROCm1 KV buffer size =    52.00 MiB
llama_new_context_with_model: KV self size  =  112.00 MiB, K (f16):   56.00 MiB, V (f16):   56.00 MiB
llama_new_context_with_model:  ROCm_Host  output buffer size =     1.16 MiB
llama_new_context_with_model: pipeline parallelism enabled (n_copies=4)
llama_new_context_with_model:      ROCm0 compute buffer size =   184.01 MiB
llama_new_context_with_model:      ROCm1 compute buffer size =   348.02 MiB
llama_new_context_with_model:  ROCm_Host compute buffer size =    23.02 MiB
llama_new_context_with_model: graph nodes  = 986
llama_new_context_with_model: graph splits = 3
llama_init_from_gpt_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
:0:/home/saidp/Downloads/rocm_sdk_builder/src_projects/clr/hipamd/src/hip_global.cpp:114 : 10662878012 us: [pid:465832 tid:0x7268ce2a2c40] Cannot find Symbol with name: Cijk_Alik_Bljk_HB_GB_MT32x32x16_SN_APM1_AF0EM2_AF1EM2_AMAS3_ASAE01_ASCE01_ASEM1_BL1_BS1_DTLA0_DTLB0_EPS1_FL0_GLVWA4_GLVWB4_GRVW4_GSU1_GSUASB_ISA906_IU1_K1_KLA_LPA0_LPB0_LDL1_LRVW4_MDA2_MMFGLC_NLCA1_NLCB1_ONLL1_PK0_PGR1_PLR1_SIA1_SU32_SUM0_SUS256_SVW4_SNLL0_TT4_4_USFGROn1_VAW2_VSn1_VW4_VWB4_WG8_8_1_WGM1
Aborted (core dumped)

llama.cpp fails with a similar error. Note that this llama.cpp build works on the CPU when I do not set the -ngl parameter (layer offloading). Please let me know if there is a fix.
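Both failures report a missing kernel symbol whose name is a long underscore-separated string. A minimal sketch for pulling that name out of a log line and decoding a few of its tokens is given here. The reading of `ISA…` as the target ISA, `MT…` as the macro tile and `WG…` as the workgroup shape is an assumption based on how Tensile kernel names are usually laid out and is not confirmed anywhere in this thread. The function name `parse_missing_kernel` is hypothetical.

```python
import re

# Matches the HIP runtime message that appears in both logs above.
SYMBOL_RE = re.compile(r"Cannot find Symbol with name:\s*(\S+)")


def parse_missing_kernel(line):
    """Return the missing kernel name plus a few decoded tokens, or None."""
    match = SYMBOL_RE.search(line)
    if match is None:
        return None
    name = match.group(1)
    tokens = name.split("_")
    info = {"name": name, "isa": None, "macro_tile": None, "workgroup": None}
    for i, tok in enumerate(tokens):
        if tok.startswith("ISA") and tok[3:].isdigit():
            # Assumed meaning: target ISA, e.g. ISA906 for gfx906.
            info["isa"] = tok[3:]
        elif tok.startswith("MT") and "x" in tok:
            # Assumed meaning: macro tile, e.g. MT128x64x16.
            info["macro_tile"] = tok[2:]
        elif tok.startswith("WG") and tok[2:].isdigit():
            # "WG16_16_1" is split into "WG16", "16", "1"; rejoin the three parts.
            info["workgroup"] = "x".join([tok[2:]] + tokens[i + 1:i + 3])
    return info


if __name__ == "__main__":
    # Shortened, hypothetical sample line; paste a full line from the log instead.
    sample = "hip_global.cpp:114 : Cannot find Symbol with name: Cijk_Alik_Bljk_HB_GB_MT32x32x16_SN_ISA906_WG8_8_1_WGM1"
    print(parse_missing_kernel(sample))
```

Run against the two error lines in this report, the `ISA` token is 906 in both cases, so the requested kernels are at least named for gfx906. Only the macro tile (128x64x16 versus 32x32x16) and the data-type tokens differ between them.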

@Said-Akbar
Author

@lamikr,
That error line comes from https://github.com/ROCm/clr/blob/rocm-6.1.x/hipamd/src/hip_global.cpp#L114.

But I am not sure how to fix the issue above. Please let me know if you have time to review this today.
Thanks!

@lamikr
Owner

lamikr commented Nov 24, 2024

Hi, unfortunately I do not have a gfx906 myself for debugging, so I only added the patches needed to at least get it to build and start testing, and added its support as experimental.

As for your error, I have never seen that kind of error, but it could be some kind of misconfiguration in rocBLAS related to src_projects/rocBLAS/library/src/blas3/Tensile/Logic/asm_full/vega10/vega10_Cijk_Alik_Bljk_HB_GB.yaml
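One quick way to test the misconfiguration theory is to check whether the installed rocBLAS kernel library contains any files built for gfx906 at all. The sketch below is only a starting point: the directory `/opt/rocm_sdk_612/lib/rocblas/library` is an assumed install location, and the idea that library files carry the architecture name in their file names is also an assumption. Neither is confirmed in this thread, and `find_arch_files` is a hypothetical helper.

```python
from pathlib import Path


def find_arch_files(library_dir, arch="gfx906"):
    """List file names under library_dir whose name mentions the given arch."""
    root = Path(library_dir)
    if not root.is_dir():
        # Directory missing: report nothing rather than raising.
        return []
    return sorted(p.name for p in root.rglob("*") if p.is_file() and arch in p.name)


if __name__ == "__main__":
    # Assumed location of the rocBLAS kernel library; adjust to your install.
    for name in find_arch_files("/opt/rocm_sdk_612/lib/rocblas/library"):
        print(name)
```

An empty result would suggest the gfx906 kernels were never built or installed. A non-empty one would point towards a mismatch between the selection logic and the kernels that were compiled.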

But let's first check a couple of basic things step by step so I can get some basic info.

  1. Can you first paste the output of the rocminfo command? I am interested in whether it detects your GPU
    and what information it shows about it.

  2. Then, are you able to build and run these test apps:

/opt/rocm_sdk_612/docs/examples/hipcc/hello_world
/opt/rocm_sdk_612/docs/examples/opencl/check_opencl_caps

@Said-Akbar
Author

Hello @lamikr,
Sure, here is the output of rocminfo.

rocminfo
ROCk module is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
Runtime Ext Version:     1.4
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             
Mwaitx:                  DISABLED
DMAbuf Support:          YES

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen 9 5950X 16-Core Processor
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD Ryzen 9 5950X 16-Core Processor
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   3400                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            32                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    98773496(0x5e329f8) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    98773496(0x5e329f8) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    98773496(0x5e329f8) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 2                  
*******                  
  Name:                    gfx906                             
  Uuid:                    GPU-161620e172e17d3d               
  Marketing Name:          AMD Radeon Graphics                
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
    L2:                      8192(0x2000) KB                    
  Chip ID:                 26273(0x66a1)                      
  ASIC Revision:           1(0x1)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   1800                               
  BDFID:                   3328                               
  Internal Node ID:        1                                  
  Compute Unit:            64                                 
  SIMDs per CU:            4                                  
  Shader Engines:          4                                  
  Shader Arrs. per Eng.:   1                                  
  WatchPts on Addr. Ranges:4                                  
  Coherent Host Access:    FALSE                              
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          64(0x40)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        40(0x28)                           
  Max Work-item Per CU:    2560(0xa00)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Packet Processor uCode:: 471                                
  SDMA engine uCode::      145                                
  IOMMU Support::          None                               
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    33538048(0x1ffc000) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    33538048(0x1ffc000) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 3                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Recommended Granule:0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx906:sramecc+:xnack-
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*******                  
Agent 3                  
*******                  
  Name:                    gfx906                             
  Uuid:                    GPU-915e294172fd62d2               
  Marketing Name:          AMD Radeon Graphics                
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    2                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
    L2:                      8192(0x2000) KB                    
  Chip ID:                 26273(0x66a1)                      
  ASIC Revision:           1(0x1)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   1800                               
  BDFID:                   4096                               
  Internal Node ID:        2                                  
  Compute Unit:            64                                 
  SIMDs per CU:            4                                  
  Shader Engines:          4                                  
  Shader Arrs. per Eng.:   1                                  
  WatchPts on Addr. Ranges:4                                  
  Coherent Host Access:    FALSE                              
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          64(0x40)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        40(0x28)                           
  Max Work-item Per CU:    2560(0xa00)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Packet Processor uCode:: 471                                
  SDMA engine uCode::      145                                
  IOMMU Support::          None                               
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    33538048(0x1ffc000) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    33538048(0x1ffc000) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 3                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Recommended Granule:0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx906:sramecc+:xnack-
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done *** 
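The rocminfo output is long, so a small script can reduce it to the agents that report an ISA, which is the part that needs to match the build target. This is a sketch under the assumption that the output keeps the layout shown above, where agent-level and ISA-level entries both use a plain `Name:` key and ISA names begin with `amdgcn-`. The function name `list_gpu_isas` is hypothetical.

```python
def list_gpu_isas(rocminfo_text):
    """Return (agent_name, isa_name) pairs for every ISA entry in the text."""
    pairs = []
    agent = None
    for line in rocminfo_text.splitlines():
        key, sep, value = line.strip().partition(":")
        # Skip "Marketing Name", "Vendor Name" and every other key.
        if not sep or key != "Name":
            continue
        value = value.strip()
        if value.startswith("amdgcn-"):
            # ISA entry: attach it to the most recent agent name.
            pairs.append((agent, value))
        else:
            # Agent entry (CPU or GPU): remember it for following ISA lines.
            agent = value
    return pairs


if __name__ == "__main__":
    import sys

    # Usage: rocminfo | python3 this_script.py
    for agent_name, isa_name in list_gpu_isas(sys.stdin.read()):
        print(agent_name, isa_name)
```

For the output above this gives two entries, both `gfx906` with ISA `amdgcn-amd-amdhsa--gfx906:sramecc+:xnack-`. The CPU agent is dropped because it lists no ISA.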

@Said-Akbar
Author

tests:

cd /opt/rocm_sdk_612/docs/examples/hipcc/hello_world/
./build.sh 
rm -f ./hello_world
rm -f hello_world.o
rm -f /opt/rocm_sdk_612/src/*.o
/opt/rocm_sdk_612/bin/hipcc -g -fPIE   -c -o hello_world.o hello_world.cpp
/opt/rocm_sdk_612/bin/hipcc hello_world.o -fPIE -o hello_world
./hello_world
 System minor: 0
 System major: 9
 Agent name: AMD Radeon Graphics
Kernel input: GdkknVnqkc
Expecting that kernel increases each character from input string by one
Kernel output string: HelloWorld
Output string matched with HelloWorld
Test ok!

@Said-Akbar
Author

OpenCL test:

cd /opt/rocm_sdk_612/docs/examples/opencl/check_opencl_caps
make
./check_opencl_caps
number of opencl platform devices: 1
==============================
Platform id: 0
AMD Accelerated Parallel Processing
Advanced Micro Devices, Inc.
OpenCL 2.1 AMD-APP (3614.0)
FULL_PROFILE
cl_khr_icd cl_amd_event_callback 
Number of devices found for platform: 2
    ---------------------------
    Device id: 0
    CL_DEVICE_VENDOR_ID: 0x1002
    CL_DEVICE_TYPE:  GPU
    CL_DEVICE_VENDOR_ID: 0x1002
    CL_DEVICE_MAX_COMPUTE_UNITS: 0x40
    CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 0x3
    CL_DEVICE_MAX_WORK_GROUP_SIZE: 0x3
    CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR: 0x4
    CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT: 0x2
    todo more information...
   ---------------------------
    ---------------------------
    Device id: 1
    CL_DEVICE_VENDOR_ID: 0x1002
    CL_DEVICE_TYPE:  GPU
    CL_DEVICE_VENDOR_ID: 0x1002
    CL_DEVICE_MAX_COMPUTE_UNITS: 0x40
    CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 0x3
    CL_DEVICE_MAX_WORK_GROUP_SIZE: 0x3
    CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR: 0x4
    CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT: 0x2
    todo more information...
   ---------------------------
==============================

@Said-Akbar
Author

By the way, gfx906 covers 'Vega 20' GPUs, not 'Vega 10' GPUs. I am not sure whether llama.cpp is calling some instruction that does not exist on gfx906.

@Said-Akbar
Author

Said-Akbar commented Nov 24, 2024

Here is the app crash log:

cat /var/crash/_opt_rocm_sdk_612_bin_llama-server.1000.crash
ApportVersion: 2.28.1-0ubuntu3.1
CasperMD5CheckResult: pass
Disassembly:
 => 0x7b73e609eb1c <__GI___pthread_kill+284>:	mov    %eax,%r14d
    0x7b73e609eb1f <__GI___pthread_kill+287>:	neg    %r14d
    0x7b73e609eb22 <__GI___pthread_kill+290>:	cmp    $0xfffff000,%eax
    0x7b73e609eb27 <__GI___pthread_kill+295>:	mov    $0x0,%eax
    0x7b73e609eb2c <__GI___pthread_kill+300>:	cmovbe %eax,%r14d
    0x7b73e609eb30 <__GI___pthread_kill+304>:	jmp    0x7b73e609eab0 <__GI___pthread_kill+176>
    0x7b73e609eb35 <__GI___pthread_kill+309>:	nopl   (%rax)
    0x7b73e609eb38 <__GI___pthread_kill+312>:	mov    %r13,%rdi
    0x7b73e609eb3b <__GI___pthread_kill+315>:	call   0x7b73e6098ed0 <__GI___lll_lock_wait_private>
    0x7b73e609eb40 <__GI___pthread_kill+320>:	jmp    0x7b73e609ea7e <__GI___pthread_kill+126>
    0x7b73e609eb45 <__GI___pthread_kill+325>:	nopl   (%rax)
    0x7b73e609eb48 <__GI___pthread_kill+328>:	mov    %r13,%rdi
    0x7b73e609eb4b <__GI___pthread_kill+331>:	call   0x7b73e6098f90 <__GI___lll_lock_wake_private>
    0x7b73e609eb50 <__GI___pthread_kill+336>:	jmp    0x7b73e609ea99 <__GI___pthread_kill+153>
    0x7b73e609eb55 <__GI___pthread_kill+341>:	call   0x7b73e6137e90 <__stack_chk_fail>
    0x7b73e609eb5a:	nopw   0x0(%rax,%rax,1)
InstallationDate: Installed on 2024-11-20 (4 days ago)
InstallationMedia: Ubuntu 24.04.1 LTS "Noble Numbat" - Release amd64 (20240827.1)
JournalErrors: -- No entries --
ProcCpuinfoMinimal:
 processor	: 31
 vendor_id	: AuthenticAMD
 cpu family	: 25
 model		: 33
 model name	: AMD Ryzen 9 5950X 16-Core Processor
 stepping	: 0
 microcode	: 0xa201016
 cpu MHz		: 2200.000
 cache size	: 512 KB
 physical id	: 0
 siblings	: 32
 core id		: 15
 cpu cores	: 16
 apicid		: 31
 initial apicid	: 31
 fpu		: yes
 fpu_exception	: yes
 cpuid level	: 16
 wp		: yes
 flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm debug_swap
 bugs		: sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass srso
 bogomips	: 6799.86
 TLB size	: 2560 4K pages
 clflush size	: 64
 cache_alignment	: 64
 address sizes	: 48 bits physical, 48 bits virtual
 power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]
ProcVersionSignature: Ubuntu 6.8.0-49.49-generic 6.8.12
Registers:
 rax            0x0                 0
 rbx            0x189e              6302
 rcx            0x7b73e609eb1c      135737710865180
 rdx            0x6                 6
 rsi            0x189e              6302
 rdi            0x189e              6302
 rbp            0x7ffe465f42b0      0x7ffe465f42b0
 rsp            0x7ffe465f4270      0x7ffe465f4270
 r8             0x57                87
 r9             0x0                 0
 r10            0x8                 8
 r11            0x246               582
 r12            0x6                 6
 r13            0x0                 0
 r14            0x16                22
 r15            0x627599932160      108257227252064
 rip            0x7b73e609eb1c      0x7b73e609eb1c <__GI___pthread_kill+284>
 eflags         0x246               [ PF ZF IF ]
 cs             0x33                51
 ss             0x2b                43
 ds             0x0                 0
 es             0x0                 0
 fs             0x0                 0
 gs             0x0                 0
 fs_base        0x7b73e628ec40      135737712897088
 gs_base        0x0                 0
Stacktrace:
 #0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
         tid = <optimized out>
         ret = 0
         pd = <optimized out>
         old_mask = {__val = {0}}
         ret = <optimized out>
         pd = <optimized out>
         old_mask = <optimized out>
         ret = <optimized out>
         tid = <optimized out>
         ret = <optimized out>
         resultvar = <optimized out>
         resultvar = <optimized out>
         __arg3 = <optimized out>
         __arg2 = <optimized out>
         __arg1 = <optimized out>
         _a3 = <optimized out>
         _a2 = <optimized out>
         _a1 = <optimized out>
         __futex = <optimized out>
         resultvar = <optimized out>
         __arg3 = <optimized out>
         __arg2 = <optimized out>
         __arg1 = <optimized out>
         _a3 = <optimized out>
         _a2 = <optimized out>
         _a1 = <optimized out>
         __futex = <optimized out>
         __private = <optimized out>
         __oldval = <optimized out>
 #1  __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
 No locals.
 #2  __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
 No locals.
 #3  0x00007b73e604526e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
         ret = <optimized out>
 #4  0x00007b73e60288ff in __GI_abort () at ./stdlib/abort.c:79
         save_stage = 1
         act = {__sigaction_handler = {sa_handler = 0x20, sa_sigaction = 0x20}, sa_mask = {__val = {108257077776080, 59, 14422071311227648676, 140730079068992, 135737623843607, 135737712327360, 18446744073709551512, 108257227250632, 303, 108257227129664, 303, 303, 2, 14, 6983489619661282816, 140730079069200}}, sa_flags = -1718412448, sa_restorer = 0x7ffe465f4410}
 #5  0x00007b73e0a2e0ff in ?? () from /opt/rocm_sdk_612/lib64/libamdhip64.so.6
 No symbol table info available.
 #6  0x00007b73e0b2b201 in ?? () from /opt/rocm_sdk_612/lib64/libamdhip64.so.6
 No symbol table info available.
 #7  0x00007b73e0ad5983 in ?? () from /opt/rocm_sdk_612/lib64/libamdhip64.so.6
 No symbol table info available.
 #8  0x00007b73e0c9f9ed in ?? () from /opt/rocm_sdk_612/lib64/libamdhip64.so.6
 No symbol table info available.
 #9  0x00007b73e0c799df in ?? () from /opt/rocm_sdk_612/lib64/libamdhip64.so.6
 No symbol table info available.
 #10 0x00007b73e204161e in Tensile::hip::SolutionAdapter::getKernel(ihipModuleSymbol_t*&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /opt/rocm_sdk_612/lib64/librocblas.so.4
 No symbol table info available.
 #11 0x00007b73e2042257 in Tensile::hip::SolutionAdapter::launchKernel(Tensile::KernelInvocation const&, ihipStream_t*, ihipEvent_t*, ihipEvent_t*) () from /opt/rocm_sdk_612/lib64/librocblas.so.4
 No symbol table info available.
 #12 0x00007b73e2042a68 in Tensile::hip::SolutionAdapter::launchKernels(std::vector<Tensile::KernelInvocation, std::allocator<Tensile::KernelInvocation> > const&, ihipStream_t*, ihipEvent_t*, ihipEvent_t*) () from /opt/rocm_sdk_612/lib64/librocblas.so.4
 No symbol table info available.
 #13 0x00007b73e184224f in rocblas_status_ runContractionProblem<_Float16, _Float16, _Float16, _Float16, _Float16, _Float16>(RocblasContractionProblem<_Float16, _Float16, _Float16, _Float16, _Float16, _Float16> const&, rocblas_gemm_algo_, int) () from /opt/rocm_sdk_612/lib64/librocblas.so.4
 No symbol table info available.
 #14 0x00007b73e1968b6c in rocblas_status_ gemm_ex_batched_template<_Float16, _Float16, _Float16>(_rocblas_handle*, rocblas_operation_, rocblas_operation_, int, int, int, _Float16 const*, _Float16 const* const*, long, int, long, _Float16 const* const*, long, int, long, _Float16 const*, _Float16 const* const*, long, int, long, _Float16* const*, long, int, long, int, rocblas_gemm_algo_, int, rocblas_gemm_flags_) () from /opt/rocm_sdk_612/lib64/librocblas.so.4
 No symbol table info available.
 #15 0x00007b73e1966f0d in rocblas_status_ gemm_ex_typecasting<true, _Float16, _Float16, _Float16>(_rocblas_handle*, rocblas_operation_, rocblas_operation_, int, int, int, void const*, void const*, long, int, long, void const*, long, int, long, void const*, void const*, long, int, long, void*, long, int, long, int, rocblas_gemm_algo_, int, rocblas_gemm_flags_) () from /opt/rocm_sdk_612/lib64/librocblas.so.4
 No symbol table info available.
 #16 0x00007b73e1960db2 in rocblas_status_ rocblas_gemm_ex_template<true>(_rocblas_handle*, rocblas_operation_, rocblas_operation_, int, int, int, void const*, void const*, rocblas_datatype_, long, int, long, void const*, rocblas_datatype_, long, int, long, void const*, void const*, rocblas_datatype_, long, int, long, void*, rocblas_datatype_, long, int, long, int, rocblas_datatype_, rocblas_gemm_algo_, int, unsigned int) () from /opt/rocm_sdk_612/lib64/librocblas.so.4
 No symbol table info available.
 #17 0x00007b73e195ff43 in rocblas_gemm_batched_ex () from /opt/rocm_sdk_612/lib64/librocblas.so.4
 No symbol table info available.
 #18 0x00007b73e67b3e38 in hipblasGemmBatchedEx () from /opt/rocm_sdk_612/lib64/libhipblas.so.2
 No symbol table info available.
 #19 0x00007b73e6903068 in ggml_cuda_mul_mat_batched_cublas(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*) () from /opt/rocm_sdk_612/lib64/libggml.so
 No symbol table info available.
 #20 0x00007b73e68f33b9 in ggml_backend_cuda_graph_compute(ggml_backend*, ggml_cgraph*) () from /opt/rocm_sdk_612/lib64/libggml.so
 No symbol table info available.
 #21 0x00007b73e6873703 in ggml_backend_sched_graph_compute_async () from /opt/rocm_sdk_612/lib64/libggml.so
 No symbol table info available.
 #22 0x00007b73e8d6dfd2 in llama_decode () from /opt/rocm_sdk_612/lib64/libllama.so
 No symbol table info available.
 #23 0x0000627586b5d704 in llama_init_from_gpt_params(gpt_params&) ()
 No symbol table info available.
 #24 0x0000627586af0822 in server_context::load_model(gpt_params const&) ()
 No symbol table info available.
 #25 0x0000627586aa2820 in main ()
 No symbol table info available.
StacktraceAddressSignature: /opt/rocm_sdk_612/bin/llama-server:6:/usr/lib/x86_64-linux-gnu/libc.so.6+1d26e:/usr/lib/x86_64-linux-gnu/libc.so.6+8ff:/opt/rocm_sdk_612/lib64/libamdhip64.so.6.1.40093-61a06a2f8+b0ff:/opt/rocm_sdk_612/lib64/libamdhip64.so.6.1.40093-61a06a2f8+108201:/opt/rocm_sdk_612/lib64/libamdhip64.so.6.1.40093-61a06a2f8+b2983:/opt/rocm_sdk_612/lib64/libamdhip64.so.6.1.40093-61a06a2f8+27c9ed:/opt/rocm_sdk_612/lib64/libamdhip64.so.6.1.40093-61a06a2f8+2569df:/opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102+fae61e:/opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102+faf257:/opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102+fafa68:/opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102+7af24f:/opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102+8d5b6c:/opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102+8d3f0d:/opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102+8cddb2:/opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102+8ccf43
StacktraceTop:
 ?? () from /opt/rocm_sdk_612/lib64/libamdhip64.so.6
 ?? () from /opt/rocm_sdk_612/lib64/libamdhip64.so.6
 ?? () from /opt/rocm_sdk_612/lib64/libamdhip64.so.6
 ?? () from /opt/rocm_sdk_612/lib64/libamdhip64.so.6
 ?? () from /opt/rocm_sdk_612/lib64/libamdhip64.so.6
Tags: noble wayland-session
ThreadStacktrace:
 .
 Thread 35 (Thread 0x7b71930006c0 (LWP 6336)):
 #0  0x00007b73e6098d61 in __futex_abstimed_wait_common64 (private=31601, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:57
         sc_cancel_oldtype = 0
         sc_ret = <optimized out>
         resultvar = <optimized out>
         __arg6 = <optimized out>
         __arg5 = <optimized out>
         __arg4 = <optimized out>
         __arg3 = <optimized out>
         __arg2 = <optimized out>
         __arg1 = <optimized out>
         _a6 = <optimized out>
         _a5 = <optimized out>
         _a4 = <optimized out>
         _a3 = <optimized out>
         _a2 = <optimized out>
         _a1 = <optimized out>
 #1  __futex_abstimed_wait_common (cancel=true, private=31601, abstime=0x0, clockid=0, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:87
         err = <optimized out>
         clockbit = 256
         op = 393
         err = <optimized out>
         clockbit = <optimized out>
         op = <optimized out>
 #2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7b71a4000be0, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
 No locals.
 #3  0x00007b73e609b7dd in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7b71a4000be8, cond=0x7b71a4000bb8) at ./nptl/pthread_cond_wait.c:503
         spin = 0
         buffer = {__routine = 0x7b73e609b4a0 <__condvar_cleanup_waiting>, __arg = 0x7b7192fff8c0, __canceltype = 0, __prev = 0x0}
         cbuffer = {wseq = 56, cond = 0x7b71a4000bb8, mutex = 0x7b71a4000be8, private = 0}
         err = <optimized out>
         g = 0
         flags = <optimized out>
         g1_start = <optimized out>
         maxspin = 0
         signals = <optimized out>
         result = 0
         wseq = 56
         seq = 28
         private = 0
         maxspin = <optimized out>
         err = <optimized out>
         result = <optimized out>
         wseq = <optimized out>
         g = <optimized out>
         seq = <optimized out>
         flags = <optimized out>
         private = <optimized out>
         signals = <optimized out>
         done = <optimized out>
         g1_start = <optimized out>
         spin = <optimized out>
         buffer = <optimized out>
         cbuffer = <optimized out>
         s = <optimized out>
 #4  ___pthread_cond_wait (cond=0x7b71a4000bb8, mutex=0x7b71a4000be8) at ./nptl/pthread_cond_wait.c:627
 No locals.
 #5  0x0000627586ad07ca in std::thread::_State_impl<std::thread::_Invoker<std::tuple<httplib::ThreadPool::worker> > >::_M_run() ()
 No symbol table info available.
 #6  0x00007b73e64ecdb4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
 No symbol table info available.
 #7  0x00007b73e609ca94 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
         ret = <optimized out>
         pd = <optimized out>
         out = <optimized out>
         unwind_buf = {cancel_jmp_buf = {{jmp_buf = {135727727773376, 1370407099822688401, 135727727773376, -160, 0, 135728109450704, 1370407099742996625, 1369164738774222993}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
         not_first_call = <optimized out>
 #8  0x00007b73e6129c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
 No locals.
 .
 Thread 34 (Thread 0x7b71926006c0 (LWP 6337)):
 #0  0x00007b73e6098d61 in __futex_abstimed_wait_common64 (private=31601, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:57
         sc_cancel_oldtype = 0
         sc_ret = <optimized out>
         resultvar = <optimized out>
         __arg6 = <optimized out>
         __arg5 = <optimized out>
         __arg4 = <optimized out>
         __arg3 = <optimized out>
         __arg2 = <optimized out>
         __arg1 = <optimized out>
         _a6 = <optimized out>
         _a5 = <optimized out>
         _a4 = <optimized out>
         _a3 = <optimized out>
         _a2 = <optimized out>
         _a1 = <optimized out>
 #1  __futex_abstimed_wait_common (cancel=true, private=31601, abstime=0x0, clockid=0, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:87
         err = <optimized out>
         clockbit = 256
         op = 393
         err = <optimized out>
         clockbit = <optimized out>
         op = <optimized out>
 #2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7b71a4000be0, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
 No locals.
 #3  0x00007b73e609b7dd in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7b71a4000be8, cond=0x7b71a4000bb8) at ./nptl/pthread_cond_wait.c:503
         spin = 0
         buffer = {__routine = 0x7b73e609b4a0 <__condvar_cleanup_waiting>, __arg = 0x7b71925ff8c0, __canceltype = 0, __prev = 0x0}
         cbuffer = {wseq = 58, cond = 0x7b71a4000bb8, mutex = 0x7b71a4000be8, private = 0}
         err = <optimized out>
         g = 0
         flags = <optimized out>
         g1_start = <optimized out>
         maxspin = 0
         signals = <optimized out>
         result = 0
         wseq = 58
         seq = 29
         private = 0
         maxspin = <optimized out>
         err = <optimized out>
         result = <optimized out>
         wseq = <optimized out>
         g = <optimized out>
         seq = <optimized out>
         flags = <optimized out>
         private = <optimized out>
         signals = <optimized out>
         done = <optimized out>
         g1_start = <optimized out>
         spin = <optimized out>
         buffer = <optimized out>
         cbuffer = <optimized out>
         s = <optimized out>
 #4  ___pthread_cond_wait (cond=0x7b71a4000bb8, mutex=0x7b71a4000be8) at ./nptl/pthread_cond_wait.c:627
 No locals.
 #5  0x0000627586ad07ca in std::thread::_State_impl<std::thread::_Invoker<std::tuple<httplib::ThreadPool::worker> > >::_M_run() ()
 No symbol table info available.
 #6  0x00007b73e64ecdb4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
 No symbol table info available.
 #7  0x00007b73e609ca94 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
         ret = <optimized out>
         pd = <optimized out>
         out = <optimized out>
         unwind_buf = {cancel_jmp_buf = {{jmp_buf = {135727717287616, 1370405725433153681, 135727717287616, -160, 0, 135728109450704, 1370405725353461905, 1369164738774222993}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
         not_first_call = <optimized out>
 #8  0x00007b73e6129c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
 No locals.
 .
 Thread 33 (Thread 0x7b7194e006c0 (LWP 6333)):
 #0  0x00007b73e6098d61 in __futex_abstimed_wait_common64 (private=31601, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:57
         sc_cancel_oldtype = 0
         sc_ret = <optimized out>
         resultvar = <optimized out>
         __arg6 = <optimized out>
         __arg5 = <optimized out>
         __arg4 = <optimized out>
         __arg3 = <optimized out>
         __arg2 = <optimized out>
         __arg1 = <optimized out>
         _a6 = <optimized out>
         _a5 = <optimized out>
         _a4 = <optimized out>
         _a3 = <optimized out>
         _a2 = <optimized out>
         _a1 = <optimized out>
 #1  __futex_abstimed_wait_common (cancel=true, private=31601, abstime=0x0, clockid=0, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:87
         err = <optimized out>
         clockbit = 256
         op = 393
         err = <optimized out>
         clockbit = <optimized out>
         op = <optimized out>
 #2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7b71a4000be0, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
 No locals.
 #3  0x00007b73e609b7dd in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7b71a4000be8, cond=0x7b71a4000bb8) at ./nptl/pthread_cond_wait.c:503
         spin = 0
         buffer = {__routine = 0x7b73e609b4a0 <__condvar_cleanup_waiting>, __arg = 0x7b7194dff8c0, __canceltype = 0, __prev = 0x0}
         cbuffer = {wseq = 50, cond = 0x7b71a4000bb8, mutex = 0x7b71a4000be8, private = 0}
         err = <optimized out>
         g = 0
         flags = <optimized out>
         g1_start = <optimized out>
         maxspin = 0
         signals = <optimized out>
         result = 0
         wseq = 50
         seq = 25
         private = 0
         maxspin = <optimized out>
         err = <optimized out>
         result = <optimized out>
         wseq = <optimized out>
         g = <optimized out>
         seq = <optimized out>
         flags = <optimized out>
         private = <optimized out>
         signals = <optimized out>
         done = <optimized out>
         g1_start = <optimized out>
         spin = <optimized out>
         buffer = <optimized out>
         cbuffer = <optimized out>
         s = <optimized out>
 #4  ___pthread_cond_wait (cond=0x7b71a4000bb8, mutex=0x7b71a4000be8) at ./nptl/pthread_cond_wait.c:627
 No locals.
 #5  0x0000627586ad07ca in std::thread::_State_impl<std::thread::_Invoker<std::tuple<httplib::ThreadPool::worker> > >::_M_run() ()
 No symbol table info available.
 #6  0x00007b73e64ecdb4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
 No symbol table info available.
 #7  0x00007b73e609ca94 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
         ret = <optimized out>
         pd = <optimized out>
         out = <optimized out>
         unwind_buf = {cancel_jmp_buf = {{jmp_buf = {135727759230656, 1370402426898270353, 135727759230656, -160, 0, 135728109450704, 1370402426818578577, 1369164738774222993}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
         not_first_call = <optimized out>
 #8  0x00007b73e6129c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
 No locals.
 .
 Thread 32 (Thread 0x7b7196c006c0 (LWP 6330)):
 #0  0x00007b73e6098d61 in __futex_abstimed_wait_common64 (private=31601, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:57
         sc_cancel_oldtype = 0
         sc_ret = <optimized out>
         resultvar = <optimized out>
         __arg6 = <optimized out>
         __arg5 = <optimized out>
         __arg4 = <optimized out>
         __arg3 = <optimized out>
         __arg2 = <optimized out>
         __arg1 = <optimized out>
         _a6 = <optimized out>
         _a5 = <optimized out>
         _a4 = <optimized out>
         _a3 = <optimized out>
         _a2 = <optimized out>
         _a1 = <optimized out>
 #1  __futex_abstimed_wait_common (cancel=true, private=31601, abstime=0x0, clockid=0, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:87
         err = <optimized out>
         clockbit = 256
         op = 393
         err = <optimized out>
         clockbit = <optimized out>
         op = <optimized out>
 #2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7b71a4000be0, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
 No locals.
 #3  0x00007b73e609b7dd in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7b71a4000be8, cond=0x7b71a4000bb8) at ./nptl/pthread_cond_wait.c:503
         spin = 0
         buffer = {__routine = 0x7b73e609b4a0 <__condvar_cleanup_waiting>, __arg = 0x7b7196bff8c0, __canceltype = 0, __prev = 0x0}
         cbuffer = {wseq = 44, cond = 0x7b71a4000bb8, mutex = 0x7b71a4000be8, private = 0}
         err = <optimized out>
         g = 0
         flags = <optimized out>
         g1_start = <optimized out>
         maxspin = 0
         signals = <optimized out>
         result = 0
         wseq = 44
         seq = 22
         private = 0
         maxspin = <optimized out>
         err = <optimized out>
         result = <optimized out>
         wseq = <optimized out>
         g = <optimized out>
         seq = <optimized out>
         flags = <optimized out>
         private = <optimized out>
         signals = <optimized out>
         done = <optimized out>
         g1_start = <optimized out>
         spin = <optimized out>
         buffer = <optimized out>
         cbuffer = <optimized out>
         s = <optimized out>
 #4  ___pthread_cond_wait (cond=0x7b71a4000bb8, mutex=0x7b71a4000be8) at ./nptl/pthread_cond_wait.c:627
 No locals.
 #5  0x0000627586ad07ca in std::thread::_State_impl<std::thread::_Invoker<std::tuple<httplib::ThreadPool::worker> > >::_M_run() ()
 No symbol table info available.
 #6  0x00007b73e64ecdb4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
 No symbol table info available.
 #7  0x00007b73e609ca94 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
         ret = <optimized out>
         pd = <optimized out>
         out = <optimized out>
         unwind_buf = {cancel_jmp_buf = {{jmp_buf = {135727790687936, 1370397753973852305, 135727790687936, -160, 0, 135728109450704, 1370397753894160529, 1369164738774222993}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
         not_first_call = <optimized out>
 #8  0x00007b73e6129c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
 No locals.
 .
 Thread 31 (Thread 0x7b71958006c0 (LWP 6332)):
 #0  0x00007b73e6098d61 in __futex_abstimed_wait_common64 (private=31601, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:57
         sc_cancel_oldtype = 0
         sc_ret = <optimized out>
         resultvar = <optimized out>
         __arg6 = <optimized out>
         __arg5 = <optimized out>
         __arg4 = <optimized out>
         __arg3 = <optimized out>
         __arg2 = <optimized out>
         __arg1 = <optimized out>
         _a6 = <optimized out>
         _a5 = <optimized out>
         _a4 = <optimized out>
         _a3 = <optimized out>
         _a2 = <optimized out>
         _a1 = <optimized out>
 #1  __futex_abstimed_wait_common (cancel=true, private=31601, abstime=0x0, clockid=0, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:87
         err = <optimized out>
         clockbit = 256
         op = 393
         err = <optimized out>
         clockbit = <optimized out>
         op = <optimized out>
 #2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7b71a4000be0, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
 No locals.
 #3  0x00007b73e609b7dd in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7b71a4000be8, cond=0x7b71a4000bb8) at ./nptl/pthread_cond_wait.c:503
         spin = 0
         buffer = {__routine = 0x7b73e609b4a0 <__condvar_cleanup_waiting>, __arg = 0x7b71957ff8c0, __canceltype = 0, __prev = 0x0}
         cbuffer = {wseq = 48, cond = 0x7b71a4000bb8, mutex = 0x7b71a4000be8, private = 0}
         err = <optimized out>
         g = 0
         flags = <optimized out>
         g1_start = <optimized out>
         maxspin = 0
         signals = <optimized out>
         result = 0
         wseq = 48
         seq = 24
         private = 0
         maxspin = <optimized out>
         err = <optimized out>
         result = <optimized out>
         wseq = <optimized out>
         g = <optimized out>
         seq = <optimized out>
         flags = <optimized out>
         private = <optimized out>
         signals = <optimized out>
         done = <optimized out>
         g1_start = <optimized out>
         spin = <optimized out>
         buffer = <optimized out>
         cbuffer = <optimized out>
         s = <optimized out>
 #4  ___pthread_cond_wait (cond=0x7b71a4000bb8, mutex=0x7b71a4000be8) at ./nptl/pthread_cond_wait.c:627
 No locals.
 #5  0x0000627586ad07ca in std::thread::_State_impl<std::thread::_Invoker<std::tuple<httplib::ThreadPool::worker> > >::_M_run() ()
 No symbol table info available.
 #6  0x00007b73e64ecdb4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
 No symbol table info available.
 #7  0x00007b73e609ca94 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
         ret = <optimized out>
         pd = <optimized out>
         out = <optimized out>
         unwind_buf = {cancel_jmp_buf = {{jmp_buf = {135727769716416, 1370403801287805073, 135727769716416, -160, 0, 135728109450704, 1370403801208113297, 1369164738774222993}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
         not_first_call = <optimized out>
 #8  0x00007b73e6129c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
 No locals.
 .
 Thread 30 (Thread 0x7b71962006c0 (LWP 6331)):
 #0  0x00007b73e6098d61 in __futex_abstimed_wait_common64 (private=31601, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:57
         sc_cancel_oldtype = 0
         sc_ret = <optimized out>
         resultvar = <optimized out>
         __arg6 = <optimized out>
         __arg5 = <optimized out>
         __arg4 = <optimized out>
         __arg3 = <optimized out>
         __arg2 = <optimized out>
         __arg1 = <optimized out>
         _a6 = <optimized out>
         _a5 = <optimized out>
         _a4 = <optimized out>
         _a3 = <optimized out>
         _a2 = <optimized out>
         _a1 = <optimized out>
 #1  __futex_abstimed_wait_common (cancel=true, private=31601, abstime=0x0, clockid=0, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:87
         err = <optimized out>
         clockbit = 256
         op = 393
         err = <optimized out>
         clockbit = <optimized out>
         op = <optimized out>
 #2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7b71a4000be0, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
 No locals.
 #3  0x00007b73e609b7dd in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7b71a4000be8, cond=0x7b71a4000bb8) at ./nptl/pthread_cond_wait.c:503
         spin = 0
         buffer = {__routine = 0x7b73e609b4a0 <__condvar_cleanup_waiting>, __arg = 0x7b71961ff8c0, __canceltype = 0, __prev = 0x0}
         cbuffer = {wseq = 46, cond = 0x7b71a4000bb8, mutex = 0x7b71a4000be8, private = 0}
         err = <optimized out>
         g = 0
         flags = <optimized out>
         g1_start = <optimized out>
         maxspin = 0
         signals = <optimized out>
         result = 0
         wseq = 46
         seq = 23
         private = 0
         maxspin = <optimized out>
         err = <optimized out>
         result = <optimized out>
         wseq = <optimized out>
         g = <optimized out>
         seq = <optimized out>
         flags = <optimized out>
         private = <optimized out>
         signals = <optimized out>
         done = <optimized out>
         g1_start = <optimized out>
         spin = <optimized out>
         buffer = <optimized out>
         cbuffer = <optimized out>
         s = <optimized out>
 #4  ___pthread_cond_wait (cond=0x7b71a4000bb8, mutex=0x7b71a4000be8) at ./nptl/pthread_cond_wait.c:627
 No locals.
 #5  0x0000627586ad07ca in std::thread::_State_impl<std::thread::_Invoker<std::tuple<httplib::ThreadPool::worker> > >::_M_run() ()
 No symbol table info available.
 #6  0x00007b73e64ecdb4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
 No symbol table info available.
 #7  0x00007b73e609ca94 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
         ret = <optimized out>
         pd = <optimized out>
         out = <optimized out>
         unwind_buf = {cancel_jmp_buf = {{jmp_buf = {135727780202176, 1370396379584317585, 135727780202176, -160, 0, 135728109450704, 1370396379504625809, 1369164738774222993}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
         not_first_call = <optimized out>
 #8  0x00007b73e6129c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
 No locals.
 .
 Thread 29 (Thread 0x7b7193a006c0 (LWP 6335)):
 #0  0x00007b73e6098d61 in __futex_abstimed_wait_common64 (private=31601, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:57
         sc_cancel_oldtype = 0
         sc_ret = <optimized out>
         resultvar = <optimized out>
         __arg6 = <optimized out>
         __arg5 = <optimized out>
         __arg4 = <optimized out>
         __arg3 = <optimized out>
         __arg2 = <optimized out>
         __arg1 = <optimized out>
         _a6 = <optimized out>
         _a5 = <optimized out>
         _a4 = <optimized out>
         _a3 = <optimized out>
         _a2 = <optimized out>
         _a1 = <optimized out>
 #1  __futex_abstimed_wait_common (cancel=true, private=31601, abstime=0x0, clockid=0, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:87
         err = <optimized out>
         clockbit = 256
         op = 393
         err = <optimized out>
         clockbit = <optimized out>
         op = <optimized out>
 #2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7b71a4000be0, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
 No locals.
 #3  0x00007b73e609b7dd in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7b71a4000be8, cond=0x7b71a4000bb8) at ./nptl/pthread_cond_wait.c:503
         spin = 0
         buffer = {__routine = 0x7b73e609b4a0 <__condvar_cleanup_waiting>, __arg = 0x7b71939ff8c0, __canceltype = 0, __prev = 0x0}
         cbuffer = {wseq = 52, cond = 0x7b71a4000bb8, mutex = 0x7b71a4000be8, private = 0}
         err = <optimized out>
         g = 0
         flags = <optimized out>
         g1_start = <optimized out>
         maxspin = 0
         signals = <optimized out>
         result = 0
         wseq = 52
         seq = 26
         private = 0
         maxspin = <optimized out>
         err = <optimized out>
         result = <optimized out>
         wseq = <optimized out>
         g = <optimized out>
         seq = <optimized out>
         flags = <optimized out>
         private = <optimized out>
         signals = <optimized out>
         done = <optimized out>
         g1_start = <optimized out>
         spin = <optimized out>
         buffer = <optimized out>
         cbuffer = <optimized out>
         s = <optimized out>
 #4  ___pthread_cond_wait (cond=0x7b71a4000bb8, mutex=0x7b71a4000be8) at ./nptl/pthread_cond_wait.c:627
 No locals.
 #5  0x0000627586ad07ca in std::thread::_State_impl<std::thread::_Invoker<std::tuple<httplib::ThreadPool::worker> > >::_M_run() ()
 No symbol table info available.
 #6  0x00007b73e64ecdb4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
 No symbol table info available.
 #7  0x00007b73e609ca94 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
         ret = <optimized out>
         pd = <optimized out>
         out = <optimized out>
         unwind_buf = {cancel_jmp_buf = {{jmp_buf = {135727738259136, 1370408474212223121, 135727738259136, -160, 0, 135728109450704, 1370408474132531345, 1369164738774222993}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
         not_first_call = <optimized out>
 #8  0x00007b73e6129c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
 No locals.
 .
 Thread 28 (Thread 0x7b7191c006c0 (LWP 6338)):
 #0  0x00007b73e6098d61 in __futex_abstimed_wait_common64 (private=31601, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:57
         sc_cancel_oldtype = 0
         sc_ret = <optimized out>
         resultvar = <optimized out>
         __arg6 = <optimized out>
         __arg5 = <optimized out>
         __arg4 = <optimized out>
         __arg3 = <optimized out>
         __arg2 = <optimized out>
         __arg1 = <optimized out>
         _a6 = <optimized out>
         _a5 = <optimized out>
         _a4 = <optimized out>
         _a3 = <optimized out>
         _a2 = <optimized out>
         _a1 = <optimized out>
 #1  __futex_abstimed_wait_common (cancel=true, private=31601, abstime=0x0, clockid=0, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:87
         err = <optimized out>
         clockbit = 256
         op = 393
         err = <optimized out>
         clockbit = <optimized out>
         op = <optimized out>
 #2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7b71a4000be0, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
 No locals.
 #3  0x00007b73e609b7dd in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7b71a4000be8, cond=0x7b71a4000bb8) at ./nptl/pthread_cond_wait.c:503
         spin = 0
         buffer = {__routine = 0x7b73e609b4a0 <__condvar_cleanup_waiting>, __arg = 0x7b7191bff8c0, __canceltype = 0, __prev = 0x0}
         cbuffer = {wseq = 60, cond = 0x7b71a4000bb8, mutex = 0x7b71a4000be8, private = 0}
         err = <optimized out>
         g = 0
         flags = <optimized out>
         g1_start = <optimized out>
         maxspin = 0
         signals = <optimized out>
         result = 0
         wseq = 60
         seq = 30
         private = 0
         maxspin = <optimized out>
         err = <optimized out>
         result = <optimized out>
         wseq = <optimized out>
         g = <optimized out>
         seq = <optimized out>
         flags = <optimized out>
         private = <optimized out>
         signals = <optimized out>
         done = <optimized out>
         g1_start = <optimized out>
         spin = <optimized out>
         buffer = <optimized out>
         cbuffer = <optimized out>
         s = <optimized out>
 #4  ___pthread_cond_wait (cond=0x7b71a4000bb8, mutex=0x7b71a4000be8) at ./nptl/pthread_cond_wait.c:627
 No locals.
 #5  0x0000627586ad07ca in std::thread::_State_impl<std::thread::_Invoker<std::tuple<httplib::ThreadPool::worker> > >::_M_run() ()
 No symbol table info available.
 #6  0x00007b73e64ecdb4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
 No symbol table info available.
 #7  0x00007b73e609ca94 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
         ret = <optimized out>
         pd = <optimized out>
         out = <optimized out>
         unwind_buf = {cancel_jmp_buf = {{jmp_buf = {135727706801856, 1370413147136641169, 135727706801856, -160, 0, 135728109450704, 1370413147056949393, 1369164738774222993}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
         not_first_call = <optimized out>
 #8  0x00007b73e6129c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
 No locals.
 .
 Thread 27 (Thread 0x7b71944006c0 (LWP 6334)):
 #0  0x00007b73e6098d61 in __futex_abstimed_wait_common64 (private=31601, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:57
         sc_cancel_oldtype = 0
         sc_ret = <optimized out>
         resultvar = <optimized out>
         __arg6 = <optimized out>
         __arg5 = <optimized out>
         __arg4 = <optimized out>
         __arg3 = <optimized out>
         __arg2 = <optimized out>
         __arg1 = <optimized out>
         _a6 = <optimized out>
         _a5 = <optimized out>
         _a4 = <optimized out>
         _a3 = <optimized out>
         _a2 = <optimized out>
         _a1 = <optimized out>
 #1  __futex_abstimed_wait_common (cancel=true, private=31601, abstime=0x0, clockid=0, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:87
         err = <optimized out>
         clockbit = 256
         op = 393
         err = <optimized out>
         clockbit = <optimized out>
         op = <optimized out>
 #2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7b71a4000be0, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
 No locals.
 #3  0x00007b73e609b7dd in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7b71a4000be8, cond=0x7b71a4000bb8) at ./nptl/pthread_cond_wait.c:503
         spin = 0
         buffer = {__routine = 0x7b73e609b4a0 <__condvar_cleanup_waiting>, __arg = 0x7b71943ff8c0, __canceltype = 0, __prev = 0x0}
         cbuffer = {wseq = 54, cond = 0x7b71a4000bb8, mutex = 0x7b71a4000be8, private = 0}
         err = <optimized out>
         g = 0
         flags = <optimized out>
         g1_start = <optimized out>
         maxspin = 0
         signals = <optimized out>
         result = 0
         wseq = 54
         seq = 27
         private = 0
         maxspin = <optimized out>
         err = <optimized out>
         result = <optimized out>
         wseq = <optimized out>
         g = <optimized out>
         seq = <optimized out>
         flags = <optimized out>
         private = <optimized out>
         signals = <optimized out>
         done = <optimized out>
         g1_start = <optimized out>
         spin = <optimized out>
         buffer = <optimized out>
         cbuffer = <optimized out>
         s = <optimized out>
 #4  ___pthread_cond_wait (cond=0x7b71a4000bb8, mutex=0x7b71a4000be8) at ./nptl/pthread_cond_wait.c:627
 No locals.
 #5  0x0000627586ad07ca in std::thread::_State_impl<std::thread::_Invoker<std::tuple<httplib::ThreadPool::worker> > >::_M_run() ()
 No symbol table info available.
 #6  0x00007b73e64ecdb4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
 No symbol table info available.
 #7  0x00007b73e609ca94 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
         ret = <optimized out>
         pd = <optimized out>
         out = <optimized out>
         unwind_buf = {cancel_jmp_buf = {{jmp_buf = {135727748744896, 1370401052508735633, 135727748744896, -160, 0, 135728109450704, 1370401052429043857, 1369164738774222993}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
         not_first_call = <optimized out>
 #8  0x00007b73e6129c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
 No locals.
 .
 Thread 26 (Thread 0x7b71976006c0 (LWP 6329)):
 #0  0x00007b73e6098d61 in __futex_abstimed_wait_common64 (private=31601, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:57
         sc_cancel_oldtype = 0
         sc_ret = <optimized out>
         resultvar = <optimized out>
         __arg6 = <optimized out>
         __arg5 = <optimized out>
         __arg4 = <optimized out>
         __arg3 = <optimized out>
         __arg2 = <optimized out>
         __arg1 = <optimized out>
         _a6 = <optimized out>
         _a5 = <optimized out>
         _a4 = <optimized out>
         _a3 = <optimized out>
         _a2 = <optimized out>
         _a1 = <optimized out>
 #1  __futex_abstimed_wait_common (cancel=true, private=31601, abstime=0x0, clockid=0, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:87
         err = <optimized out>
         clockbit = 256
         op = 393
         err = <optimized out>
         clockbit = <optimized out>
         op = <optimized out>
 #2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7b71a4000be0, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
 No locals.
 #3  0x00007b73e609b7dd in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7b71a4000be8, cond=0x7b71a4000bb8) at ./nptl/pthread_cond_wait.c:503
         spin = 0
         buffer = {__routine = 0x7b73e609b4a0 <__condvar_cleanup_waiting>, __arg = 0x7b71975ff8c0, __canceltype = 0, __prev = 0x0}
         cbuffer = {wseq = 42, cond = 0x7b71a4000bb8, mutex = 0x7b71a4000be8, private = 0}
         err = <optimized out>
         g = 0
         flags = <optimized out>
         g1_start = <optimized out>
         maxspin = 0
         signals = <optimized out>
         result = 0
         wseq = 42
         seq = 21
         private = 0
         maxspin = <optimized out>
         err = <optimized out>
         result = <optimized out>
         wseq = <optimized out>
         g = <optimized out>
         seq = <optimized out>
         flags = <optimized out>
         private = <optimized out>
         signals = <optimized out>
         done = <optimized out>
         g1_start = <optimized out>
         spin = <optimized out>
         buffer = <optimized out>
         cbuffer = <optimized out>
         s = <optimized out>
 #4  ___pthread_cond_wait (cond=0x7b71a4000bb8, mutex=0x7b71a4000be8) at ./nptl/pthread_cond_wait.c:627
 No locals.
 #5  0x0000627586ad07ca in std::thread::_State_impl<std::thread::_Invoker<std::tuple<httplib::ThreadPool::worker> > >::_M_run() ()
 No symbol table info available.
 #6  0x00007b73e64ecdb4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
 No symbol table info available.
 #7  0x00007b73e609ca94 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
         ret = <optimized out>
         pd = <optimized out>
         out = <optimized out>
         unwind_buf = {cancel_jmp_buf = {{jmp_buf = {135727801173696, 1370399128363387025, 135727801173696, -160, 0, 135728109450704, 1370399128283695249, 1369164738774222993}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
         not_first_call = <optimized out>
 #8  0x00007b73e6129c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
 No locals.
 .
 Thread 25 (Thread 0x7b7198a006c0 (LWP 6327)):
 #0  0x00007b73e6098d61 in __futex_abstimed_wait_common64 (private=31601, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:57
         sc_cancel_oldtype = 0
         sc_ret = <optimized out>
         resultvar = <optimized out>
         __arg6 = <optimized out>
         __arg5 = <optimized out>
         __arg4 = <optimized out>
         __arg3 = <optimized out>
         __arg2 = <optimized out>
         __arg1 = <optimized out>
         _a6 = <optimized out>
         _a5 = <optimized out>
         _a4 = <optimized out>
         _a3 = <optimized out>
         _a2 = <optimized out>
         _a1 = <optimized out>
 #1  __futex_abstimed_wait_common (cancel=true, private=31601, abstime=0x0, clockid=0, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:87
         err = <optimized out>
         clockbit = 256
         op = 393
         err = <optimized out>
         clockbit = <optimized out>
         op = <optimized out>
 #2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7b71a4000be0, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
 No locals.
 #3  0x00007b73e609b7dd in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7b71a4000be8, cond=0x7b71a4000bb8) at ./nptl/pthread_cond_wait.c:503
         spin = 0
         buffer = {__routine = 0x7b73e609b4a0 <__condvar_cleanup_waiting>, __arg = 0x7b71989ff8c0, __canceltype = 0, __prev = 0x0}
         cbuffer = {wseq = 38, cond = 0x7b71a4000bb8, mutex = 0x7b71a4000be8, private = 0}
         err = <optimized out>
         g = 0
         flags = <optimized out>
         g1_start = <optimized out>
         maxspin = 0
         signals = <optimized out>
         result = 0
         wseq = 38
         seq = 19
         private = 0
         maxspin = <optimized out>
         err = <optimized out>
         result = <optimized out>
         wseq = <optimized out>
         g = <optimized out>
         seq = <optimized out>
         flags = <optimized out>
         private = <optimized out>
         signals = <optimized out>
         done = <optimized out>
         g1_start = <optimized out>
         spin = <optimized out>
         buffer = <optimized out>
         cbuffer = <optimized out>
         s = <optimized out>
 #4  ___pthread_cond_wait (cond=0x7b71a4000bb8, mutex=0x7b71a4000be8) at ./nptl/pthread_cond_wait.c:627
 No locals.
 #5  0x0000627586ad07ca in std::thread::_State_impl<std::thread::_Invoker<std::tuple<httplib::ThreadPool::worker> > >::_M_run() ()
 No symbol table info available.
 #6  0x00007b73e64ecdb4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
 No symbol table info available.
 #7  0x00007b73e609ca94 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
         ret = <optimized out>
         pd = <optimized out>
         out = <optimized out>
         unwind_buf = {cancel_jmp_buf = {{jmp_buf = {135727822145216, 1370428265421523089, 135727822145216, -160, 0, 135728109450704, 1370428265341831313, 1369164738774222993}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
         not_first_call = <optimized out>
 #8  0x00007b73e6129c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
 No locals.
 .
 Thread 24 (Thread 0x7b71980006c0 (LWP 6328)):
 #0  0x00007b73e6098d61 in __futex_abstimed_wait_common64 (private=31601, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:57
         sc_cancel_oldtype = 0
         sc_ret = <optimized out>
         resultvar = <optimized out>
         __arg6 = <optimized out>
         __arg5 = <optimized out>
         __arg4 = <optimized out>
         __arg3 = <optimized out>
         __arg2 = <optimized out>
         __arg1 = <optimized out>
         _a6 = <optimized out>
         _a5 = <optimized out>
         _a4 = <optimized out>
         _a3 = <optimized out>
         _a2 = <optimized out>
         _a1 = <optimized out>
 #1  __futex_abstimed_wait_common (cancel=true, private=31601, abstime=0x0, clockid=0, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:87
         err = <optimized out>
         clockbit = 256
         op = 393
         err = <optimized out>
         clockbit = <optimized out>
         op = <optimized out>
 #2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7b71a4000be0, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
 No locals.
 #3  0x00007b73e609b7dd in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7b71a4000be8, cond=0x7b71a4000bb8) at ./nptl/pthread_cond_wait.c:503
         spin = 0
         buffer = {__routine = 0x7b73e609b4a0 <__condvar_cleanup_waiting>, __arg = 0x7b7197fff8c0, __canceltype = 0, __prev = 0x0}
         cbuffer = {wseq = 40, cond = 0x7b71a4000bb8, mutex = 0x7b71a4000be8, private = 0}
         err = <optimized out>
         g = 0
         flags = <optimized out>
         g1_start = <optimized out>
         maxspin = 0
         signals = <optimized out>
         result = 0
         wseq = 40
         seq = 20
         private = 0
         maxspin = <optimized out>
         err = <optimized out>
         result = <optimized out>
         wseq = <optimized out>
         g = <optimized out>
         seq = <optimized out>
         flags = <optimized out>
         private = <optimized out>
         signals = <optimized out>
         done = <optimized out>
         g1_start = <optimized out>
         spin = <optimized out>
         buffer = <optimized out>
         cbuffer = <optimized out>
         s = <optimized out>
 #4  ___pthread_cond_wait (cond=0x7b71a4000bb8, mutex=0x7b71a4000be8) at ./nptl/pthread_cond_wait.c:627
 No locals.
 #5  0x0000627586ad07ca in std::thread::_State_impl<std::thread::_Invoker<std::tuple<httplib::ThreadPool::worker> > >::_M_run() ()
 No symbol table info available.
 #6  0x00007b73e64ecdb4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
 No symbol table info available.
 #7  0x00007b73e609ca94 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
         ret = <optimized out>
         pd = <optimized out>
         out = <optimized out>
         unwind_buf = {cancel_jmp_buf = {{jmp_buf = {135727811659456, 1370400502752921745, 135727811659456, -160, 0, 135728109450704, 1370400502673229969, 1369164738774222993}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
         not_first_call = <optimized out>
 #8  0x00007b73e6129c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
 No locals.
 .
 Thread 23 (Thread 0x7b719e4006c0 (LWP 6318)):
 #0  0x00007b73e6098d61 in __futex_abstimed_wait_common64 (private=31601, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:57
         sc_cancel_oldtype = 0
         sc_ret = <optimized out>
         resultvar = <optimized out>
         __arg6 = <optimized out>
         __arg5 = <optimized out>
         __arg4 = <optimized out>
         __arg3 = <optimized out>
         __arg2 = <optimized out>
         __arg1 = <optimized out>
         _a6 = <optimized out>
         _a5 = <optimized out>
         _a4 = <optimized out>
         _a3 = <optimized out>
         _a2 = <optimized out>
         _a1 = <optimized out>
 #1  __futex_abstimed_wait_common (cancel=true, private=31601, abstime=0x0, clockid=0, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:87
         err = <optimized out>
         clockbit = 256
         op = 393
         err = <optimized out>
         clockbit = <optimized out>
         op = <optimized out>
 #2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7b71a4000be0, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
 No locals.
 #3  0x00007b73e609b7dd in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7b71a4000be8, cond=0x7b71a4000bb8) at ./nptl/pthread_cond_wait.c:503
         spin = 0
         buffer = {__routine = 0x7b73e609b4a0 <__condvar_cleanup_waiting>, __arg = 0x7b719e3ff8c0, __canceltype = 0, __prev = 0x0}
         cbuffer = {wseq = 20, cond = 0x7b71a4000bb8, mutex = 0x7b71a4000be8, private = 0}
         err = <optimized out>
         g = 0
         flags = <optimized out>
         g1_start = <optimized out>
         maxspin = 0
         signals = <optimized out>
         result = 0
         wseq = 20
         seq = 10
         private = 0
         maxspin = <optimized out>
         err = <optimized out>
         result = <optimized out>
         wseq = <optimized out>
         g = <optimized out>
         seq = <optimized out>
         flags = <optimized out>
         private = <optimized out>
         signals = <optimized out>
         done = <optimized out>
         g1_start = <optimized out>
         spin = <optimized out>
         buffer = <optimized out>
         cbuffer = <optimized out>
         s = <optimized out>
 #4  ___pthread_cond_wait (cond=0x7b71a4000bb8, mutex=0x7b71a4000be8) at ./nptl/pthread_cond_wait.c:627
 No locals.
 #5  0x0000627586ad07ca in std::thread::_State_impl<std::thread::_Invoker<std::tuple<httplib::ThreadPool::worker> > >::_M_run() ()
 No symbol table info available.
 #6  0x00007b73e64ecdb4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
 No symbol table info available.
 #7  0x00007b73e609ca94 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
         ret = <optimized out>
         pd = <optimized out>
         out = <optimized out>
         unwind_buf = {cancel_jmp_buf = {{jmp_buf = {135727916517056, 1370414246648268945, 135727916517056, -160, 0, 135728109450704, 1370414246568577169, 1369164738774222993}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
         not_first_call = <optimized out>
 #8  0x00007b73e6129c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
 No locals.
 .
 Thread 22 (Thread 0x7b71a0c006c0 (LWP 6314)):
 #0  0x00007b73e6098d61 in __futex_abstimed_wait_common64 (private=31601, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:57
         sc_cancel_oldtype = 0
         sc_ret = <optimized out>
         resultvar = <optimized out>
         __arg6 = <optimized out>
         __arg5 = <optimized out>
         __arg4 = <optimized out>
         __arg3 = <optimized out>
         __arg2 = <optimized out>
         __arg1 = <optimized out>
         _a6 = <optimized out>
         _a5 = <optimized out>
         _a4 = <optimized out>
         _a3 = <optimized out>
         _a2 = <optimized out>
         _a1 = <optimized out>
 #1  __futex_abstimed_wait_common (cancel=true, private=31601, abstime=0x0, clockid=0, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:87
         err = <optimized out>
         clockbit = 256
         op = 393
         err = <optimized out>
         clockbit = <optimized out>
         op = <optimized out>
 #2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7b71a4000be0, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
 No locals.
 #3  0x00007b73e609b7dd in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7b71a4000be8, cond=0x7b71a4000bb8) at ./nptl/pthread_cond_wait.c:503
         spin = 0
         buffer = {__routine = 0x7b73e609b4a0 <__condvar_cleanup_waiting>, __arg = 0x7b71a0bff8c0, __canceltype = 0, __prev = 0x0}
         cbuffer = {wseq = 12, cond = 0x7b71a4000bb8, mutex = 0x7b71a4000be8, private = 0}
         err = <optimized out>
         g = 0
         flags = <optimized out>
         g1_start = <optimized out>
         maxspin = 0
         signals = <optimized out>
         result = 0
         wseq = 12
         seq = 6
         private = 0
         maxspin = <optimized out>
         err = <optimized out>
         result = <optimized out>
         wseq = <optimized out>
         g = <optimized out>
         seq = <optimized out>
         flags = <optimized out>
         private = <optimized out>
         signals = <optimized out>
         done = <optimized out>
         g1_start = <optimized out>
         spin = <optimized out>
         buffer = <optimized out>
         cbuffer = <optimized out>
         s = <optimized out>
 #4  ___pthread_cond_wait (cond=0x7b71a4000bb8, mutex=0x7b71a4000be8) at ./nptl/pthread_cond_wait.c:627
 No locals.
 #5  0x0000627586ad07ca in std::thread::_State_impl<std::thread::_Invoker<std::tuple<httplib::ThreadPool::worker> > >::_M_run() ()
 No symbol table info available.
 #6  0x00007b73e64ecdb4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
 No symbol table info available.
 #7  0x00007b73e609ca94 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
         ret = <optimized out>
         pd = <optimized out>
         out = <optimized out>
         unwind_buf = {cancel_jmp_buf = {{jmp_buf = {135727958460096, 1370446132485474449, 135727958460096, -160, 0, 135728109450704, 1370446132405782673, 1369164738774222993}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
         not_first_call = <optimized out>
 #8  0x00007b73e6129c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
 No locals.
 .
 Thread 21 (Thread 0x7b71994006c0 (LWP 6326)):
 #0  0x00007b73e6098d61 in __futex_abstimed_wait_common64 (private=31601, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:57
         sc_cancel_oldtype = 0
         sc_ret = <optimized out>
         resultvar = <optimized out>
         __arg6 = <optimized out>
         __arg5 = <optimized out>
         __arg4 = <optimized out>
         __arg3 = <optimized out>
         __arg2 = <optimized out>
         __arg1 = <optimized out>
         _a6 = <optimized out>
         _a5 = <optimized out>
         _a4 = <optimized out>
         _a3 = <optimized out>
         _a2 = <optimized out>
         _a1 = <optimized out>
 #1  __futex_abstimed_wait_common (cancel=true, private=31601, abstime=0x0, clockid=0, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:87
         err = <optimized out>
         clockbit = 256
         op = 393
         err = <optimized out>
         clockbit = <optimized out>
         op = <optimized out>
 #2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7b71a4000be0, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
 No locals.
 #3  0x00007b73e609b7dd in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7b71a4000be8, cond=0x7b71a4000bb8) at ./nptl/pthread_cond_wait.c:503
         spin = 0
         buffer = {__routine = 0x7b73e609b4a0 <__condvar_cleanup_waiting>, __arg = 0x7b71993ff8c0, __canceltype = 0, __prev = 0x0}
         cbuffer = {wseq = 36, cond = 0x7b71a4000bb8, mutex = 0x7b71a4000be8, private = 0}
         err = <optimized out>
         g = 0
         flags = <optimized out>
         g1_start = <optimized out>
         maxspin = 0
         signals = <optimized out>
         result = 0
         wseq = 36
         seq = 18
         private = 0
         maxspin = <optimized out>
         err = <optimized out>
         result = <optimized out>
         wseq = <optimized out>
         g = <optimized out>
         seq = <optimized out>
         flags = <optimized out>
         private = <optimized out>
         signals = <optimized out>
         done = <optimized out>
         g1_start = <optimized out>
         spin = <optimized out>
         buffer = <optimized out>
         cbuffer = <optimized out>
         s = <optimized out>
 #4  ___pthread_cond_wait (cond=0x7b71a4000bb8, mutex=0x7b71a4000be8) at ./nptl/pthread_cond_wait.c:627
 No locals.
 #5  0x0000627586ad07ca in std::thread::_State_impl<std::thread::_Invoker<std::tuple<httplib::ThreadPool::worker> > >::_M_run() ()
 No symbol table info available.
 #6  0x00007b73e64ecdb4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
 No symbol table info available.
 #7  0x00007b73e609ca94 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
         ret = <optimized out>
         pd = <optimized out>
         out = <optimized out>
         unwind_buf = {cancel_jmp_buf = {{jmp_buf = {135727832630976, 1370429639811057809, 135727832630976, -160, 0, 135728109450704, 1370429639731366033, 1369164738774222993}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
         not_first_call = <optimized out>
 #8  0x00007b73e6129c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
 No locals.
 .
 Thread 20 (Thread 0x7b7199e006c0 (LWP 6325)):
 #0  0x00007b73e6098d61 in __futex_abstimed_wait_common64 (private=31601, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:57
         sc_cancel_oldtype = 0
         sc_ret = <optimized out>
         resultvar = <optimized out>
         __arg6 = <optimized out>
         __arg5 = <optimized out>
         __arg4 = <optimized out>
         __arg3 = <optimized out>
         __arg2 = <optimized out>
         __arg1 = <optimized out>
         _a6 = <optimized out>
         _a5 = <optimized out>
         _a4 = <optimized out>
         _a3 = <optimized out>
         _a2 = <optimized out>
         _a1 = <optimized out>
 #1  __futex_abstimed_wait_common (cancel=true, private=31601, abstime=0x0, clockid=0, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:87
         err = <optimized out>
         clockbit = 256
         op = 393
         err = <optimized out>
         clockbit = <optimized out>
         op = <optimized out>
 #2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7b71a4000be0, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
 No locals.
 #3  0x00007b73e609b7dd in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7b71a4000be8, cond=0x7b71a4000bb8) at ./nptl/pthread_cond_wait.c:503
         spin = 0
         buffer = {__routine = 0x7b73e609b4a0 <__condvar_cleanup_waiting>, __arg = 0x7b7199dff8c0, __canceltype = 0, __prev = 0x0}
         cbuffer = {wseq = 34, cond = 0x7b71a4000bb8, mutex = 0x7b71a4000be8, private = 0}
         err = <optimized out>
         g = 0
         flags = <optimized out>
         g1_start = <optimized out>
         maxspin = 0
         signals = <optimized out>
         result = 0
         wseq = 34
         seq = 17
         private = 0
         maxspin = <optimized out>
         err = <optimized out>
         result = <optimized out>
         wseq = <optimized out>
         g = <optimized out>
         seq = <optimized out>
         flags = <optimized out>
         private = <optimized out>
         signals = <optimized out>
         done = <optimized out>
         g1_start = <optimized out>
         spin = <optimized out>
         buffer = <optimized out>
         cbuffer = <optimized out>
         s = <optimized out>
 #4  ___pthread_cond_wait (cond=0x7b71a4000bb8, mutex=0x7b71a4000be8) at ./nptl/pthread_cond_wait.c:627
 No locals.
 #5  0x0000627586ad07ca in std::thread::_State_impl<std::thread::_Invoker<std::tuple<httplib::ThreadPool::worker> > >::_M_run() ()
 No symbol table info available.
 #6  0x00007b73e64ecdb4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
 No symbol table info available.
 #7  0x00007b73e609ca94 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
         ret = <optimized out>
         pd = <optimized out>
         out = <optimized out>
         unwind_buf = {cancel_jmp_buf = {{jmp_buf = {135727843116736, 1370431014200592529, 135727843116736, -160, 0, 135728109450704, 1370431014120900753, 1369164738774222993}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
         not_first_call = <optimized out>
 #8  0x00007b73e6129c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
 No locals.
 .
 Thread 19 (Thread 0x7b71a20006c0 (LWP 6312)):
 #0  0x00007b73e6098d61 in __futex_abstimed_wait_common64 (private=31601, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:57
         sc_cancel_oldtype = 0
         sc_ret = <optimized out>
         resultvar = <optimized out>
         __arg6 = <optimized out>
         __arg5 = <optimized out>
         __arg4 = <optimized out>
         __arg3 = <optimized out>
         __arg2 = <optimized out>
         __arg1 = <optimized out>
         _a6 = <optimized out>
         _a5 = <optimized out>
         _a4 = <optimized out>
         _a3 = <optimized out>
         _a2 = <optimized out>
         _a1 = <optimized out>
 #1  __futex_abstimed_wait_common (cancel=true, private=31601, abstime=0x0, clockid=0, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:87
         err = <optimized out>
         clockbit = 256
         op = 393
         err = <optimized out>
         clockbit = <optimized out>
         op = <optimized out>
 #2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7b71a4000be0, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
 No locals.
 #3  0x00007b73e609b7dd in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7b71a4000be8, cond=0x7b71a4000bb8) at ./nptl/pthread_cond_wait.c:503
         spin = 0
         buffer = {__routine = 0x7b73e609b4a0 <__condvar_cleanup_waiting>, __arg = 0x7b71a1fff8c0, __canceltype = 0, __prev = 0x0}
         cbuffer = {wseq = 8, cond = 0x7b71a4000bb8, mutex = 0x7b71a4000be8, private = 0}
         err = <optimized out>
         g = 0
         flags = <optimized out>
         g1_start = <optimized out>
         maxspin = 0
         signals = <optimized out>
         result = 0
         wseq = 8
         seq = 4
         private = 0
         maxspin = <optimized out>
         err = <optimized out>
         result = <optimized out>
         wseq = <optimized out>
         g = <optimized out>
         seq = <optimized out>
         flags = <optimized out>
         private = <optimized out>
         signals = <optimized out>
         done = <optimized out>
         g1_start = <optimized out>
         spin = <optimized out>
         buffer = <optimized out>
         cbuffer = <optimized out>
         s = <optimized out>
 #4  ___pthread_cond_wait (cond=0x7b71a4000bb8, mutex=0x7b71a4000be8) at ./nptl/pthread_cond_wait.c:627
 No locals.
 #5  0x0000627586ad07ca in std::thread::_State_impl<std::thread::_Invoker<std::tuple<httplib::ThreadPool::worker> > >::_M_run() ()
 No symbol table info available.
 #6  0x00007b73e64ecdb4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
 No symbol table info available.
 #7  0x00007b73e609ca94 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
         ret = <optimized out>
         pd = <optimized out>
         out = <optimized out>
         unwind_buf = {cancel_jmp_buf = {{jmp_buf = {135727979431616, 1370448881264543889, 135727979431616, -160, 0, 135728109450608, 1370448881184852113, 1369164738774222993}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
         not_first_call = <optimized out>
 #8  0x00007b73e6129c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
 No locals.
 .
 Thread 18 (Thread 0x7b719b2006c0 (LWP 6323)):
 #0  0x00007b73e6098d61 in __futex_abstimed_wait_common64 (private=31601, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:57
         sc_cancel_oldtype = 0
         sc_ret = <optimized out>
         resultvar = <optimized out>
         __arg6 = <optimized out>
         __arg5 = <optimized out>
         __arg4 = <optimized out>
         __arg3 = <optimized out>
         __arg2 = <optimized out>
         __arg1 = <optimized out>
         _a6 = <optimized out>
         _a5 = <optimized out>
         _a4 = <optimized out>
         _a3 = <optimized out>
         _a2 = <optimized out>
         _a1 = <optimized out>
 #1  __futex_abstimed_wait_common (cancel=true, private=31601, abstime=0x0, clockid=0, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:87
         err = <optimized out>
         clockbit = 256
         op = 393
         err = <optimized out>
         clockbit = <optimized out>
         op = <optimized out>
 #2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7b71a4000be0, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
 No locals.
 #3  0x00007b73e609b7dd in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7b71a4000be8, cond=0x7b71a4000bb8) at ./nptl/pthread_cond_wait.c:503
         spin = 0
         buffer = {__routine = 0x7b73e609b4a0 <__condvar_cleanup_waiting>, __arg = 0x7b719b1ff8c0, __canceltype = 0, __prev = 0x0}
         cbuffer = {wseq = 30, cond = 0x7b71a4000bb8, mutex = 0x7b71a4000be8, private = 0}
         err = <optimized out>
         g = 0
         flags = <optimized out>
         g1_start = <optimized out>
         maxspin = 0
         signals = <optimized out>
         result = 0
         wseq = 30
         seq = 15
         private = 0
         maxspin = <optimized out>
         err = <optimized out>
         result = <optimized out>
         wseq = <optimized out>
         g = <optimized out>
         seq = <optimized out>
         flags = <optimized out>
         private = <optimized out>
         signals = <optimized out>
         done = <optimized out>
         g1_start = <optimized out>
         spin = <optimized out>
         buffer = <optimized out>
         cbuffer = <optimized out>
         s = <optimized out>
 #4  ___pthread_cond_wait (cond=0x7b71a4000bb8, mutex=0x7b71a4000be8) at ./nptl/pthread_cond_wait.c:627
 No locals.
 #5  0x0000627586ad07ca in std::thread::_State_impl<std::thread::_Invoker<std::tuple<httplib::ThreadPool::worker> > >::_M_run() ()
 No symbol table info available.
 #6  0x00007b73e64ecdb4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
 No symbol table info available.
 #7  0x00007b73e609ca94 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
         ret = <optimized out>
         pd = <optimized out>
         out = <optimized out>
         unwind_buf = {cancel_jmp_buf = {{jmp_buf = {135727864088256, 1370424966886639761, 135727864088256, -160, 0, 135728109450704, 1370424966806947985, 1369164738774222993}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
         not_first_call = <optimized out>
 #8  0x00007b73e6129c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
 No locals.
 .
 Thread 17 (Thread 0x7b719a8006c0 (LWP 6324)):
 #0  0x00007b73e6098d61 in __futex_abstimed_wait_common64 (private=31601, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:57
         sc_cancel_oldtype = 0
         sc_ret = <optimized out>
         resultvar = <optimized out>
         __arg6 = <optimized out>
         __arg5 = <optimized out>
         __arg4 = <optimized out>
         __arg3 = <optimized out>
         __arg2 = <optimized out>
         __arg1 = <optimized out>
         _a6 = <optimized out>
         _a5 = <optimized out>
         _a4 = <optimized out>
         _a3 = <optimized out>
         _a2 = <optimized out>
         _a1 = <optimized out>
 #1  __futex_abstimed_wait_common (cancel=true, private=31601, abstime=0x0, clockid=0, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:87
         err = <optimized out>
         clockbit = 256
         op = 393
         err = <optimized out>
         clockbit = <optimized out>
         op = <optimized out>
 #2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7b71a4000be0, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
 No locals.
 #3  0x00007b73e609b7dd in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7b71a4000be8, cond=0x7b71a4000bb8) at ./nptl/pthread_cond_wait.c:503
         spin = 0
         buffer = {__routine = 0x7b73e609b4a0 <__condvar_cleanup_waiting>, __arg = 0x7b719a7ff8c0, __canceltype = 0, __prev = 0x0}
         cbuffer = {wseq = 32, cond = 0x7b71a4000bb8, mutex = 0x7b71a4000be8, private = 0}
         err = <optimized out>
         g = 0
         flags = <optimized out>
         g1_start = <optimized out>
         maxspin = 0
         signals = <optimized out>
         result = 0
         wseq = 32
         seq = 16
         private = 0
         maxspin = <optimized out>
         err = <optimized out>
         result = <optimized out>
         wseq = <optimized out>
         g = <optimized out>
         seq = <optimized out>
         flags = <optimized out>
         private = <optimized out>
         signals = <optimized out>
         done = <optimized out>
         g1_start = <optimized out>
         spin = <optimized out>
         buffer = <optimized out>
         cbuffer = <optimized out>
         s = <optimized out>
 #4  ___pthread_cond_wait (cond=0x7b71a4000bb8, mutex=0x7b71a4000be8) at ./nptl/pthread_cond_wait.c:627
 No locals.
 #5  0x0000627586ad07ca in std::thread::_State_impl<std::thread::_Invoker<std::tuple<httplib::ThreadPool::worker> > >::_M_run() ()
 No symbol table info available.
 #6  0x00007b73e64ecdb4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
 No symbol table info available.
 #7  0x00007b73e609ca94 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
         ret = <optimized out>
         pd = <optimized out>
         out = <optimized out>
         unwind_buf = {cancel_jmp_buf = {{jmp_buf = {135727853602496, 1370423592497105041, 135727853602496, -160, 0, 135728109450608, 1370423592417413265, 1369164738774222993}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
         not_first_call = <optimized out>
 #8  0x00007b73e6129c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
 No locals.
 .
 Thread 16 (Thread 0x7b71a3e006c0 (LWP 6309)):
 #0  0x00007b73e6098d61 in __futex_abstimed_wait_common64 (private=31601, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:57
         sc_cancel_oldtype = 0
         sc_ret = <optimized out>
         resultvar = <optimized out>
         __arg6 = <optimized out>
         __arg5 = <optimized out>
         __arg4 = <optimized out>
         __arg3 = <optimized out>
         __arg2 = <optimized out>
         __arg1 = <optimized out>
         _a6 = <optimized out>
         _a5 = <optimized out>
         _a4 = <optimized out>
         _a3 = <optimized out>
         _a2 = <optimized out>
         _a1 = <optimized out>
 #1  __futex_abstimed_wait_common (cancel=true, private=31601, abstime=0x0, clockid=0, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:87
         err = <optimized out>
         clockbit = 256
         op = 393
         err = <optimized out>
         clockbit = <optimized out>
         op = <optimized out>
 #2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7b71a4000be0, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
 No locals.
 #3  0x00007b73e609b7dd in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7b71a4000be8, cond=0x7b71a4000bb8) at ./nptl/pthread_cond_wait.c:503
         spin = 0
         buffer = {__routine = 0x7b73e609b4a0 <__condvar_cleanup_waiting>, __arg = 0x7b71a3dff8c0, __canceltype = 0, __prev = 0x0}
         cbuffer = {wseq = 2, cond = 0x7b71a4000bb8, mutex = 0x7b71a4000be8, private = 0}
         err = <optimized out>
         g = 0
         flags = <optimized out>
         g1_start = <optimized out>
         maxspin = 0
         signals = <optimized out>
         result = 0
         wseq = 2
         seq = 1
         private = 0
         maxspin = <optimized out>
         err = <optimized out>
         result = <optimized out>
         wseq = <optimized out>
         g = <optimized out>
         seq = <optimized out>
         flags = <optimized out>
         private = <optimized out>
         signals = <optimized out>
         done = <optimized out>
         g1_start = <optimized out>
         spin = <optimized out>
         buffer = <optimized out>
         cbuffer = <optimized out>
         s = <optimized out>
 #4  ___pthread_cond_wait (cond=0x7b71a4000bb8, mutex=0x7b71a4000be8) at ./nptl/pthread_cond_wait.c:627
 No locals.
 #5  0x0000627586ad07ca in std::thread::_State_impl<std::thread::_Invoker<std::tuple<httplib::ThreadPool::worker> > >::_M_run() ()
 No symbol table info available.
 #6  0x00007b73e64ecdb4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
 No symbol table info available.
 #7  0x00007b73e609ca94 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
         ret = <optimized out>
         pd = <optimized out>
         out = <optimized out>
         unwind_buf = {cancel_jmp_buf = {{jmp_buf = {135728010888896, 1370444208340125841, 135728010888896, -160, 0, 135728109450608, 1370444208260434065, 1369164738774222993}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
         not_first_call = <optimized out>
 #8  0x00007b73e6129c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
 No locals.
 .
 Thread 15 (Thread 0x7b719bc006c0 (LWP 6322)):
 #0  0x00007b73e6098d61 in __futex_abstimed_wait_common64 (private=31601, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:57
         sc_cancel_oldtype = 0
         sc_ret = <optimized out>
         resultvar = <optimized out>
         __arg6 = <optimized out>
         __arg5 = <optimized out>
         __arg4 = <optimized out>
         __arg3 = <optimized out>
         __arg2 = <optimized out>
         __arg1 = <optimized out>
         _a6 = <optimized out>
         _a5 = <optimized out>
         _a4 = <optimized out>
         _a3 = <optimized out>
         _a2 = <optimized out>
         _a1 = <optimized out>
 #1  __futex_abstimed_wait_common (cancel=true, private=31601, abstime=0x0, clockid=0, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:87
         err = <optimized out>
         clockbit = 256
         op = 393
         err = <optimized out>
         clockbit = <optimized out>
         op = <optimized out>
 #2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7b71a4000be0, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
 No locals.
 #3  0x00007b73e609b7dd in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7b71a4000be8, cond=0x7b71a4000bb8) at ./nptl/pthread_cond_wait.c:503
         spin = 0
         buffer = {__routine = 0x7b73e609b4a0 <__condvar_cleanup_waiting>, __arg = 0x7b719bbff8c0, __canceltype = 0, __prev = 0x0}
         cbuffer = {wseq = 28, cond = 0x7b71a4000bb8, mutex = 0x7b71a4000be8, private = 0}
         err = <optimized out>
         g = 0
         flags = <optimized out>
         g1_start = <optimized out>
         maxspin = 0
         signals = <optimized out>
         result = 0
         wseq = 28
         seq = 14
         private = 0
         maxspin = <optimized out>
         err = <optimized out>
         result = <optimized out>
         wseq = <optimized out>
         g = <optimized out>
         seq = <optimized out>
         flags = <optimized out>
         private = <optimized out>
         signals = <optimized out>
         done = <optimized out>
         g1_start = <optimized out>
         spin = <optimized out>
         buffer = <optimized out>
         cbuffer = <optimized out>
         s = <optimized out>
 #4  ___pthread_cond_wait (cond=0x7b71a4000bb8, mutex=0x7b71a4000be8) at ./nptl/pthread_cond_wait.c:627
 No locals.
 #5  0x0000627586ad07ca in std::thread::_State_impl<std::thread::_Invoker<std::tuple<httplib::ThreadPool::worker> > >::_M_run() ()
 No symbol table info available.
 #6  0x00007b73e64ecdb4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
 No symbol table info available.
 #7  0x00007b73e609ca94 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
         ret = <optimized out>
         pd = <optimized out>
         out = <optimized out>
         unwind_buf = {cancel_jmp_buf = {{jmp_buf = {135727874574016, 1370426341276174481, 135727874574016, -160, 0, 135728109450704, 1370426341196482705, 1369164738774222993}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
         not_first_call = <optimized out>
 #8  0x00007b73e6129c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
 No locals.
 .
 Thread 14 (Thread 0x7b719ee006c0 (LWP 6317)):
 #0  0x00007b73e6098d61 in __futex_abstimed_wait_common64 (private=31601, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:57
         sc_cancel_oldtype = 0
         sc_ret = <optimized out>
         resultvar = <optimized out>
         __arg6 = <optimized out>
         __arg5 = <optimized out>
         __arg4 = <optimized out>
         __arg3 = <optimized out>
         __arg2 = <optimized out>
         __arg1 = <optimized out>
         _a6 = <optimized out>
         _a5 = <optimized out>
         _a4 = <optimized out>
         _a3 = <optimized out>
         _a2 = <optimized out>
         _a1 = <optimized out>
 #1  __futex_abstimed_wait_common (cancel=true, private=31601, abstime=0x0, clockid=0, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:87
         err = <optimized out>
         clockbit = 256
         op = 393
         err = <optimized out>
         clockbit = <optimized out>
         op = <optimized out>
 #2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7b71a4000be0, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
 No locals.
 #3  0x00007b73e609b7dd in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7b71a4000be8, cond=0x7b71a4000bb8) at ./nptl/pthread_cond_wait.c:503
         spin = 0
         buffer = {__routine = 0x7b73e609b4a0 <__condvar_cleanup_waiting>, __arg = 0x7b719edff8c0, __canceltype = 0, __prev = 0x0}
         cbuffer = {wseq = 18, cond = 0x7b71a4000bb8, mutex = 0x7b71a4000be8, private = 0}
         err = <optimized out>
         g = 0
         flags = <optimized out>
         g1_start = <optimized out>
         maxspin = 0
         signals = <optimized out>
         result = 0
         wseq = 18
         seq = 9
         private = 0
         maxspin = <optimized out>
         err = <optimized out>
         result = <optimized out>
         wseq = <optimized out>
         g = <optimized out>
         seq = <optimized out>
         flags = <optimized out>
         private = <optimized out>
         signals = <optimized out>
         done = <optimized out>
         g1_start = <optimized out>
         spin = <optimized out>
         buffer = <optimized out>
         cbuffer = <optimized out>
         s = <optimized out>
 #4  ___pthread_cond_wait (cond=0x7b71a4000bb8, mutex=0x7b71a4000be8) at ./nptl/pthread_cond_wait.c:627
 No locals.
 #5  0x0000627586ad07ca in std::thread::_State_impl<std::thread::_Invoker<std::tuple<httplib::ThreadPool::worker> > >::_M_run() ()
 No symbol table info available.
 #6  0x00007b73e64ecdb4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
 No symbol table info available.
 #7  0x00007b73e609ca94 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
         ret = <optimized out>
         pd = <optimized out>
         out = <optimized out>
         unwind_buf = {cancel_jmp_buf = {{jmp_buf = {135727927002816, 1370415621037803665, 135727927002816, -160, 0, 135728109450704, 1370415620958111889, 1369164738774222993}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
         not_first_call = <optimized out>
 #8  0x00007b73e6129c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
 No locals.
 .
 Thread 13 (Thread 0x7b719c6006c0 (LWP 6321)):
 #0  0x00007b73e6098d61 in __futex_abstimed_wait_common64 (private=31601, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:57
         sc_cancel_oldtype = 0
         sc_ret = <optimized out>
         resultvar = <optimized out>
         __arg6 = <optimized out>
         __arg5 = <optimized out>
         __arg4 = <optimized out>
         __arg3 = <optimized out>
         __arg2 = <optimized out>
         __arg1 = <optimized out>
         _a6 = <optimized out>
         _a5 = <optimized out>
         _a4 = <optimized out>
         _a3 = <optimized out>
         _a2 = <optimized out>
         _a1 = <optimized out>
 #1  __futex_abstimed_wait_common (cancel=true, private=31601, abstime=0x0, clockid=0, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:87
         err = <optimized out>
         clockbit = 256
         op = 393
         err = <optimized out>
         clockbit = <optimized out>
         op = <optimized out>
 #2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7b71a4000be0, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
 No locals.
 #3  0x00007b73e609b7dd in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7b71a4000be8, cond=0x7b71a4000bb8) at ./nptl/pthread_cond_wait.c:503
         spin = 0
         buffer = {__routine = 0x7b73e609b4a0 <__condvar_cleanup_waiting>, __arg = 0x7b719c5ff8c0, __canceltype = 0, __prev = 0x0}
         cbuffer = {wseq = 24, cond = 0x7b71a4000bb8, mutex = 0x7b71a4000be8, private = 0}
         err = <optimized out>
         g = 0
         flags = <optimized out>
         g1_start = <optimized out>
         maxspin = 0
         signals = <optimized out>
         result = 0
         wseq = 24
         seq = 12
         private = 0
         maxspin = <optimized out>
         err = <optimized out>
         result = <optimized out>
         wseq = <optimized out>
         g = <optimized out>
         seq = <optimized out>
         flags = <optimized out>
         private = <optimized out>
         signals = <optimized out>
         done = <optimized out>
         g1_start = <optimized out>
         spin = <optimized out>
         buffer = <optimized out>
         cbuffer = <optimized out>
         s = <optimized out>
 #4  ___pthread_cond_wait (cond=0x7b71a4000bb8, mutex=0x7b71a4000be8) at ./nptl/pthread_cond_wait.c:627
 No locals.
 #5  0x0000627586ad07ca in std::thread::_State_impl<std::thread::_Invoker<std::tuple<httplib::ThreadPool::worker> > >::_M_run() ()
 No symbol table info available.
 #6  0x00007b73e64ecdb4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
 No symbol table info available.
 #7  0x00007b73e609ca94 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
         ret = <optimized out>
         pd = <optimized out>
         out = <optimized out>
         unwind_buf = {cancel_jmp_buf = {{jmp_buf = {135727885059776, 1370418919572686993, 135727885059776, -160, 0, 135728109450704, 1370418919492995217, 1369164738774222993}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
         not_first_call = <optimized out>
 #8  0x00007b73e6129c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
 No locals.
 .
 Thread 12 (Thread 0x7b719da006c0 (LWP 6319)):
 #0  0x00007b73e6098d61 in __futex_abstimed_wait_common64 (private=31601, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:57
         sc_cancel_oldtype = 0
         sc_ret = <optimized out>
         resultvar = <optimized out>
         __arg6 = <optimized out>
         __arg5 = <optimized out>
         __arg4 = <optimized out>
         __arg3 = <optimized out>
         __arg2 = <optimized out>
         __arg1 = <optimized out>
         _a6 = <optimized out>
         _a5 = <optimized out>
         _a4 = <optimized out>
         _a3 = <optimized out>
         _a2 = <optimized out>
         _a1 = <optimized out>
 #1  __futex_abstimed_wait_common (cancel=true, private=31601, abstime=0x0, clockid=0, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:87
         err = <optimized out>
         clockbit = 256
         op = 393
         err = <optimized out>
         clockbit = <optimized out>
         op = <optimized out>
 #2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7b71a4000be0, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
 No locals.
 #3  0x00007b73e609b7dd in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7b71a4000be8, cond=0x7b71a4000bb8) at ./nptl/pthread_cond_wait.c:503
         spin = 0
         buffer = {__routine = 0x7b73e609b4a0 <__condvar_cleanup_waiting>, __arg = 0x7b719d9ff8c0, __canceltype = 0, __prev = 0x0}
         cbuffer = {wseq = 22, cond = 0x7b71a4000bb8, mutex = 0x7b71a4000be8, private = 0}
         err = <optimized out>
         g = 0
         flags = <optimized out>
         g1_start = <optimized out>
         maxspin = 0
         signals = <optimized out>
         result = 0
         wseq = 22
         seq = 11
         private = 0
         maxspin = <optimized out>
         err = <optimized out>
         result = <optimized out>
         wseq = <optimized out>
         g = <optimized out>
         seq = <optimized out>
         flags = <optimized out>
         private = <optimized out>
         signals = <optimized out>
         done = <optimized out>
         g1_start = <optimized out>
         spin = <optimized out>
         buffer = <optimized out>
         cbuffer = <optimized out>
         s = <optimized out>
 #4  ___pthread_cond_wait (cond=0x7b71a4000bb8, mutex=0x7b71a4000be8) at ./nptl/pthread_cond_wait.c:627
 No locals.
 #5  0x0000627586ad07ca in std::thread::_State_impl<std::thread::_Invoker<std::tuple<httplib::ThreadPool::worker> > >::_M_run() ()
 No symbol table info available.
 #6  0x00007b73e64ecdb4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
 No symbol table info available.
 #7  0x00007b73e609ca94 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
         ret = <optimized out>
         pd = <optimized out>
         out = <optimized out>
         unwind_buf = {cancel_jmp_buf = {{jmp_buf = {135727906031296, 1370421668351756433, 135727906031296, -160, 0, 135728109450704, 1370421668272064657, 1369164738774222993}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
         not_first_call = <optimized out>
 #8  0x00007b73e6129c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
 No locals.
 .
 Thread 11 (Thread 0x7b719d0006c0 (LWP 6320)):
 #0  0x00007b73e6098d61 in __futex_abstimed_wait_common64 (private=31601, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:57
         sc_cancel_oldtype = 0
         sc_ret = <optimized out>
         resultvar = <optimized out>
         __arg6 = <optimized out>
         __arg5 = <optimized out>
         __arg4 = <optimized out>
         __arg3 = <optimized out>
         __arg2 = <optimized out>
         __arg1 = <optimized out>
         _a6 = <optimized out>
         _a5 = <optimized out>
         _a4 = <optimized out>
         _a3 = <optimized out>
         _a2 = <optimized out>
         _a1 = <optimized out>
 #1  __futex_abstimed_wait_common (cancel=true, private=31601, abstime=0x0, clockid=0, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:87
         err = <optimized out>
         clockbit = 256
         op = 393
         err = <optimized out>
         clockbit = <optimized out>
         op = <optimized out>
 #2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7b71a4000be0, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
 No locals.
 #3  0x00007b73e609b7dd in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7b71a4000be8, cond=0x7b71a4000bb8) at ./nptl/pthread_cond_wait.c:503
         spin = 0
         buffer = {__routine = 0x7b73e609b4a0 <__condvar_cleanup_waiting>, __arg = 0x7b719cfff8c0, __canceltype = 0, __prev = 0x0}
         cbuffer = {wseq = 26, cond = 0x7b71a4000bb8, mutex = 0x7b71a4000be8, private = 0}
         err = <optimized out>
         g = 0
         flags = <optimized out>
         g1_start = <optimized out>
         maxspin = 0
         signals = <optimized out>
         result = 0
         wseq = 26
         seq = 13
         private = 0
         maxspin = <optimized out>
         err = <optimized out>
         result = <optimized out>
         wseq = <optimized out>
         g = <optimized out>
         seq = <optimized out>
         flags = <optimized out>
         private = <optimized out>
         signals = <optimized out>
         done = <optimized out>
         g1_start = <optimized out>
         spin = <optimized out>
         buffer = <optimized out>
         cbuffer = <optimized out>
         s = <optimized out>
 #4  ___pthread_cond_wait (cond=0x7b71a4000bb8, mutex=0x7b71a4000be8) at ./nptl/pthread_cond_wait.c:627
 No locals.
 #5  0x0000627586ad07ca in std::thread::_State_impl<std::thread::_Invoker<std::tuple<httplib::ThreadPool::worker> > >::_M_run() ()
 No symbol table info available.
 #6  0x00007b73e64ecdb4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
 No symbol table info available.
 #7  0x00007b73e609ca94 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
         ret = <optimized out>
         pd = <optimized out>
         out = <optimized out>
         unwind_buf = {cancel_jmp_buf = {{jmp_buf = {135727895545536, 1370420293962221713, 135727895545536, -160, 0, 135728109450704, 1370420293882529937, 1369164738774222993}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
         not_first_call = <optimized out>
 #8  0x00007b73e6129c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
 No locals.
 .
 Thread 10 (Thread 0x7b719f8006c0 (LWP 6316)):
 #0  0x00007b73e6098d61 in __futex_abstimed_wait_common64 (private=31601, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:57
         sc_cancel_oldtype = 0
         sc_ret = <optimized out>
         resultvar = <optimized out>
         __arg6 = <optimized out>
         __arg5 = <optimized out>
         __arg4 = <optimized out>
         __arg3 = <optimized out>
         __arg2 = <optimized out>
         __arg1 = <optimized out>
         _a6 = <optimized out>
         _a5 = <optimized out>
         _a4 = <optimized out>
         _a3 = <optimized out>
         _a2 = <optimized out>
         _a1 = <optimized out>
 #1  __futex_abstimed_wait_common (cancel=true, private=31601, abstime=0x0, clockid=0, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:87
         err = <optimized out>
         clockbit = 256
         op = 393
         err = <optimized out>
         clockbit = <optimized out>
         op = <optimized out>
 #2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7b71a4000be0, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
 No locals.
 #3  0x00007b73e609b7dd in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7b71a4000be8, cond=0x7b71a4000bb8) at ./nptl/pthread_cond_wait.c:503
         spin = 0
         buffer = {__routine = 0x7b73e609b4a0 <__condvar_cleanup_waiting>, __arg = 0x7b719f7ff8c0, __canceltype = 0, __prev = 0x0}
         cbuffer = {wseq = 16, cond = 0x7b71a4000bb8, mutex = 0x7b71a4000be8, private = 0}
         err = <optimized out>
         g = 0
         flags = <optimized out>
         g1_start = <optimized out>
         maxspin = 0
         signals = <optimized out>
         result = 0
         wseq = 16
         seq = 8
         private = 0
         maxspin = <optimized out>
         err = <optimized out>
         result = <optimized out>
         wseq = <optimized out>
         g = <optimized out>
         seq = <optimized out>
         flags = <optimized out>
         private = <optimized out>
         signals = <optimized out>
         done = <optimized out>
         g1_start = <optimized out>
         spin = <optimized out>
         buffer = <optimized out>
         cbuffer = <optimized out>
         s = <optimized out>
 #4  ___pthread_cond_wait (cond=0x7b71a4000bb8, mutex=0x7b71a4000be8) at ./nptl/pthread_cond_wait.c:627
 No locals.
 #5  0x0000627586ad07ca in std::thread::_State_impl<std::thread::_Invoker<std::tuple<httplib::ThreadPool::worker> > >::_M_run() ()
 No symbol table info available.
 #6  0x00007b73e64ecdb4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
 No symbol table info available.
 #7  0x00007b73e609ca94 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
         ret = <optimized out>
         pd = <optimized out>
         out = <optimized out>
         unwind_buf = {cancel_jmp_buf = {{jmp_buf = {135727937488576, 1370416995427338385, 135727937488576, -160, 0, 135728109450608, 1370416995347646609, 1369164738774222993}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
         not_first_call = <optimized out>
 #8  0x00007b73e6129c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
 No locals.
 .
 Thread 9 (Thread 0x7b71a02006c0 (LWP 6315)):
 #0  0x00007b73e6098d61 in __futex_abstimed_wait_common64 (private=31601, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:57
         sc_cancel_oldtype = 0
         sc_ret = <optimized out>
         resultvar = <optimized out>
         __arg6 = <optimized out>
         __arg5 = <optimized out>
         __arg4 = <optimized out>
         __arg3 = <optimized out>
         __arg2 = <optimized out>
         __arg1 = <optimized out>
         _a6 = <optimized out>
         _a5 = <optimized out>
         _a4 = <optimized out>
         _a3 = <optimized out>
         _a2 = <optimized out>
         _a1 = <optimized out>
 #1  __futex_abstimed_wait_common (cancel=true, private=31601, abstime=0x0, clockid=0, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:87
         err = <optimized out>
         clockbit = 256
         op = 393
         err = <optimized out>
         clockbit = <optimized out>
         op = <optimized out>
 #2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7b71a4000be0, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
 No locals.
 #3  0x00007b73e609b7dd in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7b71a4000be8, cond=0x7b71a4000bb8) at ./nptl/pthread_cond_wait.c:503
         spin = 0
         buffer = {__routine = 0x7b73e609b4a0 <__condvar_cleanup_waiting>, __arg = 0x7b71a01ff8c0, __canceltype = 0, __prev = 0x0}
         cbuffer = {wseq = 14, cond = 0x7b71a4000bb8, mutex = 0x7b71a4000be8, private = 0}
         err = <optimized out>
         g = 0
         flags = <optimized out>
         g1_start = <optimized out>
         maxspin = 0
         signals = <optimized out>
         result = 0
         wseq = 14
         seq = 7
         private = 0
         maxspin = <optimized out>
         err = <optimized out>
         result = <optimized out>
         wseq = <optimized out>
         g = <optimized out>
         seq = <optimized out>
         flags = <optimized out>
         private = <optimized out>
         signals = <optimized out>
         done = <optimized out>
         g1_start = <optimized out>
         spin = <optimized out>
         buffer = <optimized out>
         cbuffer = <optimized out>
         s = <optimized out>
 #4  ___pthread_cond_wait (cond=0x7b71a4000bb8, mutex=0x7b71a4000be8) at ./nptl/pthread_cond_wait.c:627
 No locals.
 #5  0x0000627586ad07ca in std::thread::_State_impl<std::thread::_Invoker<std::tuple<httplib::ThreadPool::worker> > >::_M_run() ()
 No symbol table info available.
 #6  0x00007b73e64ecdb4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
 No symbol table info available.
 #7  0x00007b73e609ca94 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
         ret = <optimized out>
         pd = <optimized out>
         out = <optimized out>
         unwind_buf = {cancel_jmp_buf = {{jmp_buf = {135727947974336, 1370444758095939729, 135727947974336, -160, 0, 135728109450704, 1370444758016247953, 1369164738774222993}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
         not_first_call = <optimized out>
 #8  0x00007b73e6129c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
 No locals.
 .
 Thread 8 (Thread 0x7b71a9c006c0 (LWP 6307)):
 #0  0x00007b73e612b83d in __libc_accept (fd=11, addr=..., len=0x0) at ../sysdeps/unix/sysv/linux/accept.c:26
         sc_ret = -512
         sc_cancel_oldtype = 0
         sc_ret = <optimized out>
 #1  0x0000627586aa98db in std::thread::_State_impl<std::thread::_Invoker<std::tuple<main::{lambda()#3}> > >::_M_run() ()
 No symbol table info available.
 #2  0x00007b73e64ecdb4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
 No symbol table info available.
 #3  0x00007b73e609ca94 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
         ret = <optimized out>
         pd = <optimized out>
         out = <optimized out>
         unwind_buf = {cancel_jmp_buf = {{jmp_buf = {135728109455040, 1370465923694774417, 135728109455040, -160, 34, 140730079079232, 1370465923615082641, 1369164738774222993}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
         not_first_call = <optimized out>
 #4  0x00007b73e6129c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
 No locals.
 .
 Thread 7 (Thread 0x7b71a16006c0 (LWP 6313)):
 #0  0x00007b73e6098d61 in __futex_abstimed_wait_common64 (private=31601, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:57
         sc_cancel_oldtype = 0
         sc_ret = <optimized out>
         resultvar = <optimized out>
         __arg6 = <optimized out>
         __arg5 = <optimized out>
         __arg4 = <optimized out>
         __arg3 = <optimized out>
         __arg2 = <optimized out>
         __arg1 = <optimized out>
         _a6 = <optimized out>
         _a5 = <optimized out>
         _a4 = <optimized out>
         _a3 = <optimized out>
         _a2 = <optimized out>
         _a1 = <optimized out>
 #1  __futex_abstimed_wait_common (cancel=true, private=31601, abstime=0x0, clockid=0, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:87
         err = <optimized out>
         clockbit = 256
         op = 393
         err = <optimized out>
         clockbit = <optimized out>
         op = <optimized out>
 #2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7b71a4000be0, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
 No locals.
 #3  0x00007b73e609b7dd in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7b71a4000be8, cond=0x7b71a4000bb8) at ./nptl/pthread_cond_wait.c:503
         spin = 0
         buffer = {__routine = 0x7b73e609b4a0 <__condvar_cleanup_waiting>, __arg = 0x7b71a15ff8c0, __canceltype = 0, __prev = 0x0}
         cbuffer = {wseq = 10, cond = 0x7b71a4000bb8, mutex = 0x7b71a4000be8, private = 0}
         err = <optimized out>
         g = 0
         flags = <optimized out>
         g1_start = <optimized out>
         maxspin = 0
         signals = <optimized out>
         result = 0
         wseq = 10
         seq = 5
         private = 0
         maxspin = <optimized out>
         err = <optimized out>
         result = <optimized out>
         wseq = <optimized out>
         g = <optimized out>
         seq = <optimized out>
         flags = <optimized out>
         private = <optimized out>
         signals = <optimized out>
         done = <optimized out>
         g1_start = <optimized out>
         spin = <optimized out>
         buffer = <optimized out>
         cbuffer = <optimized out>
         s = <optimized out>
 #4  ___pthread_cond_wait (cond=0x7b71a4000bb8, mutex=0x7b71a4000be8) at ./nptl/pthread_cond_wait.c:627
 No locals.
 #5  0x0000627586ad07ca in std::thread::_State_impl<std::thread::_Invoker<std::tuple<httplib::ThreadPool::worker> > >::_M_run() ()
 No symbol table info available.
 #6  0x00007b73e64ecdb4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
 No symbol table info available.
 #7  0x00007b73e609ca94 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
         ret = <optimized out>
         pd = <optimized out>
         out = <optimized out>
         unwind_buf = {cancel_jmp_buf = {{jmp_buf = {135727968945856, 1370447506875009169, 135727968945856, -160, 0, 135728109450704, 1370447506795317393, 1369164738774222993}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
         not_first_call = <optimized out>
 #8  0x00007b73e6129c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
 No locals.
 .
 Thread 6 (Thread 0x7b71a2a006c0 (LWP 6311)):
 #0  0x00007b73e6098d61 in __futex_abstimed_wait_common64 (private=31601, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:57
         sc_cancel_oldtype = 0
         sc_ret = <optimized out>
         resultvar = <optimized out>
         __arg6 = <optimized out>
         __arg5 = <optimized out>
         __arg4 = <optimized out>
         __arg3 = <optimized out>
         __arg2 = <optimized out>
         __arg1 = <optimized out>
         _a6 = <optimized out>
         _a5 = <optimized out>
         _a4 = <optimized out>
         _a3 = <optimized out>
         _a2 = <optimized out>
         _a1 = <optimized out>
 #1  __futex_abstimed_wait_common (cancel=true, private=31601, abstime=0x0, clockid=0, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:87
         err = <optimized out>
         clockbit = 256
         op = 393
         err = <optimized out>
         clockbit = <optimized out>
         op = <optimized out>
 #2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7b71a4000be0, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
 No locals.
 #3  0x00007b73e609b7dd in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7b71a4000be8, cond=0x7b71a4000bb8) at ./nptl/pthread_cond_wait.c:503
         spin = 0
         buffer = {__routine = 0x7b73e609b4a0 <__condvar_cleanup_waiting>, __arg = 0x7b71a29ff8c0, __canceltype = 0, __prev = 0x0}
         cbuffer = {wseq = 6, cond = 0x7b71a4000bb8, mutex = 0x7b71a4000be8, private = 0}
         err = <optimized out>
         g = 0
         flags = <optimized out>
         g1_start = <optimized out>
         maxspin = 0
         signals = <optimized out>
         result = 0
         wseq = 6
         seq = 3
         private = 0
         maxspin = <optimized out>
         err = <optimized out>
         result = <optimized out>
         wseq = <optimized out>
         g = <optimized out>
         seq = <optimized out>
         flags = <optimized out>
         private = <optimized out>
         signals = <optimized out>
         done = <optimized out>
         g1_start = <optimized out>
         spin = <optimized out>
         buffer = <optimized out>
         cbuffer = <optimized out>
         s = <optimized out>
 #4  ___pthread_cond_wait (cond=0x7b71a4000bb8, mutex=0x7b71a4000be8) at ./nptl/pthread_cond_wait.c:627
 No locals.
 #5  0x0000627586ad07ca in std::thread::_State_impl<std::thread::_Invoker<std::tuple<httplib::ThreadPool::worker> > >::_M_run() ()
 No symbol table info available.
 #6  0x00007b73e64ecdb4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
 No symbol table info available.
 #7  0x00007b73e609ca94 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
         ret = <optimized out>
         pd = <optimized out>
         out = <optimized out>
         unwind_buf = {cancel_jmp_buf = {{jmp_buf = {135727989917376, 1370441459561056401, 135727989917376, -160, 0, 135728109450704, 1370441459481364625, 1369164738774222993}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
         not_first_call = <optimized out>
 #8  0x00007b73e6129c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
 No locals.
 .
 Thread 5 (Thread 0x7b71a34006c0 (LWP 6310)):
 #0  0x00007b73e6098d61 in __futex_abstimed_wait_common64 (private=31601, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:57
         sc_cancel_oldtype = 0
         sc_ret = <optimized out>
         resultvar = <optimized out>
         __arg6 = <optimized out>
         __arg5 = <optimized out>
         __arg4 = <optimized out>
         __arg3 = <optimized out>
         __arg2 = <optimized out>
         __arg1 = <optimized out>
         _a6 = <optimized out>
         _a5 = <optimized out>
         _a4 = <optimized out>
         _a3 = <optimized out>
         _a2 = <optimized out>
         _a1 = <optimized out>
 #1  __futex_abstimed_wait_common (cancel=true, private=31601, abstime=0x0, clockid=0, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:87
         err = <optimized out>
         clockbit = 256
         op = 393
         err = <optimized out>
         clockbit = <optimized out>
         op = <optimized out>
 #2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7b71a4000be0, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
 No locals.
 #3  0x00007b73e609b7dd in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7b71a4000be8, cond=0x7b71a4000bb8) at ./nptl/pthread_cond_wait.c:503
         spin = 0
         buffer = {__routine = 0x7b73e609b4a0 <__condvar_cleanup_waiting>, __arg = 0x7b71a33ff8c0, __canceltype = 0, __prev = 0x0}
         cbuffer = {wseq = 4, cond = 0x7b71a4000bb8, mutex = 0x7b71a4000be8, private = 0}
         err = <optimized out>
         g = 0
         flags = <optimized out>
         g1_start = <optimized out>
         maxspin = 0
         signals = <optimized out>
         result = 0
         wseq = 4
         seq = 2
         private = 0
         maxspin = <optimized out>
         err = <optimized out>
         result = <optimized out>
         wseq = <optimized out>
         g = <optimized out>
         seq = <optimized out>
         flags = <optimized out>
         private = <optimized out>
         signals = <optimized out>
         done = <optimized out>
         g1_start = <optimized out>
         spin = <optimized out>
         buffer = <optimized out>
         cbuffer = <optimized out>
         s = <optimized out>
 #4  ___pthread_cond_wait (cond=0x7b71a4000bb8, mutex=0x7b71a4000be8) at ./nptl/pthread_cond_wait.c:627
 No locals.
 #5  0x0000627586ad07ca in std::thread::_State_impl<std::thread::_Invoker<std::tuple<httplib::ThreadPool::worker> > >::_M_run() ()
 No symbol table info available.
 #6  0x00007b73e64ecdb4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
 No symbol table info available.
 #7  0x00007b73e609ca94 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
         ret = <optimized out>
         pd = <optimized out>
         out = <optimized out>
         unwind_buf = {cancel_jmp_buf = {{jmp_buf = {135728000403136, 1370442833950591121, 135728000403136, -160, 0, 135728109450608, 1370442833870899345, 1369164738774222993}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
         not_first_call = <optimized out>
 #8  0x00007b73e6129c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
 No locals.
 .
 Thread 4 (Thread 0x7b73c44006c0 (LWP 6303)):
 #0  __GI___ioctl (fd=3, request=3222817548) at ../sysdeps/unix/sysv/linux/ioctl.c:36
         args = {{gp_offset = 16, fp_offset = 0, overflow_arg_area = 0x7b73c43ff670, reg_save_area = 0x7b73c43ff630}}
         arg = <optimized out>
         r = -4
 #1  0x00007b73ce52dc30 in kmtIoctl () from /opt/rocm_sdk_612/lib64/libhsa-runtime64.so.1
 No symbol table info available.
 #2  0x00007b73ce526ab8 in hsaKmtWaitOnMultipleEvents_Ext () from /opt/rocm_sdk_612/lib64/libhsa-runtime64.so.1
 No symbol table info available.
 #3  0x00007b73ce497a89 in rocr::core::Signal::WaitAny(unsigned int, hsa_signal_s const*, hsa_signal_condition_t const*, long const*, unsigned long, hsa_wait_state_t, long*) () from /opt/rocm_sdk_612/lib64/libhsa-runtime64.so.1
 No symbol table info available.
 #4  0x00007b73ce46cae6 in rocr::AMD::hsa_amd_signal_wait_any(unsigned int, hsa_signal_s*, hsa_signal_condition_t*, long*, unsigned long, hsa_wait_state_t, long*) () from /opt/rocm_sdk_612/lib64/libhsa-runtime64.so.1
 No symbol table info available.
 #5  0x00007b73ce48efdf in rocr::core::Runtime::AsyncEventsLoop(void*) () from /opt/rocm_sdk_612/lib64/libhsa-runtime64.so.1
 No symbol table info available.
 #6  0x00007b73ce42cf9b in rocr::os::ThreadTrampoline(void*) () from /opt/rocm_sdk_612/lib64/libhsa-runtime64.so.1
 No symbol table info available.
 #7  0x00007b73e609ca94 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
         ret = <optimized out>
         pd = <optimized out>
         out = <optimized out>
         unwind_buf = {cancel_jmp_buf = {{jmp_buf = {135737143985856, 1369099230741448849, 0, -160, 0, 140730079067040, 1369099230661757073, 1369164738774222993}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
         not_first_call = <optimized out>
 #8  0x00007b73e6129c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
 No locals.
 .
 Thread 3 (Thread 0x7b71a92006c0 (LWP 6308)):
 #0  0x00007b73e6098d61 in __futex_abstimed_wait_common64 (private=31601, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:57
         sc_cancel_oldtype = 0
         sc_ret = <optimized out>
         resultvar = <optimized out>
         __arg6 = <optimized out>
         __arg5 = <optimized out>
         __arg4 = <optimized out>
         __arg3 = <optimized out>
         __arg2 = <optimized out>
         __arg1 = <optimized out>
         _a6 = <optimized out>
         _a5 = <optimized out>
         _a4 = <optimized out>
         _a3 = <optimized out>
         _a2 = <optimized out>
         _a1 = <optimized out>
 #1  __futex_abstimed_wait_common (cancel=true, private=31601, abstime=0x0, clockid=0, expected=0, futex_word=0x7b71a4000be0) at ./nptl/futex-internal.c:87
         err = <optimized out>
         clockbit = 256
         op = 393
         err = <optimized out>
         clockbit = <optimized out>
         op = <optimized out>
 #2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7b71a4000be0, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
 No locals.
 #3  0x00007b73e609b7dd in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7b71a4000be8, cond=0x7b71a4000bb8) at ./nptl/pthread_cond_wait.c:503
         spin = 0
         buffer = {__routine = 0x7b73e609b4a0 <__condvar_cleanup_waiting>, __arg = 0x7b71a91ff8c0, __canceltype = 0, __prev = 0x0}
         cbuffer = {wseq = 0, cond = 0x7b71a4000bb8, mutex = 0x7b71a4000be8, private = 0}
         err = <optimized out>
         g = 0
         flags = <optimized out>
         g1_start = <optimized out>
         maxspin = 0
         signals = <optimized out>
         result = 0
         wseq = 0
         seq = 0
         private = 0
         maxspin = <optimized out>
         err = <optimized out>
         result = <optimized out>
         wseq = <optimized out>
         g = <optimized out>
         seq = <optimized out>
         flags = <optimized out>
         private = <optimized out>
         signals = <optimized out>
         done = <optimized out>
         g1_start = <optimized out>
         spin = <optimized out>
         buffer = <optimized out>
         cbuffer = <optimized out>
         s = <optimized out>
 #4  ___pthread_cond_wait (cond=0x7b71a4000bb8, mutex=0x7b71a4000be8) at ./nptl/pthread_cond_wait.c:627
 No locals.
 #5  0x0000627586ad07ca in std::thread::_State_impl<std::thread::_Invoker<std::tuple<httplib::ThreadPool::worker> > >::_M_run() ()
 No symbol table info available.
 #6  0x00007b73e64ecdb4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
 No symbol table info available.
 #7  0x00007b73e609ca94 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
         ret = <optimized out>
         pd = <optimized out>
         out = <optimized out>
         unwind_buf = {cancel_jmp_buf = {{jmp_buf = {135728098969280, 1370464549305239697, 135728098969280, -160, 0, 135728109450608, 1370464549225547921, 1369164738774222993}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
         not_first_call = <optimized out>
 #8  0x00007b73e6129c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
 No locals.
 .
 Thread 2 (Thread 0x7b72c26006c0 (LWP 6306)):
 #0  0x00007b73e6098d61 in __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x627586bfc938 <gpt_log_main()::log+88>) at ./nptl/futex-internal.c:57
         sc_cancel_oldtype = 0
         sc_ret = <optimized out>
         resultvar = <optimized out>
         __arg6 = <optimized out>
         __arg5 = <optimized out>
         __arg4 = <optimized out>
         __arg3 = <optimized out>
         __arg2 = <optimized out>
         __arg1 = <optimized out>
         _a6 = <optimized out>
         _a5 = <optimized out>
         _a4 = <optimized out>
         _a3 = <optimized out>
         _a2 = <optimized out>
         _a1 = <optimized out>
 #1  __futex_abstimed_wait_common (cancel=true, private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x627586bfc938 <gpt_log_main()::log+88>) at ./nptl/futex-internal.c:87
         err = <optimized out>
         clockbit = 256
         op = 393
         err = <optimized out>
         clockbit = <optimized out>
         op = <optimized out>
 #2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x627586bfc938 <gpt_log_main()::log+88>, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
 No locals.
 #3  0x00007b73e609b7dd in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x627586bfc8e0 <gpt_log_main()::log>, cond=0x627586bfc910 <gpt_log_main()::log+48>) at ./nptl/pthread_cond_wait.c:503
         spin = 0
         buffer = {__routine = 0x7b73e609b4a0 <__condvar_cleanup_waiting>, __arg = 0x7b72c25ff8d0, __canceltype = -2001090544, __prev = 0x0}
         cbuffer = {wseq = 204, cond = 0x627586bfc910 <gpt_log_main()::log+48>, mutex = 0x627586bfc8e0 <gpt_log_main()::log>, private = 0}
         err = <optimized out>
         g = 0
         flags = <optimized out>
         g1_start = <optimized out>
         maxspin = 0
         signals = <optimized out>
         result = 0
         wseq = 204
         seq = 102
         private = 0
         maxspin = <optimized out>
         err = <optimized out>
         result = <optimized out>
         wseq = <optimized out>
         g = <optimized out>
         seq = <optimized out>
         flags = <optimized out>
         private = <optimized out>
         signals = <optimized out>
         done = <optimized out>
         g1_start = <optimized out>
         spin = <optimized out>
         buffer = <optimized out>
         cbuffer = <optimized out>
         s = <optimized out>
 #4  ___pthread_cond_wait (cond=0x627586bfc910 <gpt_log_main()::log+48>, mutex=0x627586bfc8e0 <gpt_log_main()::log>) at ./nptl/pthread_cond_wait.c:627
 No locals.
 #5  0x0000627586b7e623 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<gpt_log::resume()::{lambda()#1}> > >::_M_run() ()
 No symbol table info available.
 #6  0x00007b73e64ecdb4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
 No symbol table info available.
 #7  0x00007b73e609ca94 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
         ret = <optimized out>
         pd = <optimized out>
         out = <optimized out>
         unwind_buf = {cancel_jmp_buf = {{jmp_buf = {135732817561280, 1369666853619288209, 135732817561280, -160, 2, 140730079079088, 1369666853539596433, 1369164738774222993}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
         not_first_call = <optimized out>
 #8  0x00007b73e6129c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
 No locals.
 .
 Thread 1 (Thread 0x7b73e628ec40 (LWP 6302)):
 #0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
         tid = <optimized out>
         ret = 0
         pd = <optimized out>
         old_mask = {__val = {0}}
         ret = <optimized out>
         pd = <optimized out>
         old_mask = <optimized out>
         ret = <optimized out>
         tid = <optimized out>
         ret = <optimized out>
         resultvar = <optimized out>
         resultvar = <optimized out>
         __arg3 = <optimized out>
         __arg2 = <optimized out>
         __arg1 = <optimized out>
         _a3 = <optimized out>
         _a2 = <optimized out>
         _a1 = <optimized out>
         __futex = <optimized out>
         resultvar = <optimized out>
         __arg3 = <optimized out>
         __arg2 = <optimized out>
         __arg1 = <optimized out>
         _a3 = <optimized out>
         _a2 = <optimized out>
         _a1 = <optimized out>
         __futex = <optimized out>
         __private = <optimized out>
         __oldval = <optimized out>
 #1  __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
 No locals.
 #2  __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
 No locals.
 #3  0x00007b73e604526e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
         ret = <optimized out>
 #4  0x00007b73e60288ff in __GI_abort () at ./stdlib/abort.c:79
         save_stage = 1
         act = {__sigaction_handler = {sa_handler = 0x20, sa_sigaction = 0x20}, sa_mask = {__val = {108257077776080, 59, 14422071311227648676, 140730079068992, 135737623843607, 135737712327360, 18446744073709551512, 108257227250632, 303, 108257227129664, 303, 303, 2, 14, 6983489619661282816, 140730079069200}}, sa_flags = -1718412448, sa_restorer = 0x7ffe465f4410}
 #5  0x00007b73e0a2e0ff in ?? () from /opt/rocm_sdk_612/lib64/libamdhip64.so.6
 No symbol table info available.
 #6  0x00007b73e0b2b201 in ?? () from /opt/rocm_sdk_612/lib64/libamdhip64.so.6
 No symbol table info available.
 #7  0x00007b73e0ad5983 in ?? () from /opt/rocm_sdk_612/lib64/libamdhip64.so.6
 No symbol table info available.
 #8  0x00007b73e0c9f9ed in ?? () from /opt/rocm_sdk_612/lib64/libamdhip64.so.6
 No symbol table info available.
 #9  0x00007b73e0c799df in ?? () from /opt/rocm_sdk_612/lib64/libamdhip64.so.6
 No symbol table info available.
 #10 0x00007b73e204161e in Tensile::hip::SolutionAdapter::getKernel(ihipModuleSymbol_t*&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /opt/rocm_sdk_612/lib64/librocblas.so.4
 No symbol table info available.
 #11 0x00007b73e2042257 in Tensile::hip::SolutionAdapter::launchKernel(Tensile::KernelInvocation const&, ihipStream_t*, ihipEvent_t*, ihipEvent_t*) () from /opt/rocm_sdk_612/lib64/librocblas.so.4
 No symbol table info available.
 #12 0x00007b73e2042a68 in Tensile::hip::SolutionAdapter::launchKernels(std::vector<Tensile::KernelInvocation, std::allocator<Tensile::KernelInvocation> > const&, ihipStream_t*, ihipEvent_t*, ihipEvent_t*) () from /opt/rocm_sdk_612/lib64/librocblas.so.4
 No symbol table info available.
 #13 0x00007b73e184224f in rocblas_status_ runContractionProblem<_Float16, _Float16, _Float16, _Float16, _Float16, _Float16>(RocblasContractionProblem<_Float16, _Float16, _Float16, _Float16, _Float16, _Float16> const&, rocblas_gemm_algo_, int) () from /opt/rocm_sdk_612/lib64/librocblas.so.4
 No symbol table info available.
 #14 0x00007b73e1968b6c in rocblas_status_ gemm_ex_batched_template<_Float16, _Float16, _Float16>(_rocblas_handle*, rocblas_operation_, rocblas_operation_, int, int, int, _Float16 const*, _Float16 const* const*, long, int, long, _Float16 const* const*, long, int, long, _Float16 const*, _Float16 const* const*, long, int, long, _Float16* const*, long, int, long, int, rocblas_gemm_algo_, int, rocblas_gemm_flags_) () from /opt/rocm_sdk_612/lib64/librocblas.so.4
 No symbol table info available.
 #15 0x00007b73e1966f0d in rocblas_status_ gemm_ex_typecasting<true, _Float16, _Float16, _Float16>(_rocblas_handle*, rocblas_operation_, rocblas_operation_, int, int, int, void const*, void const*, long, int, long, void const*, long, int, long, void const*, void const*, long, int, long, void*, long, int, long, int, rocblas_gemm_algo_, int, rocblas_gemm_flags_) () from /opt/rocm_sdk_612/lib64/librocblas.so.4
 No symbol table info available.
 #16 0x00007b73e1960db2 in rocblas_status_ rocblas_gemm_ex_template<true>(_rocblas_handle*, rocblas_operation_, rocblas_operation_, int, int, int, void const*, void const*, rocblas_datatype_, long, int, long, void const*, rocblas_datatype_, long, int, long, void const*, void const*, rocblas_datatype_, long, int, long, void*, rocblas_datatype_, long, int, long, int, rocblas_datatype_, rocblas_gemm_algo_, int, unsigned int) () from /opt/rocm_sdk_612/lib64/librocblas.so.4
 No symbol table info available.
 #17 0x00007b73e195ff43 in rocblas_gemm_batched_ex () from /opt/rocm_sdk_612/lib64/librocblas.so.4
 No symbol table info available.
 #18 0x00007b73e67b3e38 in hipblasGemmBatchedEx () from /opt/rocm_sdk_612/lib64/libhipblas.so.2
 No symbol table info available.
 #19 0x00007b73e6903068 in ggml_cuda_mul_mat_batched_cublas(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*) () from /opt/rocm_sdk_612/lib64/libggml.so
 No symbol table info available.
 #20 0x00007b73e68f33b9 in ggml_backend_cuda_graph_compute(ggml_backend*, ggml_cgraph*) () from /opt/rocm_sdk_612/lib64/libggml.so
 No symbol table info available.
 #21 0x00007b73e6873703 in ggml_backend_sched_graph_compute_async () from /opt/rocm_sdk_612/lib64/libggml.so
 No symbol table info available.
 #22 0x00007b73e8d6dfd2 in llama_decode () from /opt/rocm_sdk_612/lib64/libllama.so
 No symbol table info available.
 #23 0x0000627586b5d704 in llama_init_from_gpt_params(gpt_params&) ()
 No symbol table info available.
 #24 0x0000627586af0822 in server_context::load_model(gpt_params const&) ()
 No symbol table info available.
 #25 0x0000627586aa2820 in main ()
 No symbol table info available.
Title: llama-server crashed with SIGABRT
UnreportableReason: This package does not seem to be installed correctly
UpgradeStatus: No upgrade log present (probably fresh install)
_MarkForUpload: True
separator: 

@Said-Akbar
Author

Based on the app crash logs, I see that ROCm is not able to find the symbol table ('No symbol table info available.'). I am not sure what that means. Let me know. Thanks!
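
Note on that message: 'No symbol table info available.' is what gdb prints in a full backtrace for a frame whose binary carries no debug information. It describes how the libraries were built and is not itself the cause of the SIGABRT. A minimal sketch for checking this, assuming `readelf` from binutils is installed; the helper name `has_debug_info` is made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if an ELF file contains a .debug_info section.
has_debug_info() {
    local elf_file="$1"
    # readelf -S lists section headers; a missing or non-ELF file yields no match.
    readelf -S "$elf_file" 2>/dev/null | grep -q '\.debug_info'
}
```

For example, `has_debug_info /opt/rocm_sdk_612/lib64/librocblas.so.4 && echo "debug info present" || echo "no debug info"` would tell whether gdb can show locals for the rocBLAS frames.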

@lamikr
Owner

lamikr commented Nov 24, 2024

Thanks, good to see that the basic applications work. I will start my gfx906 build and check whether I can figure out a fix for those build errors with llama.cpp.

@Said-Akbar
Author

Thank you! Looking forward to your updates.

@lamikr lamikr mentioned this issue Nov 25, 2024
@lamikr
Owner

lamikr commented Nov 26, 2024

Hi, I added some more tracing to the clr component, which is responsible for loading the .so files that can contain code object (CO) data.
I also made some other small changes related to Vega. You should get the updated files and then start the build with these commands:

git fetch
./babs.sh -up wip/rocm_sdk_builder_612_vega_testing
./babs.sh -b

After that, this command should print much more debug output to show what's going on:

AMD_COMGR_SAVE_TEMPS=1 AMD_COMGR_REDIRECT_LOGS=stdout AMD_COMGR_EMIT_VERBOSE_LOGS=1 ROCM_SDK_PRINTOUT_DEBUG_MESSAGES=1 llama-server -m /opt/rocm_sdk_models/Qwen2.5-7B-Instruct-Q8_0/Qwen2.5-7B-Instruct-Q8_0.gguf -c 2048 -ngl 99 --metrics
I have tested that this command works on gfx1030, gfx1010, gfx1102 and gfx1103.

I also did the gfx906 build and can find the string causing the problem in these files:

$ cd /opt/rocm_sdk_612/lib64/rocblas/library
$ grep -R Cijk_Alik_Bljk_HB_GB
grep: TensileLibrary_Type_HH_Contraction_l_Alik_Bljk_Cijk_Dijk_fallback.dat: binary file matches
grep: TensileLibrary_Type_HH_Contraction_l_Alik_Bljk_Cijk_Dijk_gfx906.dat: binary file matches
grep: TensileLibrary_Type_HH_Contraction_l_Alik_Bljk_Cijk_Dijk_gfx906.co: binary file matches
grep: TensileLibrary_Type_HH_Contraction_l_Alik_Bljk_Cijk_Dijk_fallback_gfx906.hsaco: binary file matches

$ strings TensileLibrary_Type_HH_Contraction_l_Alik_Bljk_Cijk_Dijk_gfx906.co | grep Cijk_Alik_Bljk_HB_GB_MT32x32x16_SN_APM1_AF0EM2_AF1EM2_AMAS3_ASAE01_ASCE01_ASEM1_BL1_BS1_DTLA0_DTLB0_EPS1_FL0_GLVWA4_GLVWB4_GRVW4_GSU1_GSUASB_ISA906_IU1_K1_KLA_LPA0_LPB0_LDL1_LRVW4_MDA2_MMFGLC_NLCA1_NLCB1_ONLL1_PK0_PGR1_PLR1_SIA1_SU32_SUM0_SUS256_SVW4_SNLL0_TT4_4_USFGROn1_VAW2_VSn1_VW4_VWB4_WG8_8_1_WGM1
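
The same lookup can be wrapped in a small function so it is easy to repeat for other kernel names or library directories. This is only a sketch: the function name `find_kernel_files` is made up for illustration, and it relies on nothing beyond standard `grep` options.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the files under a directory that contain a given
# kernel-name string, including binary files such as .co, .dat and .hsaco.
find_kernel_files() {
    local lib_dir="$1" kernel_name="$2"
    # -r recurse, -l print only matching file names,
    # -a treat binary files as text, -F match a fixed string (no regex).
    grep -rlaF -- "$kernel_name" "$lib_dir" | sort
}
```

For example, `find_kernel_files /opt/rocm_sdk_612/lib64/rocblas/library Cijk_Alik_Bljk_HB_GB` should list the same four files as the grep above, without the "binary file matches" text.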

@Said-Akbar
Author

@lamikr, thanks!
I will run the above commands when I get back home.

Regarding the new changes in the repo, will I have to build everything from scratch, or only the specific files changed in wip/rocm_sdk_builder_612_vega_testing? I spent 3 days and over 10 hours building the last version of this repo. I hope this change will not require building everything from scratch.

Thanks!

@Said-Akbar
Author

OK, this time it took 1 hour to build. I am still seeing the llama.cpp error. This time the output has all the debug logs you described above. Attaching the error output here.

AMD_COMGR_SAVE_TEMPS=1 AMD_COMGR_REDIRECT_LOGS=stdout AMD_COMGR_EMIT_VERBOSE_LOGS=1 ROCM_SDK_PRINTOUT_DEBUG_MESSAGES=1 llama-server -m /media/saidp/datasets/text_generation/models/Qwen2.5-7B-Instruct-Q8_0/Qwen2.5-7B-Instruct-Q8_0.gguf -c 2048 -ngl 99 --metrics >>sdk_error_output.txt 2>&1

sdk_error_output.txt
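
To compare a log like this against one from a working GPU, it can help to tally the `hip_fatbin.cpp: Found CO for device ...` trace lines per device and library. A minimal sketch, assuming the saved log contains those lines in the format shown in this thread; the function name `count_code_objects` is made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<count> <device> <library>" for every unique
# device/library pair seen in the "Found CO for device" trace lines of a log.
count_code_objects() {
    local log_file="$1"
    grep -F 'hip_fatbin.cpp: Found CO for device' "$log_file" \
        | sed -E 's/.*Found CO for device ([^,]+), file: (.*)$/\1 \2/' \
        | sort | uniq -c | awk '{print $1, $2, $3}'
}
```

Running `count_code_objects sdk_error_output.txt` on both machines gives two short summaries that can be diffed, which makes it easier to see whether a library such as librocblas never reports a code object for gfx906.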

@Said-Akbar
Author

Here are the error logs from run_and_save_benchmarks.sh.

`
cd benchmarks/

AMD_COMGR_SAVE_TEMPS=1 AMD_COMGR_REDIRECT_LOGS=stdout AMD_COMGR_EMIT_VERBOSE_LOGS=1 ROCM_SDK_PRINTOUT_DEBUG_MESSAGES=1 ./run_and_save_benchmarks.sh >>benchmark_error_output.txt 2>&1
`

benchmark_error_output.txt

@lamikr
Owner

lamikr commented Nov 26, 2024

No need to rebuild everything. If you have a working build and run the

"./babs.sh -up ", it will

  • check which projects have changes either in the binfo file or in the patches directory
  • check out and re-apply patches for those changed projects
  • clean the build directory for those changed projects

So the next time you run ./babs.sh -b, it will only rebuild and install the changed projects.

@Said-Akbar
Author

@lamikr, yes, I used the commands you shared above:

git fetch
./babs.sh -up wip/rocm_sdk_builder_612_vega_testing
./babs.sh -b

It took 1 hour to compile. I am still seeing the same 'symbol not found' errors. Please refer to my comments above for the detailed error logs.

@lamikr
Owner

lamikr commented Nov 26, 2024

For comparison, here is my log for a successful llama.cpp launch with gfx1030 and the same parameters. The output is pretty similar except at the very end. I was expecting to see some errors in your case from the clang or lld build commands that it executes to build the model, but even those looked pretty much the same.

`
INFO [ main] build info | tid="140451167387904" timestamp=1732592140 build=3407 commit="dab1e48c"
INFO [ main] system info | tid="140451167387904" timestamp=1732592140 n_threads=8 n_threads_batch=-1 total_threads=16 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 0 | "
llama_model_loader: loaded meta data with 38 key-value pairs and 339 tensors from /opt/rocm_sdk_models/Qwen2.5-7B-Instruct-Q8_0/Qwen2.5-7B-Instruct-Q8_0.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = qwen2
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Qwen2.5 7B Instruct
llama_model_loader: - kv 3: general.finetune str = Instruct
llama_model_loader: - kv 4: general.basename str = Qwen2.5
llama_model_loader: - kv 5: general.size_label str = 7B
llama_model_loader: - kv 6: general.license str = apache-2.0
llama_model_loader: - kv 7: general.license.link str = https://huggingface.co/Qwen/Qwen2.5-7...
llama_model_loader: - kv 8: general.base_model.count u32 = 1
llama_model_loader: - kv 9: general.base_model.0.name str = Qwen2.5 7B
llama_model_loader: - kv 10: general.base_model.0.organization str = Qwen
llama_model_loader: - kv 11: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen2.5-7B
llama_model_loader: - kv 12: general.tags arr[str,2] = ["chat", "text-generation"]
llama_model_loader: - kv 13: general.languages arr[str,1] = ["en"]
llama_model_loader: - kv 14: qwen2.block_count u32 = 28
llama_model_loader: - kv 15: qwen2.context_length u32 = 32768
llama_model_loader: - kv 16: qwen2.embedding_length u32 = 3584
llama_model_loader: - kv 17: qwen2.feed_forward_length u32 = 18944
llama_model_loader: - kv 18: qwen2.attention.head_count u32 = 28
llama_model_loader: - kv 19: qwen2.attention.head_count_kv u32 = 4
llama_model_loader: - kv 20: qwen2.rope.freq_base f32 = 1000000.000000
llama_model_loader: - kv 21: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 22: general.file_type u32 = 7
llama_model_loader: - kv 23: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 24: tokenizer.ggml.pre str = qwen2
llama_model_loader: - kv 25: tokenizer.ggml.tokens arr[str,152064] = ["!", """, "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 26: tokenizer.ggml.token_type arr[i32,152064] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 27: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv 28: tokenizer.ggml.eos_token_id u32 = 151645
llama_model_loader: - kv 29: tokenizer.ggml.padding_token_id u32 = 151643
llama_model_loader: - kv 30: tokenizer.ggml.bos_token_id u32 = 151643
llama_model_loader: - kv 31: tokenizer.ggml.add_bos_token bool = false
llama_model_loader: - kv 32: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>...
llama_model_loader: - kv 33: general.quantization_version u32 = 2
llama_model_loader: - kv 34: quantize.imatrix.file str = /models_out/Qwen2.5-7B-Instruct-GGUF/...
llama_model_loader: - kv 35: quantize.imatrix.dataset str = /training_dir/calibration_datav3.txt
llama_model_loader: - kv 36: quantize.imatrix.entries_count i32 = 196
llama_model_loader: - kv 37: quantize.imatrix.chunks_count i32 = 128
llama_model_loader: - type f32: 141 tensors
llama_model_loader: - type q8_0: 198 tensors
llm_load_vocab: special tokens cache size = 22
llm_load_vocab: token to piece cache size = 0.9310 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = qwen2
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 152064
llm_load_print_meta: n_merges = 151387
llm_load_print_meta: vocab_only = 0
llm_load_print_meta: n_ctx_train = 32768
llm_load_print_meta: n_embd = 3584
llm_load_print_meta: n_layer = 28
llm_load_print_meta: n_head = 28
llm_load_print_meta: n_head_kv = 4
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_swa = 0
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 7
llm_load_print_meta: n_embd_k_gqa = 512
llm_load_print_meta: n_embd_v_gqa = 512
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-06
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 18944
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 2
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 1000000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn = 32768
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = ?B
llm_load_print_meta: model ftype = Q8_0
llm_load_print_meta: model params = 7.62 B
llm_load_print_meta: model size = 7.54 GiB (8.50 BPW)
llm_load_print_meta: general.name = Qwen2.5 7B Instruct
llm_load_print_meta: BOS token = 151643 '<|endoftext|>'
llm_load_print_meta: EOS token = 151645 '<|im_end|>'
llm_load_print_meta: PAD token = 151643 '<|endoftext|>'
llm_load_print_meta: LF token = 148848 'ÄĬ'
llm_load_print_meta: EOT token = 151645 '<|im_end|>'
llm_load_print_meta: max token length = 256
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon RX 6800, compute capability 10.3, VMM: no
llm_load_tensors: ggml ctx size = 0.30 MiB
llm_load_tensors: offloading 28 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 29/29 layers to GPU
llm_load_tensors: ROCm0 buffer size = 7165.44 MiB
llm_load_tensors: CPU buffer size = 552.23 MiB
amd_comgr_do_action:
ActionKind: AMD_COMGR_ACTION_ADD_PRECOMPILED_HEADERS
IsaName: amdgcn-amd-amdhsa--gfx1030
Options: "-O3" "-cl-kernel-arg-info" "-D__OPENCL_VERSION__=200" "-D__IMAGE_SUPPORT__=1" "-Xclang" "-cl-ext=+cl_khr_fp64,+cl_khr_global_int32_base_atomics,+cl_khr_global_int32_extended_atomics,+cl_khr_local_int32_base_atomics,+cl_khr_local_int32_extended_atomics,+cl_khr_int64_base_atomics,+cl_khr_int64_extended_atomics,+cl_khr_3d_image_writes,+cl_khr_byte_addressable_store,+cl_khr_fp16,+cl_khr_gl_sharing,+cl_amd_device_attribute_query,+cl_amd_media_ops,+cl_amd_media_ops2,+cl_khr_image2d_from_buffer,+cl_khr_subgroups,+cl_amd_copy_buffer_p2p,+cl_amd_assembly_program" "-mllvm" "-amdgpu-prelink" "-mcode-object-version=5"
Path:
Language: AMD_COMGR_LANGUAGE_OPENCL_1_2
Comgr Branch-Commit: HEAD-72e3209e9ecf
LLVM Commit: 72e3209e9ecf09af59f32bde15867048d6410e3b
ReturnStatus: AMD_COMGR_STATUS_SUCCESS

amd_comgr_do_action:
ActionKind: AMD_COMGR_ACTION_COMPILE_SOURCE_TO_BC
IsaName: amdgcn-amd-amdhsa--gfx1030
Options: "-O3" "-cl-kernel-arg-info" "-D__OPENCL_VERSION__=200" "-D__IMAGE_SUPPORT__=1" "-Xclang" "-cl-ext=+cl_khr_fp64,+cl_khr_global_int32_base_atomics,+cl_khr_global_int32_extended_atomics,+cl_khr_local_int32_base_atomics,+cl_khr_local_int32_extended_atomics,+cl_khr_int64_base_atomics,+cl_khr_int64_extended_atomics,+cl_khr_3d_image_writes,+cl_khr_byte_addressable_store,+cl_khr_fp16,+cl_khr_gl_sharing,+cl_amd_device_attribute_query,+cl_amd_media_ops,+cl_amd_media_ops2,+cl_khr_image2d_from_buffer,+cl_khr_subgroups,+cl_amd_copy_buffer_p2p,+cl_amd_assembly_program" "-mllvm" "-amdgpu-prelink" "-mcode-object-version=5"
Path:
Language: AMD_COMGR_LANGUAGE_OPENCL_1_2
Comgr Branch-Commit: HEAD-72e3209e9ecf
LLVM Commit: 72e3209e9ecf09af59f32bde15867048d6410e3b
Compilation Args: "-target" "amdgcn-amd-amdhsa" "-mcpu=gfx1030" "-I" "/tmp/comgr-9ef550/include" "-include-pch" "/tmp/comgr-9ef550/include/opencl1.2-c.pch" "-Xclang" "-fno-validate-pch" "-x" "cl" "-std=cl1.2" "-cl-no-stdinc" "-c" "-emit-llvm" "-O3" "-cl-kernel-arg-info" "-D__OPENCL_VERSION__=200" "-D__IMAGE_SUPPORT__=1" "-Xclang" "-cl-ext=+cl_khr_fp64,+cl_khr_global_int32_base_atomics,+cl_khr_global_int32_extended_atomics,+cl_khr_local_int32_base_atomics,+cl_khr_local_int32_extended_atomics,+cl_khr_int64_base_atomics,+cl_khr_int64_extended_atomics,+cl_khr_3d_image_writes,+cl_khr_byte_addressable_store,+cl_khr_fp16,+cl_khr_gl_sharing,+cl_amd_device_attribute_query,+cl_amd_media_ops,+cl_amd_media_ops2,+cl_khr_image2d_from_buffer,+cl_khr_subgroups,+cl_amd_copy_buffer_p2p,+cl_amd_assembly_program" "-mllvm" "-amdgpu-prelink" "-mcode-object-version=5" "-nogpulib" "/tmp/comgr-9ef550/input/CompileSource" "-o" "/tmp/comgr-9ef550/output/CompileSource.bc"
Driver Job Args: clang "-cc1" "-mcode-object-version=5" "-mllvm" "--amdhsa-code-object-version=5" "-triple" "amdgcn-amd-amdhsa" "-emit-llvm-bc" "-emit-llvm-uselists" "-clear-ast-before-backend" "-main-file-name" "CompileSource" "-mrelocation-model" "pic" "-pic-level" "2" "-fhalf-no-semantic-interposition" "-mframe-pointer=none" "-ffp-contract=on" "-fno-rounding-math" "-mconstructor-aliases" "-fvisibility=hidden" "-fapply-global-visibility-to-externs" "-target-cpu" "gfx1030" "-debugger-tuning=gdb" "-resource-dir" "/opt/rocm_sdk_612/llvm/lib/clang/17" "-c-isystem" "/opt/rocm_sdk_612/llvm/include/gpu-none-llvm" "-include-pch" "/tmp/comgr-9ef550/include/opencl1.2-c.pch" "-I" "/tmp/comgr-9ef550/include" "-D" "OPENCL_VERSION=200" "-D" "IMAGE_SUPPORT=1" "-O3" "-std=cl1.2" "-fdebug-compilation-dir=/home/lamikr" "-ferror-limit" "19" "-cl-kernel-arg-info" "-nogpulib" "-fno-threadsafe-statics" "-vectorize-loops" "-vectorize-slp" "-fno-validate-pch" "-cl-ext=+cl_khr_fp64,+cl_khr_global_int32_base_atomics,+cl_khr_global_int32_extended_atomics,+cl_khr_local_int32_base_atomics,+cl_khr_local_int32_extended_atomics,+cl_khr_int64_base_atomics,+cl_khr_int64_extended_atomics,+cl_khr_3d_image_writes,+cl_khr_byte_addressable_store,+cl_khr_fp16,+cl_khr_gl_sharing,+cl_amd_device_attribute_query,+cl_amd_media_ops,+cl_amd_media_ops2,+cl_khr_image2d_from_buffer,+cl_khr_subgroups,+cl_amd_copy_buffer_p2p,+cl_amd_assembly_program" "-mllvm" "-amdgpu-prelink" "-faddrsig" "-o" "/tmp/comgr-9ef550/output/CompileSource.bc" "-x" "cl" "/tmp/comgr-9ef550/input/CompileSource"
ReturnStatus: AMD_COMGR_STATUS_SUCCESS

amd_comgr_do_action:
ActionKind: AMD_COMGR_ACTION_ADD_DEVICE_LIBRARIES
IsaName: amdgcn-amd-amdhsa--gfx1030
Options: "code_object_v5"
Path:
Language: AMD_COMGR_LANGUAGE_OPENCL_1_2
Comgr Branch-Commit: HEAD-72e3209e9ecf
LLVM Commit: 72e3209e9ecf09af59f32bde15867048d6410e3b
ReturnStatus: AMD_COMGR_STATUS_SUCCESS

amd_comgr_do_action:
ActionKind: AMD_COMGR_ACTION_LINK_BC_TO_BC
IsaName: amdgcn-amd-amdhsa--gfx1030
Options: "code_object_v5"
Path:
Language: AMD_COMGR_LANGUAGE_OPENCL_1_2
Comgr Branch-Commit: HEAD-72e3209e9ecf
LLVM Commit: 72e3209e9ecf09af59f32bde15867048d6410e3b
Linking Bitcode: /tmp/comgr-bebd29/input/LLVM Binary
Linking Bitcode: /tmp/comgr-bebd29/input/opencl_lib.bc
Linking Bitcode: /tmp/comgr-bebd29/input/ocml_lib.bc
Linking Bitcode: /tmp/comgr-bebd29/input/ockl_lib.bc
Linking Bitcode: /tmp/comgr-bebd29/input/oclc_isa_version_1030.bc
Linking Bitcode: /tmp/comgr-bebd29/input/oclc_correctly_rounded_sqrt_off_lib.bc
Linking Bitcode: /tmp/comgr-bebd29/input/oclc_daz_opt_off_lib.bc
Linking Bitcode: /tmp/comgr-bebd29/input/oclc_finite_only_off_lib.bc
Linking Bitcode: /tmp/comgr-bebd29/input/oclc_unsafe_math_off_lib.bc
Linking Bitcode: /tmp/comgr-bebd29/input/oclc_wavefrontsize64_off_lib.bc
Linking Bitcode: /tmp/comgr-bebd29/input/oclc_abi_version_500_lib.bc
ReturnStatus: AMD_COMGR_STATUS_SUCCESS

amd_comgr_do_action:
ActionKind: AMD_COMGR_ACTION_CODEGEN_BC_TO_RELOCATABLE
IsaName: amdgcn-amd-amdhsa--gfx1030
Options: "-O3" "-cl-kernel-arg-info" "-mllvm" "-amdgpu-internalize-symbols" "-mcode-object-version=5"
Path:
Language: AMD_COMGR_LANGUAGE_NONE
Comgr Branch-Commit: HEAD-72e3209e9ecf
LLVM Commit: 72e3209e9ecf09af59f32bde15867048d6410e3b
Compilation Args: "-target" "amdgcn-amd-amdhsa" "-mcpu=gfx1030" "-c" "-mllvm" "-amdgpu-internalize-symbols" "-O3" "-cl-kernel-arg-info" "-mllvm" "-amdgpu-internalize-symbols" "-mcode-object-version=5" "-nogpulib" "/tmp/comgr-6e691f/input/linked.bc" "-o" "/tmp/comgr-6e691f/output/linked.bc.o"
Driver Job Args: clang "-cc1" "-mcode-object-version=5" "-mllvm" "--amdhsa-code-object-version=5" "-triple" "amdgcn-amd-amdhsa" "-emit-obj" "-clear-ast-before-backend" "-main-file-name" "linked.bc" "-mrelocation-model" "pic" "-pic-level" "2" "-fhalf-no-semantic-interposition" "-mframe-pointer=none" "-ffp-contract=on" "-fno-rounding-math" "-mconstructor-aliases" "-fvisibility=hidden" "-fapply-global-visibility-to-externs" "-target-cpu" "gfx1030" "-debugger-tuning=gdb" "-resource-dir" "/opt/rocm_sdk_612/llvm/lib/clang/17" "-O3" "-fdebug-compilation-dir=/home/lamikr" "-ferror-limit" "19" "-cl-kernel-arg-info" "-nogpulib" "-vectorize-loops" "-vectorize-slp" "-mllvm" "-amdgpu-internalize-symbols" "-mllvm" "-amdgpu-internalize-symbols" "-faddrsig" "-o" "/tmp/comgr-6e691f/output/linked.bc.o" "-x" "ir" "/tmp/comgr-6e691f/input/linked.bc"
ReturnStatus: AMD_COMGR_STATUS_SUCCESS

amd_comgr_do_action:
ActionKind: AMD_COMGR_ACTION_LINK_RELOCATABLE_TO_EXECUTABLE
IsaName: amdgcn-amd-amdhsa--gfx1030
Options:
Path:
Language: AMD_COMGR_LANGUAGE_NONE
Comgr Branch-Commit: HEAD-72e3209e9ecf
LLVM Commit: 72e3209e9ecf09af59f32bde15867048d6410e3b
Compilation Args: "-target" "amdgcn-amd-amdhsa" "-mcpu=gfx1030" "/tmp/comgr-3e9e36/input/linked.bc.o" "-o" "/tmp/comgr-3e9e36/output/a.so"
Driver Job Args: lld "/tmp/comgr-3e9e36/input/linked.bc.o" "-plugin-opt=mcpu=gfx1030" "--no-undefined" "-shared" "-o" "/tmp/comgr-3e9e36/output/a.so"
.......................................................................................
llama_new_context_with_model: n_ctx = 2048
llama_new_context_with_model: n_batch = 2048
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base = 1000000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: ROCm0 KV buffer size = 112.00 MiB
llama_new_context_with_model: KV self size = 112.00 MiB, K (f16): 56.00 MiB, V (f16): 56.00 MiB
llama_new_context_with_model: ROCm_Host output buffer size = 1.16 MiB
llama_new_context_with_model: ROCm0 compute buffer size = 304.00 MiB
llama_new_context_with_model: ROCm_Host compute buffer size = 11.01 MiB
llama_new_context_with_model: graph nodes = 986
llama_new_context_with_model: graph splits = 2
lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsparse.so.1.0.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/libggml.so
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocblas.so.4.1.60102
hip_fatbin.cpp: Found CO for device amdgcn-amd-amdhsa--gfx1030, file: /opt/rocm_sdk_612/lib64/librocsolver.so.0.1.60102
INFO [ init] initializing slots | tid="140451167387904" timestamp=1732592145 n_slots=1
INFO [ init] new slot | tid="140451167387904" timestamp=1732592145 id_slot=0 n_ctx_slot=2048
INFO [ main] model loaded | tid="140451167387904" timestamp=1732592145
INFO [ main] chat template | tid="140451167387904" timestamp=1732592145 chat_example="<|im_start|>system\nYou are a helpful assistant<|im_end|>\n<|im_start|>user\nHello<|im_end|>\n<|im_start|>assistant\nHi there<|im_end|>\n<|im_start|>user\nHow are you?<|im_end|>\n<|im_start|>assistant\n" built_in=true
INFO [ main] HTTP server listening | tid="140451167387904" timestamp=1732592145 n_threads_http="15" port="8080" hostname="127.0.0.1"
INFO [ update_slots] all slots are idle | tid="140451167387904" timestamp=1732592145
INFO [ update_slots] all slots are idle | tid="140451167387904" timestamp=1732592162
ReturnStatus: AMD_COMGR_STATUS_SUCCESS

`

@lamikr
Owner

lamikr commented Nov 26, 2024

How about those grep commands in /opt/rocm_sdk_612/lib64/rocblas/library?
Do you get the same files matched when searching for
Cijk_Alik_Bljk_HB_GB_MT32x32x16_SN_APM1_AF0EM2_AF1EM2_AMAS3_ASAE01_ASCE01_ASEM1_BL1_BS1_DTLA0_DTLB0_EPS1_FL0_GLVWA4_GLVWB4_GRVW4_GSU1_GSUASB_ISA906_IU1_K1_KLA_LPA0_LPB0_LDL1_LRVW4_MDA2_MMFGLC_NLCA1_NLCB1_ONLL1_PK0_PGR1_PLR1_SIA1_SU32_SUM0_SUS256_SVW4_SNLL0_TT4_4_USFGROn1_VAW2_VSn1_VW4_VWB4_WG8_8_1?

One possibility could be that this is some kind of xnack+/xnack- mismatch. I have not needed to debug that kind of problem myself, but basically the GPU runs code in xnack- mode while something else is built in xnack+ mode, and the two are not compatible. I need to investigate this more.
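
One way to look for such a mismatch is to compare the XNACK setting in the target ID the runtime reports for the GPU with the one in the code objects. A minimal sketch, assuming target IDs have the form `gfx906:sramecc+:xnack-` (the form rocminfo is expected to print in its ISA entries). The `xnack_mode` helper and the "any" label are hypothetical, not part of rocm_sdk_builder:

```shell
#!/bin/sh
# Hypothetical helper: report the XNACK setting encoded in a target ID
# such as "amdgcn-amd-amdhsa--gfx906:sramecc+:xnack-".
xnack_mode() {
    case "$1" in
        *:xnack+*) echo "xnack+" ;;
        *:xnack-*) echo "xnack-" ;;
        *)         echo "any" ;;   # no xnack feature present in the target ID
    esac
}
```

On a machine with the SDK installed, the input could come from something like `rocminfo | grep -i xnack`; that command is not run here, and its exact output format is an assumption.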

@Said-Akbar
Author

Sure, here are the string matches:

strings TensileLibrary_Type_HH_Contraction_l_Alik_Bljk_Cijk_Dijk_gfx906.co | grep Cijk_Alik_Bljk_HB_GB_MT32x32x16_SN_APM1_AF0EM2_AF1EM2_AMAS3_ASAE01_ASCE01_ASEM1_BL1_BS1_DTLA0_DTLB0_EPS1_FL0_GLVWA4_GLVWB4_GRVW4_GSU1_GSUASB_ISA906_IU1_K1_KLA_LPA0_LPB0_LDL1_LRVW4_MDA2_MMFGLC_NLCA1_NLCB1_ONLL1_PK0_PGR1_PLR1_SIA1_SU32_SUM0_SUS256_SVW4_SNLL0_TT4_4_USFGROn1_VAW2_VSn1_VW4_VWB4_WG8_8_1
/Cijk_Alik_Bljk_HB_GB_MT32x32x16_SN_APM1_AF0EM2_AF1EM2_AMAS3_ASAE01_ASCE01_ASEM1_BL1_BS1_DTLA0_DTLB0_EPS1_FL0_GLVWA4_GLVWB4_GRVW4_GSU1_GSUASB_ISA906_IU1_K1_KLA_LPA0_LPB0_LDL1_LRVW4_MDA2_MMFGLC_NLCA1_NLCB1_ONLL1_PK0_PGR1_PLR1_SIA1_SU32_SUM0_SUS256_SVW4_SNLL0_TT4_4_USFGROn1_VAW2_VSn1_VW4_VWB4_WG8_8_1_WGM1
2Cijk_Alik_Bljk_HB_GB_MT32x32x16_SN_APM1_AF0EM2_AF1EM2_AMAS3_ASAE01_ASCE01_ASEM1_BL1_BS1_DTLA0_DTLB0_EPS1_FL0_GLVWA4_GLVWB4_GRVW4_GSU1_GSUASB_ISA906_IU1_K1_KLA_LPA0_LPB0_LDL1_LRVW4_MDA2_MMFGLC_NLCA1_NLCB1_ONLL1_PK0_PGR1_PLR1_SIA1_SU32_SUM0_SUS256_SVW4_SNLL0_TT4_4_USFGROn1_VAW2_VSn1_VW4_VWB4_WG8_8_1_WGM1.kd
/Cijk_Alik_Bljk_HB_GB_MT32x32x16_SN_APM1_AF0EM2_AF1EM2_AMAS3_ASAE01_ASCE01_ASEM1_BL1_BS1_DTLA0_DTLB0_EPS1_FL0_GLVWA4_GLVWB4_GRVW4_GSU1_GSUASB_ISA906_IU1_K1_KLA_LPA0_LPB0_LDL1_LRVW4_MDA2_MMFGLC_NLCA1_NLCB1_ONLL1_PK0_PGR1_PLR1_SIA1_SU32_SUM0_SUS256_SVW4_SNLL0_TT4_4_USFGROn1_VAW2_VSn1_VW4_VWB4_WG8_8_1_WGM8
2Cijk_Alik_Bljk_HB_GB_MT32x32x16_SN_APM1_AF0EM2_AF1EM2_AMAS3_ASAE01_ASCE01_ASEM1_BL1_BS1_DTLA0_DTLB0_EPS1_FL0_GLVWA4_GLVWB4_GRVW4_GSU1_GSUASB_ISA906_IU1_K1_KLA_LPA0_LPB0_LDL1_LRVW4_MDA2_MMFGLC_NLCA1_NLCB1_ONLL1_PK0_PGR1_PLR1_SIA1_SU32_SUM0_SUS256_SVW4_SNLL0_TT4_4_USFGROn1_VAW2_VSn1_VW4_VWB4_WG8_8_1_WGM8.kd
Cijk_Alik_Bljk_HB_GB_MT32x32x16_SN_APM1_AF0EM2_AF1EM2_AMAS3_ASAE01_ASCE01_ASEM1_BL1_BS1_DTLA0_DTLB0_EPS1_FL0_GLVWA4_GLVWB4_GRVW4_GSU1_GSUASB_ISA906_IU1_K1_KLA_LPA0_LPB0_LDL1_LRVW4_MDA2_MMFGLC_NLCA1_NLCB1_ONLL1_PK0_PGR1_PLR1_SIA1_SU32_SUM0_SUS256_SVW4_SNLL0_TT4_4_USFGROn1_VAW2_VSn1_VW4_VWB4_WG8_8_1_WGM1
Cijk_Alik_Bljk_HB_GB_MT32x32x16_SN_APM1_AF0EM2_AF1EM2_AMAS3_ASAE01_ASCE01_ASEM1_BL1_BS1_DTLA0_DTLB0_EPS1_FL0_GLVWA4_GLVWB4_GRVW4_GSU1_GSUASB_ISA906_IU1_K1_KLA_LPA0_LPB0_LDL1_LRVW4_MDA2_MMFGLC_NLCA1_NLCB1_ONLL1_PK0_PGR1_PLR1_SIA1_SU32_SUM0_SUS256_SVW4_SNLL0_TT4_4_USFGROn1_VAW2_VSn1_VW4_VWB4_WG8_8_1_WGM1.kd
Cijk_Alik_Bljk_HB_GB_MT32x32x16_SN_APM1_AF0EM2_AF1EM2_AMAS3_ASAE01_ASCE01_ASEM1_BL1_BS1_DTLA0_DTLB0_EPS1_FL0_GLVWA4_GLVWB4_GRVW4_GSU1_GSUASB_ISA906_IU1_K1_KLA_LPA0_LPB0_LDL1_LRVW4_MDA2_MMFGLC_NLCA1_NLCB1_ONLL1_PK0_PGR1_PLR1_SIA1_SU32_SUM0_SUS256_SVW4_SNLL0_TT4_4_USFGROn1_VAW2_VSn1_VW4_VWB4_WG8_8_1_WGM8
Cijk_Alik_Bljk_HB_GB_MT32x32x16_SN_APM1_AF0EM2_AF1EM2_AMAS3_ASAE01_ASCE01_ASEM1_BL1_BS1_DTLA0_DTLB0_EPS1_FL0_GLVWA4_GLVWB4_GRVW4_GSU1_GSUASB_ISA906_IU1_K1_KLA_LPA0_LPB0_LDL1_LRVW4_MDA2_MMFGLC_NLCA1_NLCB1_ONLL1_PK0_PGR1_PLR1_SIA1_SU32_SUM0_SUS256_SVW4_SNLL0_TT4_4_USFGROn1_VAW2_VSn1_VW4_VWB4_WG8_8_1_WGM8.kd
Cijk_Alik_Bljk_HB_GB_MT32x32x16_SN_APM1_AF0EM2_AF1EM2_AMAS3_ASAE01_ASCE01_ASEM1_BL1_BS1_DTLA0_DTLB0_EPS1_FL0_GLVWA4_GLVWB4_GRVW4_GSU1_GSUASB_ISA906_IU1_K1_KLA_LPA0_LPB0_LDL1_LRVW4_MDA2_MMFGLC_NLCA1_NLCB1_ONLL1_PK0_PGR1_PLR1_SIA1_SU32_SUM0_SUS256_SVW4_SNLL0_TT4_4_USFGROn1_VAW2_VSn1_VW4_VWB4_WG8_8_1_WGM1_preloaded
Cijk_Alik_Bljk_HB_GB_MT32x32x16_SN_APM1_AF0EM2_AF1EM2_AMAS3_ASAE01_ASCE01_ASEM1_BL1_BS1_DTLA0_DTLB0_EPS1_FL0_GLVWA4_GLVWB4_GRVW4_GSU1_GSUASB_ISA906_IU1_K1_KLA_LPA0_LPB0_LDL1_LRVW4_MDA2_MMFGLC_NLCA1_NLCB1_ONLL1_PK0_PGR1_PLR1_SIA1_SU32_SUM0_SUS256_SVW4_SNLL0_TT4_4_USFGROn1_VAW2_VSn1_VW4_VWB4_WG8_8_1_WGM8_preloaded
Cijk_Alik_Bljk_HB_GB_MT32x32x16_SN_APM1_AF0EM2_AF1EM2_AMAS3_ASAE01_ASCE01_ASEM1_BL1_BS1_DTLA0_DTLB0_EPS1_FL0_GLVWA4_GLVWB4_GRVW4_GSU1_GSUASB_ISA906_IU1_K1_KLA_LPA0_LPB0_LDL1_LRVW4_MDA2_MMFGLC_NLCA1_NLCB1_ONLL1_PK0_PGR1_PLR1_SIA1_SU32_SUM0_SUS256_SVW4_SNLL0_TT4_4_USFGROn1_VAW2_VSn1_VW4_VWB4_WG8_8_1_WGM1
Cijk_Alik_Bljk_HB_GB_MT32x32x16_SN_APM1_AF0EM2_AF1EM2_AMAS3_ASAE01_ASCE01_ASEM1_BL1_BS1_DTLA0_DTLB0_EPS1_FL0_GLVWA4_GLVWB4_GRVW4_GSU1_GSUASB_ISA906_IU1_K1_KLA_LPA0_LPB0_LDL1_LRVW4_MDA2_MMFGLC_NLCA1_NLCB1_ONLL1_PK0_PGR1_PLR1_SIA1_SU32_SUM0_SUS256_SVW4_SNLL0_TT4_4_USFGROn1_VAW2_VSn1_VW4_VWB4_WG8_8_1_WGM1.kd
Cijk_Alik_Bljk_HB_GB_MT32x32x16_SN_APM1_AF0EM2_AF1EM2_AMAS3_ASAE01_ASCE01_ASEM1_BL1_BS1_DTLA0_DTLB0_EPS1_FL0_GLVWA4_GLVWB4_GRVW4_GSU1_GSUASB_ISA906_IU1_K1_KLA_LPA0_LPB0_LDL1_LRVW4_MDA2_MMFGLC_NLCA1_NLCB1_ONLL1_PK0_PGR1_PLR1_SIA1_SU32_SUM0_SUS256_SVW4_SNLL0_TT4_4_USFGROn1_VAW2_VSn1_VW4_VWB4_WG8_8_1_WGM8
Cijk_Alik_Bljk_HB_GB_MT32x32x16_SN_APM1_AF0EM2_AF1EM2_AMAS3_ASAE01_ASCE01_ASEM1_BL1_BS1_DTLA0_DTLB0_EPS1_FL0_GLVWA4_GLVWB4_GRVW4_GSU1_GSUASB_ISA906_IU1_K1_KLA_LPA0_LPB0_LDL1_LRVW4_MDA2_MMFGLC_NLCA1_NLCB1_ONLL1_PK0_PGR1_PLR1_SIA1_SU32_SUM0_SUS256_SVW4_SNLL0_TT4_4_USFGROn1_VAW2_VSn1_VW4_VWB4_WG8_8_1_WGM8.kd

And here are the llama.cpp symbol matches:

strings TensileLibrary_Type_HH_Contraction_l_Alik_Bljk_Cijk_Dijk_gfx906.co | grep Cijk_Alik_Bljk_HB_GB_MT32x32x16_SN_APM1_AF0EM2_AF1EM2_AMAS3_ASAE01_ASCE01_ASEM1_BL1_BS1_DTLA0_DTLB0_EPS1_FL0_GLVWA4_GLVWB4_GRVW4_GSU1_GSUASB_ISA906_IU1_K1_KLA_LPA0_LPB0_LDL1_LRVW4_MDA2_MMFGLC_NLCA1_NLCB1_ONLL1_PK0_PGR1_PLR1_SIA1_SU32_SUM0_SUS256_SVW4_SNLL0_TT4_4_USFGROn1_VAW2_VSn1_VW4_VWB4_WG8_8_1_WGM1
/Cijk_Alik_Bljk_HB_GB_MT32x32x16_SN_APM1_AF0EM2_AF1EM2_AMAS3_ASAE01_ASCE01_ASEM1_BL1_BS1_DTLA0_DTLB0_EPS1_FL0_GLVWA4_GLVWB4_GRVW4_GSU1_GSUASB_ISA906_IU1_K1_KLA_LPA0_LPB0_LDL1_LRVW4_MDA2_MMFGLC_NLCA1_NLCB1_ONLL1_PK0_PGR1_PLR1_SIA1_SU32_SUM0_SUS256_SVW4_SNLL0_TT4_4_USFGROn1_VAW2_VSn1_VW4_VWB4_WG8_8_1_WGM1
2Cijk_Alik_Bljk_HB_GB_MT32x32x16_SN_APM1_AF0EM2_AF1EM2_AMAS3_ASAE01_ASCE01_ASEM1_BL1_BS1_DTLA0_DTLB0_EPS1_FL0_GLVWA4_GLVWB4_GRVW4_GSU1_GSUASB_ISA906_IU1_K1_KLA_LPA0_LPB0_LDL1_LRVW4_MDA2_MMFGLC_NLCA1_NLCB1_ONLL1_PK0_PGR1_PLR1_SIA1_SU32_SUM0_SUS256_SVW4_SNLL0_TT4_4_USFGROn1_VAW2_VSn1_VW4_VWB4_WG8_8_1_WGM1.kd
Cijk_Alik_Bljk_HB_GB_MT32x32x16_SN_APM1_AF0EM2_AF1EM2_AMAS3_ASAE01_ASCE01_ASEM1_BL1_BS1_DTLA0_DTLB0_EPS1_FL0_GLVWA4_GLVWB4_GRVW4_GSU1_GSUASB_ISA906_IU1_K1_KLA_LPA0_LPB0_LDL1_LRVW4_MDA2_MMFGLC_NLCA1_NLCB1_ONLL1_PK0_PGR1_PLR1_SIA1_SU32_SUM0_SUS256_SVW4_SNLL0_TT4_4_USFGROn1_VAW2_VSn1_VW4_VWB4_WG8_8_1_WGM1
Cijk_Alik_Bljk_HB_GB_MT32x32x16_SN_APM1_AF0EM2_AF1EM2_AMAS3_ASAE01_ASCE01_ASEM1_BL1_BS1_DTLA0_DTLB0_EPS1_FL0_GLVWA4_GLVWB4_GRVW4_GSU1_GSUASB_ISA906_IU1_K1_KLA_LPA0_LPB0_LDL1_LRVW4_MDA2_MMFGLC_NLCA1_NLCB1_ONLL1_PK0_PGR1_PLR1_SIA1_SU32_SUM0_SUS256_SVW4_SNLL0_TT4_4_USFGROn1_VAW2_VSn1_VW4_VWB4_WG8_8_1_WGM1.kd
Cijk_Alik_Bljk_HB_GB_MT32x32x16_SN_APM1_AF0EM2_AF1EM2_AMAS3_ASAE01_ASCE01_ASEM1_BL1_BS1_DTLA0_DTLB0_EPS1_FL0_GLVWA4_GLVWB4_GRVW4_GSU1_GSUASB_ISA906_IU1_K1_KLA_LPA0_LPB0_LDL1_LRVW4_MDA2_MMFGLC_NLCA1_NLCB1_ONLL1_PK0_PGR1_PLR1_SIA1_SU32_SUM0_SUS256_SVW4_SNLL0_TT4_4_USFGROn1_VAW2_VSn1_VW4_VWB4_WG8_8_1_WGM1_preloaded
Cijk_Alik_Bljk_HB_GB_MT32x32x16_SN_APM1_AF0EM2_AF1EM2_AMAS3_ASAE01_ASCE01_ASEM1_BL1_BS1_DTLA0_DTLB0_EPS1_FL0_GLVWA4_GLVWB4_GRVW4_GSU1_GSUASB_ISA906_IU1_K1_KLA_LPA0_LPB0_LDL1_LRVW4_MDA2_MMFGLC_NLCA1_NLCB1_ONLL1_PK0_PGR1_PLR1_SIA1_SU32_SUM0_SUS256_SVW4_SNLL0_TT4_4_USFGROn1_VAW2_VSn1_VW4_VWB4_WG8_8_1_WGM1
Cijk_Alik_Bljk_HB_GB_MT32x32x16_SN_APM1_AF0EM2_AF1EM2_AMAS3_ASAE01_ASCE01_ASEM1_BL1_BS1_DTLA0_DTLB0_EPS1_FL0_GLVWA4_GLVWB4_GRVW4_GSU1_GSUASB_ISA906_IU1_K1_KLA_LPA0_LPB0_LDL1_LRVW4_MDA2_MMFGLC_NLCA1_NLCB1_ONLL1_PK0_PGR1_PLR1_SIA1_SU32_SUM0_SUS256_SVW4_SNLL0_TT4_4_USFGROn1_VAW2_VSn1_VW4_VWB4_WG8_8_1_WGM1.kd
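
To repeat this search over every file in the rocBLAS library directory instead of one code object at a time, a loop like the following could be used. It is only a sketch: the `find_kernel_files` helper is hypothetical, and it uses `grep -l -F` directly on the binary files rather than piping through `strings`.

```shell
#!/bin/sh
# Hypothetical helper: print the names of the regular files in a directory
# that contain the given kernel name as a literal (fixed) string.
find_kernel_files() {
    dir=$1
    kernel=$2
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        # -l prints only the file name on a match, -F disables regex matching,
        # and "--" protects kernel names that start with a dash.
        grep -l -F -- "$kernel" "$f"
    done
    return 0
}
```

For example, `find_kernel_files /opt/rocm_sdk_612/lib64/rocblas/library <kernel name>` would list every file that contains the symbol.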

@Said-Akbar
Author

Said-Akbar commented Nov 26, 2024

Is there a way to switch to xnack+ mode with ROCm on AMD MI60 GPUs? If not, then I will wait for your update. Thanks!

EDIT:
Based on this page (https://rocm.docs.amd.com/en/latest/conceptual/gpu-memory.html), enabling XNACK should be possible (HSA_XNACK=1), but it did not work for me.
An interesting fact from that page is 'Compiled kernels will run regardless if XNACK is enabled or is disabled.'
For example, code built with hipcc --offload-arch=gfx906 will run whether the GPU is in XNACK+ or XNACK- mode.
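
For reference, a sketch of applying the environment variable to a single command instead of the whole shell session. `with_xnack` is a hypothetical wrapper, and whether the setting actually takes effect is assumed to depend on the GPU and kernel driver:

```shell
#!/bin/sh
# Hypothetical wrapper: run one command with HSA_XNACK=1 set only for that
# command, leaving the calling shell's environment unchanged.
with_xnack() {
    env HSA_XNACK=1 "$@"
}
```

A possible check would be `with_xnack rocminfo | grep -i xnack` (not run here). To build for one mode explicitly, the target ID form `--offload-arch=gfx906:xnack+` or `--offload-arch=gfx906:xnack-` can be passed to hipcc in place of plain `gfx906`.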

@lamikr
Owner

lamikr commented Nov 26, 2024

Sorry, as I do not remember seeing this kind of error myself, this is a little bit of guesswork for now to try to isolate the problem. So disabling comgr and hiprtc would be the next thing to try. Got the idea from:
ROCm/MIOpen#2851

So, if you have time, can you try to build MIOpen with the following options to see if anything changes?

-DMIOPEN_USE_COMGR=Off -DMIOPEN_USE_HIPRTC=Off

It can be done by opening
binfo/core/034_miopen.binfo
and adding the following line
BINFO_APP_CMAKE_CFG="${BINFO_APP_CMAKE_CFG} -DMIOPEN_USE_COMGR=OFF -DMIOPEN_USE_HIPRTC=OFF"
for example after the line
BINFO_APP_CMAKE_CFG="${BINFO_APP_CMAKE_CFG} -DCMAKE_INSTALL_LIBDIR=lib64"

And then rebuilding MIOpen:

./babs.sh --clean binfo/core/034_miopen.binfo
./babs.sh -b
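
The manual edit above could also be scripted. This is a minimal sketch, not something shipped with rocm_sdk_builder: the `add_miopen_flags` helper is hypothetical, and it assumes the binfo file contains the `-DCMAKE_INSTALL_LIBDIR=lib64` line quoted above (if it does not, the file is left unchanged).

```shell
#!/bin/sh
# Hypothetical helper: insert the extra MIOpen CMake options into a binfo
# file, directly after the first line that sets -DCMAKE_INSTALL_LIBDIR=lib64.
add_miopen_flags() {
    binfo_file=$1
    anchor='-DCMAKE_INSTALL_LIBDIR=lib64'
    new_line='BINFO_APP_CMAKE_CFG="${BINFO_APP_CMAKE_CFG} -DMIOPEN_USE_COMGR=OFF -DMIOPEN_USE_HIPRTC=OFF"'

    # Skip the edit if the option is already present, so reruns are harmless.
    if grep -q -F -e '-DMIOPEN_USE_COMGR=OFF' "$binfo_file"; then
        return 0
    fi

    tmp=$(mktemp) || return 1
    # Values are passed through the environment so awk does not interpret
    # the "$" and quote characters they contain.
    if ANCHOR=$anchor NEW_LINE=$new_line awk '
        { print }
        !inserted && index($0, ENVIRON["ANCHOR"]) > 0 {
            print ENVIRON["NEW_LINE"]
            inserted = 1
        }
    ' "$binfo_file" > "$tmp"; then
        cat "$tmp" > "$binfo_file"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}
```

After `add_miopen_flags binfo/core/034_miopen.binfo`, the two babs.sh commands above would be run as before.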

@Said-Akbar
Author

Thanks! I will try it today when I get back home.

@Said-Akbar
Author

Said-Akbar commented Nov 26, 2024

OK, I was impatient and tried building MIOpen with the above changes.

install ok: MIOpen

/home/saidp/Downloads/rocm_sdk_builder/builddir/034_miopen
[77] Post-installing: MIOpen
post-install
MIOpen, post install command 0
unset CXX
post-install cmd ok: MIOpen
post-install ok: MIOpen
LIST_BINFO_FILE_FULLNAME[78]: /home/saidp/Downloads/rocm_sdk_builder/binfo/core/035_AMDMIGraphX.binfo
APP_INFO_FULL_NAME: /home/saidp/Downloads/rocm_sdk_builder/binfo/core/035_AMDMIGraphX.binfo

---------------------------
[78] BINFO_APP_NAME: AMDMIGraphX
BINFO FILE: /home/saidp/Downloads/rocm_sdk_builder/binfo/core/035_AMDMIGraphX.binfo
BINFO_APP_SRC_SUBDIR_BASENAME: 
BINFO_APP_SRC_TOPDIR_BASENAME: AMDMIGraphX
BINFO_APP_SRC_DIR: /home/saidp/Downloads/rocm_sdk_builder/src_projects/AMDMIGraphX
BINFO_APP_SRC_CLONE_DIR: /home/saidp/Downloads/rocm_sdk_builder/src_projects/AMDMIGraphX
BINFO_APP_BUILD_DIR: /home/saidp/Downloads/rocm_sdk_builder/builddir/035_AMDMIGraphX
HIP_PATH: /opt/rocm_sdk_612
INSTALL_DIR: /opt/rocm_sdk_612
HIP_PLATFORM: amd
TASK_RESULT_FILE_INSTALL: /home/saidp/Downloads/rocm_sdk_builder/builddir/035_AMDMIGraphX/.result_install
---------------------------

LIST_BINFO_FILE_FULLNAME[79]: /home/saidp/Downloads/rocm_sdk_builder/binfo/core/036_rocWMMA.binfo
APP_INFO_FULL_NAME: /home/saidp/Downloads/rocm_sdk_builder/binfo/core/036_rocWMMA.binfo

---------------------------
[79] BINFO_APP_NAME: rocWMMA
BINFO FILE: /home/saidp/Downloads/rocm_sdk_builder/binfo/core/036_rocWMMA.binfo
BINFO_APP_SRC_SUBDIR_BASENAME: 
BINFO_APP_SRC_TOPDIR_BASENAME: rocWMMA
BINFO_APP_SRC_DIR: /home/saidp/Downloads/rocm_sdk_builder/src_projects/rocWMMA
BINFO_APP_SRC_CLONE_DIR: /home/saidp/Downloads/rocm_sdk_builder/src_projects/rocWMMA
BINFO_APP_BUILD_DIR: /home/saidp/Downloads/rocm_sdk_builder/builddir/036_rocWMMA
HIP_PATH: /opt/rocm_sdk_612
INSTALL_DIR: /opt/rocm_sdk_612
HIP_PLATFORM: amd
TASK_RESULT_FILE_INSTALL: /home/saidp/Downloads/rocm_sdk_builder/builddir/036_rocWMMA/.result_install
---------------------------

LIST_BINFO_FILE_FULLNAME[80]: /home/saidp/Downloads/rocm_sdk_builder/binfo/core/037_magma.binfo
APP_INFO_FULL_NAME: /home/saidp/Downloads/rocm_sdk_builder/binfo/core/037_magma.binfo

---------------------------
[80] BINFO_APP_NAME: magma
BINFO FILE: /home/saidp/Downloads/rocm_sdk_builder/binfo/core/037_magma.binfo
BINFO_APP_SRC_SUBDIR_BASENAME: 
BINFO_APP_SRC_TOPDIR_BASENAME: magma
BINFO_APP_SRC_DIR: /home/saidp/Downloads/rocm_sdk_builder/src_projects/magma
BINFO_APP_SRC_CLONE_DIR: /home/saidp/Downloads/rocm_sdk_builder/src_projects/magma
BINFO_APP_BUILD_DIR: /home/saidp/Downloads/rocm_sdk_builder/builddir/037_magma
HIP_PATH: /opt/rocm_sdk_612
INSTALL_DIR: /opt/rocm_sdk_612
HIP_PLATFORM: amd
TASK_RESULT_FILE_INSTALL: /home/saidp/Downloads/rocm_sdk_builder/builddir/037_magma/.result_install
---------------------------

LIST_BINFO_FILE_FULLNAME[81]: /home/saidp/Downloads/rocm_sdk_builder/binfo/core/038_aotriton.binfo
APP_INFO_FULL_NAME: /home/saidp/Downloads/rocm_sdk_builder/binfo/core/038_aotriton.binfo

---------------------------
[81] BINFO_APP_NAME: aotriton
BINFO FILE: /home/saidp/Downloads/rocm_sdk_builder/binfo/core/038_aotriton.binfo
BINFO_APP_SRC_SUBDIR_BASENAME: 
BINFO_APP_SRC_TOPDIR_BASENAME: aotriton
BINFO_APP_SRC_DIR: /home/saidp/Downloads/rocm_sdk_builder/src_projects/aotriton
BINFO_APP_SRC_CLONE_DIR: /home/saidp/Downloads/rocm_sdk_builder/src_projects/aotriton
BINFO_APP_BUILD_DIR: /home/saidp/Downloads/rocm_sdk_builder/builddir/038_aotriton
HIP_PATH: /opt/rocm_sdk_612
INSTALL_DIR: /opt/rocm_sdk_612
HIP_PLATFORM: amd
TASK_RESULT_FILE_INSTALL: /home/saidp/Downloads/rocm_sdk_builder/builddir/038_aotriton/.result_install
---------------------------

LIST_BINFO_FILE_FULLNAME[82]: /home/saidp/Downloads/rocm_sdk_builder/binfo/core/039_01_pytorch_dependencies.binfo
APP_INFO_FULL_NAME: /home/saidp/Downloads/rocm_sdk_builder/binfo/core/039_01_pytorch_dependencies.binfo

---------------------------
[82] BINFO_APP_NAME: pytorch_dependencies
BINFO FILE: /home/saidp/Downloads/rocm_sdk_builder/binfo/core/039_01_pytorch_dependencies.binfo
BINFO_APP_SRC_SUBDIR_BASENAME: 
BINFO_APP_SRC_TOPDIR_BASENAME: pytorch_dependencies
BINFO_APP_SRC_DIR: /home/saidp/Downloads/rocm_sdk_builder/src_projects/pytorch_dependencies
BINFO_APP_SRC_CLONE_DIR: /home/saidp/Downloads/rocm_sdk_builder/src_projects/pytorch_dependencies
BINFO_APP_BUILD_DIR: /home/saidp/Downloads/rocm_sdk_builder/builddir/039_01_pytorch_dependencies
HIP_PATH: /opt/rocm_sdk_612
INSTALL_DIR: /opt/rocm_sdk_612
HIP_PLATFORM: amd
TASK_RESULT_FILE_INSTALL: /home/saidp/Downloads/rocm_sdk_builder/builddir/039_01_pytorch_dependencies/.result_install
---------------------------

LIST_BINFO_FILE_FULLNAME[83]: /home/saidp/Downloads/rocm_sdk_builder/binfo/core/039_02_pytorch.binfo
APP_INFO_FULL_NAME: /home/saidp/Downloads/rocm_sdk_builder/binfo/core/039_02_pytorch.binfo

---------------------------
[83] BINFO_APP_NAME: pytorch
BINFO FILE: /home/saidp/Downloads/rocm_sdk_builder/binfo/core/039_02_pytorch.binfo
BINFO_APP_SRC_SUBDIR_BASENAME: 
BINFO_APP_SRC_TOPDIR_BASENAME: pytorch
BINFO_APP_SRC_DIR: /home/saidp/Downloads/rocm_sdk_builder/src_projects/pytorch
BINFO_APP_SRC_CLONE_DIR: /home/saidp/Downloads/rocm_sdk_builder/src_projects/pytorch
BINFO_APP_BUILD_DIR: /home/saidp/Downloads/rocm_sdk_builder/builddir/039_02_pytorch
HIP_PATH: /opt/rocm_sdk_612
INSTALL_DIR: /opt/rocm_sdk_612
HIP_PLATFORM: amd
TASK_RESULT_FILE_INSTALL: /home/saidp/Downloads/rocm_sdk_builder/builddir/039_02_pytorch/.result_install
---------------------------

LIST_BINFO_FILE_FULLNAME[84]: /home/saidp/Downloads/rocm_sdk_builder/binfo/core/039_03_pytorch_vision.binfo
APP_INFO_FULL_NAME: /home/saidp/Downloads/rocm_sdk_builder/binfo/core/039_03_pytorch_vision.binfo

---------------------------
[84] BINFO_APP_NAME: pytorch_vision
BINFO FILE: /home/saidp/Downloads/rocm_sdk_builder/binfo/core/039_03_pytorch_vision.binfo
BINFO_APP_SRC_SUBDIR_BASENAME: 
BINFO_APP_SRC_TOPDIR_BASENAME: pytorch_vision
BINFO_APP_SRC_DIR: /home/saidp/Downloads/rocm_sdk_builder/src_projects/pytorch_vision
BINFO_APP_SRC_CLONE_DIR: /home/saidp/Downloads/rocm_sdk_builder/src_projects/pytorch_vision
BINFO_APP_BUILD_DIR: /home/saidp/Downloads/rocm_sdk_builder/builddir/039_03_pytorch_vision
HIP_PATH: /opt/rocm_sdk_612
INSTALL_DIR: /opt/rocm_sdk_612
HIP_PLATFORM: amd
TASK_RESULT_FILE_INSTALL: /home/saidp/Downloads/rocm_sdk_builder/builddir/039_03_pytorch_vision/.result_install
---------------------------

LIST_BINFO_FILE_FULLNAME[85]: /home/saidp/Downloads/rocm_sdk_builder/binfo/core/039_04_pytorch_audio.binfo
APP_INFO_FULL_NAME: /home/saidp/Downloads/rocm_sdk_builder/binfo/core/039_04_pytorch_audio.binfo

---------------------------
[85] BINFO_APP_NAME: pytorch_audio
BINFO FILE: /home/saidp/Downloads/rocm_sdk_builder/binfo/core/039_04_pytorch_audio.binfo
BINFO_APP_SRC_SUBDIR_BASENAME: 
BINFO_APP_SRC_TOPDIR_BASENAME: pytorch_audio
BINFO_APP_SRC_DIR: /home/saidp/Downloads/rocm_sdk_builder/src_projects/pytorch_audio
BINFO_APP_SRC_CLONE_DIR: /home/saidp/Downloads/rocm_sdk_builder/src_projects/pytorch_audio
BINFO_APP_BUILD_DIR: /home/saidp/Downloads/rocm_sdk_builder/builddir/039_04_pytorch_audio
HIP_PATH: /opt/rocm_sdk_612
INSTALL_DIR: /opt/rocm_sdk_612
HIP_PLATFORM: amd
TASK_RESULT_FILE_INSTALL: /home/saidp/Downloads/rocm_sdk_builder/builddir/039_04_pytorch_audio/.result_install
---------------------------

LIST_BINFO_FILE_FULLNAME[86]: /home/saidp/Downloads/rocm_sdk_builder/binfo/core/039_05_torch_migraphx.binfo
APP_INFO_FULL_NAME: /home/saidp/Downloads/rocm_sdk_builder/binfo/core/039_05_torch_migraphx.binfo

---------------------------
[86] BINFO_APP_NAME: torch_migraphx
BINFO FILE: /home/saidp/Downloads/rocm_sdk_builder/binfo/core/039_05_torch_migraphx.binfo
BINFO_APP_SRC_SUBDIR_BASENAME: 
BINFO_APP_SRC_TOPDIR_BASENAME: torch_migraphx
BINFO_APP_SRC_DIR: /home/saidp/Downloads/rocm_sdk_builder/src_projects/torch_migraphx
BINFO_APP_SRC_CLONE_DIR: /home/saidp/Downloads/rocm_sdk_builder/src_projects/torch_migraphx
BINFO_APP_BUILD_DIR: /home/saidp/Downloads/rocm_sdk_builder/builddir/039_05_torch_migraphx
HIP_PATH: /opt/rocm_sdk_612
INSTALL_DIR: /opt/rocm_sdk_612
HIP_PLATFORM: amd
TASK_RESULT_FILE_INSTALL: /home/saidp/Downloads/rocm_sdk_builder/builddir/039_05_torch_migraphx/.result_install
---------------------------

LIST_BINFO_FILE_FULLNAME[87]: /home/saidp/Downloads/rocm_sdk_builder/binfo/core/039_06_pytorch_bitsandbytes.binfo
APP_INFO_FULL_NAME: /home/saidp/Downloads/rocm_sdk_builder/binfo/core/039_06_pytorch_bitsandbytes.binfo

---------------------------
[87] BINFO_APP_NAME: bitsandbytes
BINFO FILE: /home/saidp/Downloads/rocm_sdk_builder/binfo/core/039_06_pytorch_bitsandbytes.binfo
BINFO_APP_SRC_SUBDIR_BASENAME: 
BINFO_APP_SRC_TOPDIR_BASENAME: bitsandbytes
BINFO_APP_SRC_DIR: /home/saidp/Downloads/rocm_sdk_builder/src_projects/bitsandbytes
BINFO_APP_SRC_CLONE_DIR: /home/saidp/Downloads/rocm_sdk_builder/src_projects/bitsandbytes
BINFO_APP_BUILD_DIR: /home/saidp/Downloads/rocm_sdk_builder/builddir/039_06_pytorch_bitsandbytes
HIP_PATH: /opt/rocm_sdk_612
INSTALL_DIR: /opt/rocm_sdk_612
HIP_PLATFORM: amd
TASK_RESULT_FILE_INSTALL: /home/saidp/Downloads/rocm_sdk_builder/builddir/039_06_pytorch_bitsandbytes/.result_install
---------------------------

LIST_BINFO_FILE_FULLNAME[88]: /home/saidp/Downloads/rocm_sdk_builder/binfo/core/039_07_triton.binfo
APP_INFO_FULL_NAME: /home/saidp/Downloads/rocm_sdk_builder/binfo/core/039_07_triton.binfo

---------------------------
[88] BINFO_APP_NAME: triton
BINFO FILE: /home/saidp/Downloads/rocm_sdk_builder/binfo/core/039_07_triton.binfo
BINFO_APP_SRC_SUBDIR_BASENAME: 
BINFO_APP_SRC_TOPDIR_BASENAME: triton
BINFO_APP_SRC_DIR: /home/saidp/Downloads/rocm_sdk_builder/src_projects/triton
BINFO_APP_SRC_CLONE_DIR: /home/saidp/Downloads/rocm_sdk_builder/src_projects/triton
BINFO_APP_BUILD_DIR: /home/saidp/Downloads/rocm_sdk_builder/builddir/039_07_triton
HIP_PATH: /opt/rocm_sdk_612
INSTALL_DIR: /opt/rocm_sdk_612
HIP_PLATFORM: amd
TASK_RESULT_FILE_INSTALL: /home/saidp/Downloads/rocm_sdk_builder/builddir/039_07_triton/.result_install
---------------------------

LIST_BINFO_FILE_FULLNAME[89]: /home/saidp/Downloads/rocm_sdk_builder/binfo/core/040_01_onnxruntime_rocm_training.binfo
APP_INFO_FULL_NAME: /home/saidp/Downloads/rocm_sdk_builder/binfo/core/040_01_onnxruntime_rocm_training.binfo

---------------------------
[89] BINFO_APP_NAME: onnxruntime
BINFO FILE: /home/saidp/Downloads/rocm_sdk_builder/binfo/core/040_01_onnxruntime_rocm_training.binfo
BINFO_APP_SRC_SUBDIR_BASENAME: 
BINFO_APP_SRC_TOPDIR_BASENAME: onnxruntime
BINFO_APP_SRC_DIR: /home/saidp/Downloads/rocm_sdk_builder/src_projects/onnxruntime
BINFO_APP_SRC_CLONE_DIR: /home/saidp/Downloads/rocm_sdk_builder/src_projects/onnxruntime
BINFO_APP_BUILD_DIR: /home/saidp/Downloads/rocm_sdk_builder/builddir/040_01_onnxruntime_rocm_training
HIP_PATH: /opt/rocm_sdk_612
INSTALL_DIR: /opt/rocm_sdk_612
HIP_PLATFORM: amd
TASK_RESULT_FILE_INSTALL: /home/saidp/Downloads/rocm_sdk_builder/builddir/040_01_onnxruntime_rocm_training/.result_install
---------------------------

LIST_BINFO_FILE_FULLNAME[90]: /home/saidp/Downloads/rocm_sdk_builder/binfo/core/040_02_onnxruntime_deepspeed.binfo
APP_INFO_FULL_NAME: /home/saidp/Downloads/rocm_sdk_builder/binfo/core/040_02_onnxruntime_deepspeed.binfo

---------------------------
[90] BINFO_APP_NAME: DeepSpeed
BINFO FILE: /home/saidp/Downloads/rocm_sdk_builder/binfo/core/040_02_onnxruntime_deepspeed.binfo
BINFO_APP_SRC_SUBDIR_BASENAME: 
BINFO_APP_SRC_TOPDIR_BASENAME: DeepSpeed
BINFO_APP_SRC_DIR: /home/saidp/Downloads/rocm_sdk_builder/src_projects/DeepSpeed
BINFO_APP_SRC_CLONE_DIR: /home/saidp/Downloads/rocm_sdk_builder/src_projects/DeepSpeed
BINFO_APP_BUILD_DIR: /home/saidp/Downloads/rocm_sdk_builder/builddir/040_02_onnxruntime_deepspeed
HIP_PATH: /opt/rocm_sdk_612
INSTALL_DIR: /opt/rocm_sdk_612
HIP_PLATFORM: amd
TASK_RESULT_FILE_INSTALL: /home/saidp/Downloads/rocm_sdk_builder/builddir/040_02_onnxruntime_deepspeed/.result_install
---------------------------


ROCM SDK build and install ready
You can use following commands to check your GPU

    source /opt/rocm_sdk_612/bin/env_rocm.sh
    rocminfo

It built without any errors.
Then I hit the same symbol error again.

source /opt/rocm_sdk_612/bin/env_rocm.sh
llama-server -m /opt/rocm_sdk_models/Qwen2.5-7B-Instruct-Q8_0/Qwen2.5-7B-Instruct-Q8_0.gguf -c 2048 -ngl 99 --metrics -sm none
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 ROCm devices:
  Device 0: AMD Radeon Graphics, compute capability 9.0, VMM: no
  Device 1: AMD Radeon Graphics, compute capability 9.0, VMM: no
build: 3901 (49f4671b) with cc (Ubuntu 13.2.0-23ubuntu4) 13.2.0 for x86_64-linux-gnu
system info: n_threads = 16, n_threads_batch = 16, total_threads = 32

system_info: n_threads = 16 (n_threads_batch = 16) / 32 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | RISCV_VECT = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | 

main: HTTP server is listening, hostname: 127.0.0.1, port: 8080, http threads: 31
main: loading model
llama_model_loader: loaded meta data with 38 key-value pairs and 339 tensors from /opt/rocm_sdk_models/Qwen2.5-7B-Instruct-Q8_0/Qwen2.5-7B-Instruct-Q8_0.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = qwen2
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Qwen2.5 7B Instruct
llama_model_loader: - kv   3:                           general.finetune str              = Instruct
llama_model_loader: - kv   4:                           general.basename str              = Qwen2.5
llama_model_loader: - kv   5:                         general.size_label str              = 7B
llama_model_loader: - kv   6:                            general.license str              = apache-2.0
llama_model_loader: - kv   7:                       general.license.link str              = https://huggingface.co/Qwen/Qwen2.5-7...
llama_model_loader: - kv   8:                   general.base_model.count u32              = 1
llama_model_loader: - kv   9:                  general.base_model.0.name str              = Qwen2.5 7B
llama_model_loader: - kv  10:          general.base_model.0.organization str              = Qwen
llama_model_loader: - kv  11:              general.base_model.0.repo_url str              = https://huggingface.co/Qwen/Qwen2.5-7B
llama_model_loader: - kv  12:                               general.tags arr[str,2]       = ["chat", "text-generation"]
llama_model_loader: - kv  13:                          general.languages arr[str,1]       = ["en"]
llama_model_loader: - kv  14:                          qwen2.block_count u32              = 28
llama_model_loader: - kv  15:                       qwen2.context_length u32              = 32768
llama_model_loader: - kv  16:                     qwen2.embedding_length u32              = 3584
llama_model_loader: - kv  17:                  qwen2.feed_forward_length u32              = 18944
llama_model_loader: - kv  18:                 qwen2.attention.head_count u32              = 28
llama_model_loader: - kv  19:              qwen2.attention.head_count_kv u32              = 4
llama_model_loader: - kv  20:                       qwen2.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  21:     qwen2.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  22:                          general.file_type u32              = 7
llama_model_loader: - kv  23:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  24:                         tokenizer.ggml.pre str              = qwen2
llama_model_loader: - kv  25:                      tokenizer.ggml.tokens arr[str,152064]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  26:                  tokenizer.ggml.token_type arr[i32,152064]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  27:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv  28:                tokenizer.ggml.eos_token_id u32              = 151645
llama_model_loader: - kv  29:            tokenizer.ggml.padding_token_id u32              = 151643
llama_model_loader: - kv  30:                tokenizer.ggml.bos_token_id u32              = 151643
llama_model_loader: - kv  31:               tokenizer.ggml.add_bos_token bool             = false
llama_model_loader: - kv  32:                    tokenizer.chat_template str              = {%- if tools %}\n    {{- '<|im_start|>...
llama_model_loader: - kv  33:               general.quantization_version u32              = 2
llama_model_loader: - kv  34:                      quantize.imatrix.file str              = /models_out/Qwen2.5-7B-Instruct-GGUF/...
llama_model_loader: - kv  35:                   quantize.imatrix.dataset str              = /training_dir/calibration_datav3.txt
llama_model_loader: - kv  36:             quantize.imatrix.entries_count i32              = 196
llama_model_loader: - kv  37:              quantize.imatrix.chunks_count i32              = 128
llama_model_loader: - type  f32:  141 tensors
llama_model_loader: - type q8_0:  198 tensors
llm_load_vocab: special tokens cache size = 22
llm_load_vocab: token to piece cache size = 0.9310 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = qwen2
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 152064
llm_load_print_meta: n_merges         = 151387
llm_load_print_meta: vocab_only       = 0
llm_load_print_meta: n_ctx_train      = 32768
llm_load_print_meta: n_embd           = 3584
llm_load_print_meta: n_layer          = 28
llm_load_print_meta: n_head           = 28
llm_load_print_meta: n_head_kv        = 4
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_swa            = 0
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 7
llm_load_print_meta: n_embd_k_gqa     = 512
llm_load_print_meta: n_embd_v_gqa     = 512
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-06
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 18944
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 2
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 1000000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn  = 32768
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: ssm_dt_b_c_rms   = 0
llm_load_print_meta: model type       = ?B
llm_load_print_meta: model ftype      = Q8_0
llm_load_print_meta: model params     = 7.62 B
llm_load_print_meta: model size       = 7.54 GiB (8.50 BPW) 
llm_load_print_meta: general.name     = Qwen2.5 7B Instruct
llm_load_print_meta: BOS token        = 151643 '<|endoftext|>'
llm_load_print_meta: EOS token        = 151645 '<|im_end|>'
llm_load_print_meta: PAD token        = 151643 '<|endoftext|>'
llm_load_print_meta: LF token         = 148848 'ÄĬ'
llm_load_print_meta: EOT token        = 151645 '<|im_end|>'
llm_load_print_meta: EOG token        = 151643 '<|endoftext|>'
llm_load_print_meta: EOG token        = 151645 '<|im_end|>'
llm_load_print_meta: max token length = 256
llm_load_tensors: ggml ctx size =    0.30 MiB
llm_load_tensors: offloading 28 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 29/29 layers to GPU
llm_load_tensors:      ROCm0 buffer size =  7165.44 MiB
llm_load_tensors:        CPU buffer size =   552.23 MiB
.......................................................................................
llama_new_context_with_model: n_ctx      = 2048
llama_new_context_with_model: n_batch    = 2048
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base  = 1000000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:      ROCm0 KV buffer size =   112.00 MiB
llama_new_context_with_model: KV self size  =  112.00 MiB, K (f16):   56.00 MiB, V (f16):   56.00 MiB
llama_new_context_with_model:  ROCm_Host  output buffer size =     1.16 MiB
llama_new_context_with_model:      ROCm0 compute buffer size =   304.00 MiB
llama_new_context_with_model:  ROCm_Host compute buffer size =    11.01 MiB
llama_new_context_with_model: graph nodes  = 986
llama_new_context_with_model: graph splits = 2
llama_init_from_gpt_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
:0:/home/saidp/Downloads/rocm_sdk_builder/src_projects/clr/hipamd/src/hip_global.cpp:114 : 1173855977 us: [pid:16546 tid:0x7e1f3d1ecc40] Cannot find Symbol with name: Cijk_Alik_Bljk_HB_GB_MT32x32x16_SN_APM1_AF0EM2_AF1EM2_AMAS3_ASAE01_ASCE01_ASEM1_BL1_BS1_DTLA0_DTLB0_EPS1_FL0_GLVWA4_GLVWB4_GRVW4_GSU1_GSUASB_ISA906_IU1_K1_KLA_LPA0_LPB0_LDL1_LRVW4_MDA2_MMFGLC_NLCA1_NLCB1_ONLL1_PK0_PGR1_PLR1_SIA1_SU32_SUM0_SUS256_SVW4_SNLL0_TT4_4_USFGROn1_VAW2_VSn1_VW4_VWB4_WG8_8_1_WGM1
Aborted (core dumped)

Here is the full log with AMD_COMGR_SAVE_TEMPS=1 AMD_COMGR_REDIRECT_LOGS=stdout AMD_COMGR_EMIT_VERBOSE_LOGS=1 ROCM_SDK_PRINTOUT_DEBUG_MESSAGES=1 enabled.
sdk_error_output.txt
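
As a quick sanity check on the missing symbol, the name itself can be picked apart. The sketch below assumes the usual Tensile naming convention, where `ISA906` names the gfx target and `MT32x32x16` is the macro tile; that reading is an assumption, and the function name `describe_tensile_symbol` is made up for illustration.

```python
import re

# Symbol name copied from the "Cannot find Symbol with name" line above.
SYMBOL = (
    "Cijk_Alik_Bljk_HB_GB_MT32x32x16_SN_APM1_AF0EM2_AF1EM2_AMAS3_ASAE01_"
    "ASCE01_ASEM1_BL1_BS1_DTLA0_DTLB0_EPS1_FL0_GLVWA4_GLVWB4_GRVW4_GSU1_"
    "GSUASB_ISA906_IU1_K1_KLA_LPA0_LPB0_LDL1_LRVW4_MDA2_MMFGLC_NLCA1_NLCB1_"
    "ONLL1_PK0_PGR1_PLR1_SIA1_SU32_SUM0_SUS256_SVW4_SNLL0_TT4_4_USFGROn1_"
    "VAW2_VSn1_VW4_VWB4_WG8_8_1_WGM1"
)


def describe_tensile_symbol(name):
    """Extract the gfx target and macro tile from a Tensile-style kernel name.

    Assumption: "_ISA<digits>_" encodes the gfx target and
    "_MT<a>x<b>x<c>_" encodes the macro tile dimensions.
    """
    info = {}
    isa = re.search(r"_ISA([0-9a-f]+)_", name)
    if isa:
        info["gfx_target"] = "gfx" + isa.group(1)
    tile = re.search(r"_MT(\d+)x(\d+)x(\d+)_", name)
    if tile:
        info["macro_tile"] = tuple(int(g) for g in tile.groups())
    return info


print(describe_tensile_symbol(SYMBOL))
```

If the target printed here matches the card, the kernel is one that rocBLAS expected to find in its own gfx906 library files, which points at the rocBLAS/Tensile build rather than at llama.cpp.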

@Said-Akbar
Author

@lamikr, I see that an AMD MI50 (also gfx906) costs around $140 on eBay with shipping. Let me know if you are open to the idea of supporting gfx906. I am willing to ship one to you, or, if you are in the Bay Area, I can lend you an MI50. That would make it easier for you to debug and fix issues. Thanks!

@lamikr
Owner

lamikr commented Nov 27, 2024

@Said-Akbar Thank you for the suggestion, it would be great if I could borrow one of your gfx906 cards for a while for testing. I live in the Bay Area, but I also travel quite often to San Francisco if that's easier for you. Are you able to send me a private message via Gmail or LinkedIn?

I just bought a gfx1010 card on eBay to test RDNA1 cards better, so I would like to hold off for a while before purchasing a gfx906.

@Said-Akbar
Author

Sure, let me send you a LinkedIn message.

@Said-Akbar
Author

Said-Akbar commented Nov 29, 2024

Hello @lamikr ,

Quick update: I installed the default ROCm 6.2.4 packages on my Ubuntu 24.04.

sudo apt update
sudo apt install "linux-headers-$(uname -r)" "linux-modules-extra-$(uname -r)"
sudo usermod -a -G render,video $LOGNAME # Add the current user to the render and video groups
wget https://repo.radeon.com/amdgpu-install/6.2.4/ubuntu/noble/amdgpu-install_6.2.60204-1_all.deb
sudo apt install ./amdgpu-install_6.2.60204-1_all.deb
sudo apt update
sudo apt install amdgpu-dkms rocm

After that, I copied the 'library' folder from /opt/rocm-6.2.4/lib/rocblas/ to /opt/rocm_sdk_612/lib64/rocblas/ (first backing up the broken 'library' folder there by renaming it to 'library2'). Amazingly, llama.cpp is working now.

mv /opt/rocm_sdk_612/lib64/rocblas/library /opt/rocm_sdk_612/lib64/rocblas/library2
cp -r /opt/rocm-6.2.4/lib/rocblas/library /opt/rocm_sdk_612/lib64/rocblas/
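
To see what actually differs between the two folders, a file-name comparison is a cheap first step. This is only a sketch: it assumes the rocBLAS library files carry the target name (for example `gfx906`) in their file names, and the helper names `arch_files` and `compare_libraries` are made up for illustration.

```python
from pathlib import Path


def arch_files(library_dir, arch="gfx906"):
    """Return the names of the files in library_dir whose name mentions arch."""
    root = Path(library_dir)
    return {p.name for p in root.iterdir() if p.is_file() and arch in p.name}


def compare_libraries(sdk_dir, stock_dir, arch="gfx906"):
    """Compare the per-arch file names of two rocBLAS 'library' folders."""
    sdk = arch_files(sdk_dir, arch)
    stock = arch_files(stock_dir, arch)
    return {
        "only_in_sdk": sorted(sdk - stock),
        "only_in_stock": sorted(stock - sdk),
        "in_both": sorted(sdk & stock),
    }


# Usage with the paths from this comment (not executed here):
#   result = compare_libraries(
#       "/opt/rocm_sdk_612/lib64/rocblas/library2",  # backed-up SDK build
#       "/opt/rocm-6.2.4/lib/rocblas/library",       # stock ROCm 6.2.4
#   )
#   print(result["only_in_stock"])
```

Matching names do not prove matching contents, so a file listed under `in_both` may still hold different kernels in the two builds.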

Here is a llama.cpp chat example:

llama-server -m /opt/rocm_sdk_models/Qwen2.5-7B-Instruct-Q8_0/Qwen2.5-7B-Instruct-Q8_0.gguf -c 2048 -ngl 99 --metrics -sm none -fa

In browser:

User: what is your name?

Llama: My name is Llama. It's a playful and unique name that suits my friendly and helpful nature! How can I assist you today? 😊

Interestingly, if you do not use Flash attention, the output is gibberish:

llama-server -m /opt/rocm_sdk_models/Qwen2.5-7B-Instruct-Q8_0/Qwen2.5-7B-Instruct-Q8_0.gguf -c 2048 -ngl 99 --metrics -sm none

# In browser:

User: what is your name?

Llama: 0一致好评obierno网首页?>> ⓘ ⓘNguồnINCLUDEDTYPES븐-UA?>> arrangnı ⓘ)(_PECT网首页-Origin ⓘUTIL锬网首页 yans ⓘ:both媞独角兽 nâ网首页一致好评 ⓘ ⓘurator帼)(((.getOwnProperty✦一致好评一致好评)((((一致好评 ⓘobierno.Invariant荭Nguồn一致好评 ⓘltra ⓘ.addObject ⓘ一致好评 ⓘ一致好评?>>一致好评уницип?>>瑨?>> ⓘ虓INCLUDED ⓘ蕤查看全文?>>一致好评 ⓘ网首页--------------------------------------------------------------------------

I am not sure whether this is a llama.cpp issue or a ROCm rocBLAS version mismatch. But I think you will be able to debug it better now (once I give you the MI50), since we clearly know the problem is in the /opt/rocm_sdk_612/lib64/rocblas/library files.

@Said-Akbar
Author

Said-Akbar commented Nov 29, 2024

Here are two llama.cpp benchmarks.

without flash attention:

llama-bench -m /opt/rocm_sdk_models/Qwen2.5-7B-Instruct-Q8_0/Qwen2.5-7B-Instruct-Q8_0.gguf -ngl 99 -sm none -p 512 -n 128
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 ROCm devices:
  Device 0: AMD Radeon Graphics, compute capability 9.0, VMM: no
  Device 1: AMD Radeon Graphics, compute capability 9.0, VMM: no
| model                          |       size |     params | backend    | ngl |    sm |          test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ----: | ------------: | -------------------: |
| qwen2 ?B Q8_0                  |   7.54 GiB |     7.62 B | CUDA       |  99 |  none |         pp512 |        291.13 ± 0.05 |
| qwen2 ?B Q8_0                  |   7.54 GiB |     7.62 B | CUDA       |  99 |  none |         tg128 |         61.21 ± 0.45 |

build: 49f4671b (3901)

with flash attention:

llama-bench -m /opt/rocm_sdk_models/Qwen2.5-7B-Instruct-Q8_0/Qwen2.5-7B-Instruct-Q8_0.gguf -ngl 99 -sm none -p 512 -n 128 -fa 1
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 ROCm devices:
  Device 0: AMD Radeon Graphics, compute capability 9.0, VMM: no
  Device 1: AMD Radeon Graphics, compute capability 9.0, VMM: no
| model                          |       size |     params | backend    | ngl |    sm | fa |          test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ----: | -: | ------------: | -------------------: |
| qwen2 ?B Q8_0                  |   7.54 GiB |     7.62 B | CUDA       |  99 |  none |  1 |         pp512 |        266.50 ± 0.04 |
| qwen2 ?B Q8_0                  |   7.54 GiB |     7.62 B | CUDA       |  99 |  none |  1 |         tg128 |         52.19 ± 3.53 |

build: 49f4671b (3901)

So, it is a bit faster without flash attention, but it outputs gibberish. Flash attention slows down text generation a bit, but the output is readable. Also, splitting the model across two GPUs did not result in any speed improvement.
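
To put a number on "a bit", the relative slowdown can be computed from the two llama-bench tables above. A minimal sketch; the helper name `slowdown_percent` is made up, and the t/s values are the mean figures from the tables (the ± spread is ignored).

```python
def slowdown_percent(baseline_tps, other_tps):
    """Relative drop in tokens/s of other_tps compared with baseline_tps, in percent."""
    return (baseline_tps - other_tps) / baseline_tps * 100.0


# (t/s without flash attention, t/s with flash attention), from the tables above.
results = {
    "pp512": (291.13, 266.50),
    "tg128": (61.21, 52.19),
}

for test, (no_fa, fa) in results.items():
    print(f"{test}: flash attention is {slowdown_percent(no_fa, fa):.1f}% slower")
```

This works out to roughly 8.5% for prompt processing and 14.7% for text generation. The tg128 run with flash attention has a spread of ± 3.53 t/s, so the second figure is less certain than the first.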

@lamikr
Owner

lamikr commented Dec 2, 2024

Hi Said, and thanks for the coffee and the MI50 GPU loan!
I just installed your GPU in my computer and lspci can detect it.

03:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 20 [Radeon Pro VII/Radeon Instinct MI50 32GB] (rev 02)

But the amdgpu probe seems to fail and rocminfo cannot detect it. I am wondering whether I am missing some kernel boot parameter or firmware file.

[    5.408749] [drm] initializing kernel modesetting (VEGA20 0x1002:0x66A1 0x1002:0x0834 0x02).
[    5.408757] amdgpu 0000:03:00.0: amdgpu: Fatal error during GPU init
[    5.408787] amdgpu 0000:03:00.0: probe with driver amdgpu failed with error -12
[    5.408823] ------------[ cut here ]------------
[    5.408825] WARNING: CPU: 2 PID: 462 at arch/x86/mm/ioremap.c:461 iounmap+0x33/0x100
[    5.408833] Modules linked in: amdgpu(+) video wmi amdxcp i2c_algo_bit drm_ttm_helper ttm drm_exec gpu_sched drm_suballoc_helper drm_buddy drm_display_helper
[    5.408854] CPU: 2 UID: 0 PID: 462 Comm: (udev-worker) Not tainted 6.12.1+ #12
[    5.408858] Hardware name: Gigabyte Technology Co., Ltd. B450 I AORUS PRO WIFI/B450 I AORUS PRO WIFI-CF, BIOS F5 01/25/2019
[    5.408861] RIP: 0010:iounmap+0x33/0x100
[    5.408866] Code: 48 8b 05 00 11 bc 01 55 53 48 39 c7 72 1c 48 89 fb eb 20 cc cc cc 48 ba 00 00 00 00 00 00 32 00 48 8d 44 10 ff 48 39 c3 72 1d <0f> 0b 5b 5d e9 54 6d 0c 01 48 ba 00 00 00 00 00 20 00 00 48 8d 44
[    5.408869] RSP: 0018:ffffb4cc4139f7e0 EFLAGS: 00010207
[    5.408873] RAX: ffffb4cc40000000 RBX: ffff9e94d1a00000 RCX: 0000000000000000
[    5.408876] RDX: 0000000000033d80 RSI: ffffb4cc4139f804 RDI: 0000000000000000
[    5.408878] RBP: ffff9e94d1a00010 R08: ffbdcfcfcfafafbe R09: ffffb4cc4139f748
[    5.408881] R10: 0000000000000008 R11: fefefefefefefeff R12: ffffb4cc4139f888
[    5.408883] R13: 00000000ffffffff R14: ffff9e94c1fdb36c R15: ffff9e94c35a6800
[    5.408886] FS:  00007ff85295f880(0000) GS:ffff9e9bc0f00000(0000) knlGS:0000000000000000
[    5.408889] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    5.408891] CR2: 0000000024dc1c60 CR3: 000000010f7d6000 CR4: 00000000003506f0
[    5.408894] Call Trace:
[    5.408897]  <TASK>
[    5.408900]  ? __warn+0x89/0x130
[    5.408905]  ? iounmap+0x33/0x100
[    5.408910]  ? report_bug+0x164/0x190
[    5.408917]  ? handle_bug+0x58/0x90
[    5.408921]  ? exc_invalid_op+0x17/0x70
[    5.408925]  ? asm_exc_invalid_op+0x1a/0x20
[    5.408933]  ? iounmap+0x33/0x100
[    5.408938]  amdgpu_device_fini_sw+0x3fa/0x540 [amdgpu]
[    5.409488]  amdgpu_driver_release_kms+0x16/0x30 [amdgpu]
[    5.409876]  drm_dev_put.part.0+0x3c/0x60
[    5.409883]  release_nodes+0x40/0xb0
[    5.409888]  devres_release_all+0x8c/0xc0
[    5.409893]  device_unbind_cleanup+0xe/0x70
[    5.409897]  really_probe+0x1a0/0x380
[    5.409902]  ? __pfx___driver_attach+0x10/0x10
[    5.409904]  __driver_probe_device+0x78/0x150
[    5.409908]  driver_probe_device+0x1f/0x90
[    5.409912]  __driver_attach+0xd2/0x1c0
[    5.409915]  bus_for_each_dev+0x88/0xd0
[    5.409919]  bus_add_driver+0x142/0x270
[    5.409924]  driver_register+0x59/0x100
[    5.409927]  ? __pfx_amdgpu_init+0x10/0x10 [amdgpu]
[    5.410258]  do_one_initcall+0x5b/0x320
[    5.410265]  do_init_module+0x90/0x270
[    5.410270]  init_module_from_file+0x86/0xc0
[    5.410277]  idempotent_init_module+0x11d/0x310
[    5.410283]  __x64_sys_finit_module+0x5e/0xb0
[    5.410288]  do_syscall_64+0x82/0x160
[    5.410301]  ? srso_return_thunk+0x5/0x5f
[    5.410304]  ? vm_mmap_pgoff+0x131/0x1c0
[    5.410309]  ? srso_return_thunk+0x5/0x5f
[    5.410313]  ? srso_return_thunk+0x5/0x5f
[    5.410315]  ? ksys_mmap_pgoff+0x156/0x220
[    5.410319]  ? srso_return_thunk+0x5/0x5f
[    5.410321]  ? syscall_exit_to_user_mode+0x10/0x210
[    5.410325]  ? srso_return_thunk+0x5/0x5f
[    5.410327]  ? do_syscall_64+0x8e/0x160
[    5.410331]  ? srso_return_thunk+0x5/0x5f
[    5.410333]  ? syscall_exit_to_user_mode+0x10/0x210
[    5.410337]  ? srso_return_thunk+0x5/0x5f
[    5.410339]  ? do_syscall_64+0x8e/0x160
[    5.410342]  ? srso_return_thunk+0x5/0x5f
[    5.410344]  ? do_syscall_64+0x8e/0x160
[    5.410349]  ? srso_return_thunk+0x5/0x5f
[    5.410351]  ? syscall_exit_to_user_mode+0x10/0x210
[    5.410355]  ? srso_return_thunk+0x5/0x5f
[    5.410357]  ? do_syscall_64+0x8e/0x160
[    5.410360]  ? srso_return_thunk+0x5/0x5f
[    5.410362]  ? do_syscall_64+0x8e/0x160
[    5.410365]  ? exc_page_fault+0x76/0x190
[    5.410369]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[    5.410374] RIP: 0033:0x7ff8533c4789
[    5.410377] Code: 08 89 e8 5b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 57 b6 0c 00 f7 d8 64 89 01 48
[    5.410379] RSP: 002b:00007ffe22d7aba8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[    5.410383] RAX: ffffffffffffffda RBX: 0000000024dc9000 RCX: 00007ff8533c4789
[    5.410385] RDX: 0000000000000000 RSI: 00007ff853505aad RDI: 0000000000000015
[    5.410387] RBP: 00007ff853505aad R08: 0000000000000000 R09: 0000000000000000
[    5.410389] R10: 0000000000000015 R11: 0000000000000246 R12: 0000000000020000
[    5.410391] R13: 0000000000000000 R14: 0000000024da9e50 R15: 0000000000000000
[    5.410397]  </TASK>
[    5.410399] ---[ end trace 0000000000000000 ]---
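
The `error -12` in the probe line is a negated Linux errno value. The sketch below decodes it with Python's standard `errno` module; the function name `decode_probe_failure` is made up for illustration.

```python
import errno
import os
import re

# Matches kernel lines such as:
#   amdgpu 0000:03:00.0: probe with driver amdgpu failed with error -12
PROBE_FAILED = re.compile(r"probe with driver (\S+) failed with error (-?\d+)")


def decode_probe_failure(line):
    """Return (driver, errno name, errno message) for a probe failure line, or None."""
    match = PROBE_FAILED.search(line)
    if match is None:
        return None
    driver = match.group(1)
    code = abs(int(match.group(2)))  # the kernel reports the errno negated
    return driver, errno.errorcode.get(code, "unknown"), os.strerror(code)


line = "[    5.408787] amdgpu 0000:03:00.0: probe with driver amdgpu failed with error -12"
print(decode_probe_failure(line))
```

On Linux, errno 12 is `ENOMEM`. During a device probe that can mean a failed resource or address-space mapping rather than the machine running out of RAM, which would fit the `iounmap` warning in the trace, but that interpretation is a guess.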

@cb88

cb88 commented Dec 3, 2024

I have two of them going into a build this week, so maybe I can help out too! I also have a somewhat faster build box, as it's a 24-core EPYC 7352. I'd be happy to give either of you access for testing once it's set up.

You need to enable Above 4G Decoding for these cards, if I remember correctly. Maybe that is causing the crash. Also make sure you enable Resizable BAR and/or Smart Access Memory in your BIOS. You may also need SR-IOV enabled.

Also, the display output port does nothing with the Instinct VBIOS; apparently this is so the framebuffer doesn't waste some of the VRAM. There is a V420 VBIOS that may or may not work on these cards, but I have not seen any confirmation one way or the other online. The V420 was never officially released. I am not 100% sure the display output doesn't work on Linux; there are conflicting posts about this.

Anyway just some guesses on my part since you guys are already a bit ahead of me in setting up your hardware.

For reference, your kernel output should look more like this. The other failures they hit are VM passthrough related and not relevant for us.
https://www.reddit.com/r/Proxmox/comments/1g4d5mf/amd_gpu_passthrough_issues_with_amd_mi60/

I also have 2 MI25 and a Vega FE I can start testing with at some point.

@lamikr
Owner

lamikr commented Dec 3, 2024

@cb88 Thanks for the info, nice to get more people working with this card. So far I have found surprisingly little documentation about these MI50 cards.

So it seems I need to hook the display up via the iGPU to check the BIOS settings. (I ordered a Mini DisplayPort to HDMI cable just before reading this...) So far I have only tested this over ssh.

Here is the decoded probe error I am seeing for this GPU:

[ 5.418447] ? __warn (kernel/panic.c:748)
[ 5.418451] ? iounmap (arch/x86/mm/ioremap.c:461)
[ 5.418455] ? report_bug (lib/bug.c:180 lib/bug.c:219)
[ 5.418461] ? handle_bug (arch/x86/kernel/traps.c:260)
[ 5.418465] ? exc_invalid_op (arch/x86/kernel/traps.c:309 (discriminator 1))
[ 5.418468] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:621)
[ 5.418475] ? iounmap (arch/x86/mm/ioremap.c:461)
[ 5.418480] amdgpu_device_fini_sw (drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:4634) amdgpu
[ 5.418904] amdgpu_driver_release_kms (drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c:1462) amdgpu
[ 5.419279] drm_dev_put.part.0 (drivers/gpu/drm/drm_drv.c:785 include/linux/kref.h:65 drivers/gpu/drm/drm_drv.c:819)
[ 5.419286] release_nodes (drivers/base/devres.c:506)
[ 5.419291] devres_release_all (drivers/base/devres.c:537)
[ 5.419295] device_unbind_cleanup (drivers/base/dd.c:551)
[ 5.419300] really_probe (drivers/base/dd.c:727)
[ 5.419304] ? __pfx___driver_attach (drivers/base/dd.c:1157)
[ 5.419307] __driver_probe_device (drivers/base/dd.c:800)
[ 5.419311] driver_probe_device (drivers/base/dd.c:830)
[ 5.419314] __driver_attach (drivers/base/dd.c:1217)
[ 5.419318] bus_for_each_dev (drivers/base/bus.c:370)
[ 5.419322] bus_add_driver (drivers/base/bus.c:675)
[ 5.419327] driver_register (drivers/base/driver.c:246)
[ 5.419330] ? __pfx_amdgpu_init (drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:2574) amdgpu
[ 5.419656] do_one_initcall (init/main.c:1269)
[ 5.419663] do_init_module (kernel/module/main.c:2543)
[ 5.419668] init_module_from_file (kernel/module/main.c:3198)
[ 5.419675] idempotent_init_module (include/linux/spinlock.h:351 kernel/module/main.c:3139 kernel/module/main.c:3211)
[ 5.419681] __x64_sys_finit_module (include/linux/file.h:68 kernel/module/main.c:3238 kernel/module/main.c:3220 kernel/module/main.c:3220)

@Said-Akbar
Author

Hello @lamikr ,

I have an Nvidia card for video output since these AMD cards do not have working video outputs. I used these commands to install the drivers for the MI50/60 on my Ubuntu 24.04, which worked fine for me:

sudo apt update
sudo apt install "linux-headers-$(uname -r)" "linux-modules-extra-$(uname -r)"
sudo usermod -a -G render,video $LOGNAME # Add the current user to the render and video groups
wget https://repo.radeon.com/amdgpu-install/6.2.4/ubuntu/noble/amdgpu-install_6.2.60204-1_all.deb
sudo apt install ./amdgpu-install_6.2.60204-1_all.deb
sudo apt update
sudo apt install amdgpu-dkms rocm

@lamikr
Owner

lamikr commented Dec 4, 2024

It took a day of struggling with a BIOS update, but I now have the MI50 showing up in rocminfo! It could have taken a long time to figure this out without your suggestions.

I put my steps/struggles here just for reference in case they are useful for somebody else.
In my test PC I have an old Gigabyte B450 motherboard with ancient firmware that had none of the mentioned BIOS settings (Above 4G Decoding, Resizable BAR, Smart Access Memory). I read on the internet that it should be possible to get those with BIOS updates. In my case, I needed to update in 2 steps because the BIOS in that PC was so old (from 2019).

After the first BIOS update, the boot went into a never-ending reboot cycle without showing anything on the display. I suspect it failed to find proper settings for my DDR4. I was able to solve that by removing everything, resetting the CMOS settings and then first installing only one DDR4 module.

The second BIOS update, to the latest version, then went smoothly, except that the system refused to boot from my NVMe drive, so I needed to reinstall GRUB. And then when I added the MI50 back, the display went black again... I fixed that by removing the MI50 again and then forcing the BIOS to use the display from my AMD iGPU.

In my case the only option I found in the BIOS was "Enable above 4G decoding". To my understanding, in my Gigabyte BIOS at least, that also enables the Smart Access Memory option. In addition to "Enable above 4G decoding", I only enabled "AMD hardware virtualization support" and tuned the memory settings a little from the defaults.

Now the system uses the iGPU for display, the MI50 driver probe works, and I can see both cards in rocminfo.
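
One way to confirm that the BIOS change took effect is to look at the `Detected VRAM RAM=...M, BAR=...M` line that amdgpu prints to dmesg. The sketch below checks whether the reported BAR is at least as large as the VRAM; treating that as a sign that the large BAR is active is my assumption, and the function name `bar_covers_vram` is made up for illustration.

```python
import re

# Matches amdgpu lines such as:
#   [drm] Detected VRAM RAM=16368M, BAR=16384M
VRAM_BAR = re.compile(r"Detected VRAM RAM=(\d+)M, BAR=(\d+)M")


def bar_covers_vram(dmesg_text):
    """Return True if the reported BAR is at least as large as the VRAM.

    Returns None when no matching amdgpu line is found in dmesg_text.
    """
    match = VRAM_BAR.search(dmesg_text)
    if match is None:
        return None
    vram_mb, bar_mb = int(match.group(1)), int(match.group(2))
    return bar_mb >= vram_mb


print(bar_covers_vram("[ 5.673781] [drm] Detected VRAM RAM=16368M, BAR=16384M"))
```

The text passed in could come from `dmesg` output saved to a file. A BAR much smaller than the VRAM would suggest that Above 4G Decoding is still off.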

Here is the relevant part of dmesg:

[ 2.445160] dracut: Mageia-9
[ 2.626585] ACPI: video: Video Device [VGA] (multi-head: yes rom: no post: no)
[ 2.626939] input: Video Bus as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:01/LNXVIDEO:00/input/input13
[ 2.627087] ACPI: video: Video Device [VGA1] (multi-head: yes rom: no post: no)
[ 2.627170] input: Video Bus as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:13/LNXVIDEO:01/input/input14
[ 5.657740] [drm] amdgpu kernel modesetting enabled.
[ 5.657761] amdgpu: vga_switcheroo: detected switching method _SB_.PCI0.GP17.VGA_.ATPX handle
[ 5.657848] amdgpu: ATPX version 1, functions 0x00000000
[ 5.666334] amdgpu: Virtual CRAT table created for CPU
[ 5.666355] amdgpu: Topology: Add CPU node
[ 5.666479] amdgpu 0000:03:00.0: enabling device (0000 -> 0002)
[ 5.666521] [drm] initializing kernel modesetting (VEGA20 0x1002:0x66A1 0x1002:0x0834 0x02).
[ 5.666532] [drm] register mmio base: 0xFCD00000
[ 5.666534] [drm] register mmio size: 524288
[ 5.666653] [drm] MCBP is disabled
[ 5.666656] [drm] add ip block number 0 <soc15_common>
[ 5.666658] [drm] add ip block number 1 <gmc_v9_0>
[ 5.666661] [drm] add ip block number 2 <vega20_ih>
[ 5.666662] [drm] add ip block number 3
[ 5.666664] [drm] add ip block number 4
[ 5.666666] [drm] add ip block number 5
[ 5.666668] [drm] add ip block number 6 <gfx_v9_0>
[ 5.666670] [drm] add ip block number 7 <sdma_v4_0>
[ 5.666672] [drm] add ip block number 8 <uvd_v7_0>
[ 5.666673] [drm] add ip block number 9 <vce_v4_0>
[ 5.666690] amdgpu 0000:03:00.0: amdgpu: ACPI VFCT table present but broken (too short #2),skipping
[ 5.668116] amdgpu 0000:03:00.0: amdgpu: Fetched VBIOS from platform
[ 5.668120] amdgpu: ATOM BIOS: 113-D1631400-X11
[ 5.673006] [drm] UVD(0) is enabled in VM mode
[ 5.673009] [drm] UVD(1) is enabled in VM mode
[ 5.673011] [drm] UVD(0) ENC is enabled in VM mode
[ 5.673012] [drm] UVD(1) ENC is enabled in VM mode
[ 5.673014] [drm] VCE enabled in VM mode
[ 5.673039] amdgpu 0000:03:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[ 5.673054] [drm] GPU posting now...
[ 5.673722] amdgpu 0000:03:00.0: amdgpu: MEM ECC is active.
[ 5.673724] amdgpu 0000:03:00.0: amdgpu: SRAM ECC is active.
[ 5.673739] amdgpu 0000:03:00.0: amdgpu: RAS INFO: ras initialized successfully, hardware ability[67f7f] ras_mask[67f7f]
[ 5.673752] [drm] vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit
[ 5.673767] amdgpu 0000:03:00.0: amdgpu: VRAM: 16368M 0x0000008000000000 - 0x00000083FEFFFFFF (16368M used)
[ 5.673771] amdgpu 0000:03:00.0: amdgpu: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
[ 5.673781] [drm] Detected VRAM RAM=16368M, BAR=16384M
[ 5.673783] [drm] RAM width 4096bits HBM
[ 5.673952] [drm] amdgpu: 16368M of VRAM memory ready
[ 5.673955] [drm] amdgpu: 14997M of GTT memory ready.
[ 5.673983] [drm] GART: num cpu pages 131072, num gpu pages 131072
[ 5.674198] [drm] PCIE GART of 512M enabled.
[ 5.674200] [drm] PTB located at 0x00000083FEF00000
[ 5.674862] amdgpu: hwmgr_sw_init smu backed is vega20_smu
[ 5.679173] [drm] Found UVD firmware ENC: 1.2 DEC: .43 Family ID: 19
[ 5.679218] [drm] PSP loading UVD firmware
[ 5.681731] [drm] Found VCE firmware Version: 57.6 Binary ID: 4
[ 5.681790] [drm] PSP loading VCE firmware
[ 5.833428] amdgpu 0000:03:00.0: amdgpu: reserve 0x400000 from 0x83fe000000 for PSP TMR
[ 5.917214] amdgpu 0000:03:00.0: amdgpu: RAP: optional rap ta ucode is not available
[ 5.921651] [drm] Display Core v3.2.301 initialized on DCE 12.1
[ 5.924625] [drm] kiq ring mec 2 pipe 1 q 0
[ 5.967401] [drm] UVD and UVD ENC initialized successfully.
[ 6.168117] [drm] VCE initialized successfully.
[ 6.260871] amdgpu: HMM registered 16368MB device memory
[ 6.262919] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
[ 6.262935] kfd kfd: amdgpu: Total number of KFD nodes to be created: 1
[ 6.263086] amdgpu: Virtual CRAT table created for GPU
[ 6.263344] amdgpu: Topology: Add dGPU node [0x66a1:0x1002]
[ 6.263347] kfd kfd: amdgpu: added device 1002:66a1
[ 6.276989] amdgpu 0000:03:00.0: amdgpu: SE 4, SH per SE 1, CU per SH 16, active_cu_number 60
[ 6.276995] amdgpu 0000:03:00.0: amdgpu: ring gfx uses VM inv eng 0 on hub 0
[ 6.276998] amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[ 6.277000] amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[ 6.277003] amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[ 6.277005] amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[ 6.277007] amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[ 6.277010] amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[ 6.277012] amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[ 6.277014] amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[ 6.277016] amdgpu 0000:03:00.0: amdgpu: ring kiq_0.2.1.0 uses VM inv eng 11 on hub 0
[ 6.277019] amdgpu 0000:03:00.0: amdgpu: ring sdma0 uses VM inv eng 0 on hub 8
[ 6.277021] amdgpu 0000:03:00.0: amdgpu: ring page0 uses VM inv eng 1 on hub 8
[ 6.277023] amdgpu 0000:03:00.0: amdgpu: ring sdma1 uses VM inv eng 4 on hub 8
[ 6.277025] amdgpu 0000:03:00.0: amdgpu: ring page1 uses VM inv eng 5 on hub 8
[ 6.277027] amdgpu 0000:03:00.0: amdgpu: ring uvd_0 uses VM inv eng 6 on hub 8
[ 6.277029] amdgpu 0000:03:00.0: amdgpu: ring uvd_enc_0.0 uses VM inv eng 7 on hub 8
[ 6.277032] amdgpu 0000:03:00.0: amdgpu: ring uvd_enc_0.1 uses VM inv eng 8 on hub 8
[ 6.277034] amdgpu 0000:03:00.0: amdgpu: ring uvd_1 uses VM inv eng 9 on hub 8
[ 6.277036] amdgpu 0000:03:00.0: amdgpu: ring uvd_enc_1.0 uses VM inv eng 10 on hub 8
[ 6.277038] amdgpu 0000:03:00.0: amdgpu: ring uvd_enc_1.1 uses VM inv eng 11 on hub 8
[ 6.277041] amdgpu 0000:03:00.0: amdgpu: ring vce0 uses VM inv eng 12 on hub 8
[ 6.277043] amdgpu 0000:03:00.0: amdgpu: ring vce1 uses VM inv eng 13 on hub 8
[ 6.277045] amdgpu 0000:03:00.0: amdgpu: ring vce2 uses VM inv eng 14 on hub 8
[ 6.286871] amdgpu: Detected AMDGPU DF Counters. # of Counters = 8.
[ 6.286902] amdgpu: Detected AMDGPU 2 Perf Events.
[ 6.287148] amdgpu 0000:03:00.0: amdgpu: Runtime PM not available
[ 6.287446] [drm] Initialized amdgpu 3.59.0 for 0000:03:00.0 on minor 1
[ 6.290575] amdgpu 0000:0d:00.0: enabling device (0006 -> 0007)
[ 6.290656] [drm] initializing kernel modesetting (RAVEN 0x1002:0x15DD 0x1458:0xD000 0xC6).
[ 6.290678] [drm] register mmio base: 0xFC900000
[ 6.290680] [drm] register mmio size: 524288
[ 6.290762] [drm] MCBP is disabled
[ 6.290768] [drm] add ip block number 0 <soc15_common>
[ 6.290772] [drm] add ip block number 1 <gmc_v9_0>
[ 6.290775] [drm] add ip block number 2 <vega10_ih>
[ 6.290778] [drm] add ip block number 3
[ 6.290781] [drm] add ip block number 4
[ 6.290784] [drm] add ip block number 5
[ 6.290787] [drm] add ip block number 6 <gfx_v9_0>
[ 6.290789] [drm] add ip block number 7 <sdma_v4_0>
[ 6.290792] [drm] add ip block number 8 <vcn_v1_0>
[ 6.290891] amdgpu 0000:0d:00.0: amdgpu: Fetched VBIOS from VFCT
[ 6.290897] amdgpu: ATOM BIOS: 113-RAVEN-117
[ 6.317263] amdgpu 0000:0d:00.0: vgaarb: deactivate vga console
[ 6.317270] amdgpu 0000:0d:00.0: amdgpu: Trusted Memory Zone (TMZ) feature enabled
[ 6.317328] [drm] vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit
[ 6.317341] amdgpu 0000:0d:00.0: amdgpu: VRAM: 2048M 0x000000F400000000 - 0x000000F47FFFFFFF (2048M used)
[ 6.317345] amdgpu 0000:0d:00.0: amdgpu: GART: 1024M 0x0000000000000000 - 0x000000003FFFFFFF
[ 6.317356] [drm] Detected VRAM RAM=2048M, BAR=2048M
[ 6.317359] [drm] RAM width 128bits DDR4
[ 6.317572] [drm] amdgpu: 2048M of VRAM memory ready
[ 6.317576] [drm] amdgpu: 14997M of GTT memory ready.
[ 6.317607] [drm] GART: num cpu pages 262144, num gpu pages 262144
[ 6.318012] [drm] PCIE GART of 1024M enabled.
[ 6.318014] [drm] PTB located at 0x000000F400A00000
[ 6.318688] amdgpu: hwmgr_sw_init smu backed is smu10_smu
[ 6.319547] [drm] Found VCN firmware Version ENC: 1.15 DEC: 3 VEP: 0 Revision: 0
[ 6.340548] amdgpu 0000:0d:00.0: amdgpu: reserve 0x400000 from 0xf47fc00000 for PSP TMR
[ 6.406081] amdgpu 0000:0d:00.0: amdgpu: RAS: optional ras ta ucode is not available
[ 6.412558] amdgpu 0000:0d:00.0: amdgpu: RAP: optional rap ta ucode is not available
[ 6.412561] amdgpu 0000:0d:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[ 6.413975] [drm] DM_PPLIB: values for F clock
[ 6.413980] [drm] DM_PPLIB: 400000 in kHz, 3649 in mV
[ 6.413982] [drm] DM_PPLIB: 933000 in kHz, 4074 in mV
[ 6.413985] [drm] DM_PPLIB: 1067000 in kHz, 4250 in mV
[ 6.413988] [drm] DM_PPLIB: values for DCF clock
[ 6.413990] [drm] DM_PPLIB: 300000 in kHz, 3649 in mV
[ 6.413992] [drm] DM_PPLIB: 600000 in kHz, 4074 in mV
[ 6.413994] [drm] DM_PPLIB: 626000 in kHz, 4250 in mV
[ 6.413996] [drm] DM_PPLIB: 654000 in kHz, 4399 in mV
[ 6.414321] [drm] Display Core v3.2.301 initialized on DCN 1.0
[ 6.472988] [drm] kiq ring mec 2 pipe 1 q 0
[ 6.488406] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
[ 6.488428] kfd kfd: amdgpu: Total number of KFD nodes to be created: 1
[ 6.488611] amdgpu: Virtual CRAT table created for GPU
[ 6.489099] amdgpu: Topology: Add dGPU node [0x15dd:0x1002]
[ 6.489102] kfd kfd: amdgpu: added device 1002:15dd
[ 6.489124] amdgpu 0000:0d:00.0: amdgpu: SE 1, SH per SE 1, CU per SH 11, active_cu_number 11
[ 6.489130] amdgpu 0000:0d:00.0: amdgpu: ring gfx uses VM inv eng 0 on hub 0
[ 6.489134] amdgpu 0000:0d:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[ 6.489137] amdgpu 0000:0d:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[ 6.489140] amdgpu 0000:0d:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[ 6.489142] amdgpu 0000:0d:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[ 6.489145] amdgpu 0000:0d:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[ 6.489148] amdgpu 0000:0d:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[ 6.489151] amdgpu 0000:0d:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[ 6.489154] amdgpu 0000:0d:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[ 6.489157] amdgpu 0000:0d:00.0: amdgpu: ring kiq_0.2.1.0 uses VM inv eng 11 on hub 0
[ 6.489159] amdgpu 0000:0d:00.0: amdgpu: ring sdma0 uses VM inv eng 0 on hub 8
[ 6.489162] amdgpu 0000:0d:00.0: amdgpu: ring vcn_dec uses VM inv eng 1 on hub 8
[ 6.489165] amdgpu 0000:0d:00.0: amdgpu: ring vcn_enc0 uses VM inv eng 4 on hub 8
[ 6.489168] amdgpu 0000:0d:00.0: amdgpu: ring vcn_enc1 uses VM inv eng 5 on hub 8
[ 6.489170] amdgpu 0000:0d:00.0: amdgpu: ring jpeg_dec uses VM inv eng 6 on hub 8
[ 6.493727] amdgpu: pp_dpm_get_sclk_od was not implemented.
[ 6.493730] amdgpu: pp_dpm_get_mclk_od was not implemented.
[ 6.493883] amdgpu 0000:0d:00.0: amdgpu: Runtime PM not available
[ 6.494543] [drm] Initialized amdgpu 3.59.0 for 0000:0d:00.0 on minor 2
[ 6.500136] fbcon: amdgpudrmfb (fb0) is primary device
[ 6.500141] fbcon: Deferring console take-over
[ 6.500146] amdgpu 0000:0d:00.0: [drm] fb0: amdgpudrmfb frame buffer device
`
And the rocminfo output:

`


Agent 2


Name: gfx906
Uuid: GPU-022849817348c2f7
Marketing Name: AMD Instinct MI60 / MI50
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 8192(0x2000) KB
Chip ID: 26273(0x66a1)
ASIC Revision: 1(0x1)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 1725
BDFID: 768
Internal Node ID: 1
Compute Unit: 60
SIMDs per CU: 4
Shader Engines: 4
Shader Arrs. per Eng.: 1
WatchPts on Addr. Ranges:4
Coherent Host Access: FALSE
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 64(0x40)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 40(0x28)
Max Work-item Per CU: 2560(0xa00)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 472
SDMA engine uCode:: 145
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 16760832(0xffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 16760832(0xffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Recommended Granule:0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
isa_name: amdgcn-amd-amdhsa--gfx906:sramecc+:xnack-
Name: amdgcn-amd-amdhsa--gfx906:sramecc+:xnack-
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
`

@Said-Akbar
Author

Great to hear that you made it work with your PC! Now you should be able to debug ROCM SDK related issues.

@Said-Akbar
Author

I hope you were able to install the air blower and control its fan speed as well. When MI50 overheats, the card will throttle its performance.

@Said-Akbar
Author

Hello @lamikr ,

How is your experiment going with llama.cpp and MI50?

Can you please share commands to fix vllm installation?

Thanks!

@commandline-be

I can also restart any test on the RVII MI25

@lamikr
Owner

lamikr commented Dec 14, 2024

I have now tried to debug the rocBLAS problem for a couple of days but have not yet been able to find out what is causing it. I have added debug output and can see that the .so files from rocBLAS are loaded, but I do not yet understand why it then starts complaining about a missing symbol. I have tested by building rocBLAS and Tensile for gfx906, gfx906:xnack- and gfx906:xnack+, and always get the same problem. I have also tested removing all of my rocBLAS and Tensile patches that are needed for some other GPUs, and also building the rocBLAS and Tensile versions with the rocm-6.2.4 tags.

I hope to find a solution this weekend...

@cb88

cb88 commented Dec 14, 2024

Is it the same problem they are talking about here? ROCm/rocBLAS#1455

@lamikr lamikr closed this as completed in 84faa05 Dec 17, 2024
@Said-Akbar
Author

Hello @lamikr ,

I see you closed the bug. What was the issue here?
Can you please see if you can run vllm with this fix on MI50?
Thanks!

@lamikr
Owner

lamikr commented Dec 18, 2024

Hi @Said-Akbar, it seems that this was closed automatically once I pushed one fix in. I will re-open it as there is some more work to do. The original problem with the missing symbol is now fixed, and you should be able to get the fix by running these commands:

./babs.sh -up master
./babs.sh -ca
./babs.sh -b
./babs.sh -b binfo/extra/ai_tools.blist

(babs.sh -ca may be needed, as I detected one bug in the "-up" command that I only fixed in the latest version today.)

llama.cpp is now working, at least for me, with the MI50.

./run_llama_benchmark.sh 

llama-cli -ngl 0 -m /opt/rocm_sdk_models/microsoft/Phi-3-mini-4k-instruct-q4.gguf -n 10 -f <(printf 'banana %0.s' {1..50})

llama_perf_sampler_print:    sampling time =       0.41 ms /   112 runs   (    0.00 ms per token, 272506.08 tokens per second)
llama_perf_context_print:        load time =    2974.79 ms
llama_perf_context_print: prompt eval time =    1759.07 ms /   102 tokens (   17.25 ms per token,    57.99 tokens per second)
llama_perf_context_print:        eval time =     847.57 ms /     9 runs   (   94.17 ms per token,    10.62 tokens per second)
llama_perf_context_print:       total time =    2608.11 ms /   111 tokens

@lamikr
Owner

lamikr commented Dec 18, 2024

The benchmark still has some problems with flash attention. SDPBackend.MATH works, but flash attention gives a ridiculously small time, indicating some type of error that I have not yet been able to solve.

cd benchmarks
./run_and_save_benchmarks.sh 
Timestamp for benchmark results: 20241217_163032
Saving to file: 20241217_163032_cpu_vs_gpu_simple.txt
Benchmarking CPU and GPUs
Pytorch version: 2.4.1
ROCM HIP version: 6.1.40093-78d901dd8
       Device:  AMD Ryzen 5 2400G with Radeon Vega Graphics
    'CPU time: 28.731 sec
       Device: AMD Instinct MI60 / MI50
    'GPU time: 0.648 sec
       Device: AMD Radeon Vega 11 Graphics
    'GPU time: 0.398 sec
Benchmark ready

Saving to file: 20241217_163032_pytorch_dot_products.txt
Pytorch version: 2.4.1
dot product calculation test
tensor([[[-0.4279,  0.8421, -0.0186, -1.0688,  0.3439,  0.5811,  0.6432,
          -0.0836],
         [-0.1744,  0.2640,  0.1122, -1.0217,  0.8057,  0.6140,  1.2960,
          -0.2334],
         [-0.6755,  1.1481, -0.0395, -1.0691,  0.2544,  0.5553,  0.6067,
          -0.0797]],

        [[ 1.7157,  0.4617, -0.3163,  1.5292,  1.3783,  1.3435, -0.6919,
          -0.0020],
         [-0.0854,  0.2474, -1.4838,  1.0396, -0.2207,  0.9127, -0.0911,
           0.2069],
         [ 1.5058,  0.4527, -0.4823,  1.4764,  1.2057,  1.3113, -0.6149,
           0.0199]]], device='cuda:0')

Benchmarking cuda and cpu with Default, Math, Flash Attention amd Memory pytorch backends
Device: AMD Instinct MI60 / MI50 / cuda:0
    Default benchmark:
        23.105 microseconds, 2.310515310091432e-05 sec
    SDPBackend.MATH benchmark:
        3332.839 microseconds, 0.003332838990027085 sec
    SDPBackend.FLASH_ATTENTION benchmark:
        22.992 microseconds, 2.299211609933991e-05 sec
    SDPBackend.EFFICIENT_ATTENTION benchmark:
        18.513 microseconds, 1.8513053949573077e-05 sec
Device: AMD Radeon Vega 11 Graphics / cuda:1
    Default benchmark:
Traceback (most recent call last):
  File "/opt/rocm_sdk_612_gfx906_xnack/benchmarks/../docs/examples/pytorch/flash_attention/flash_attention_dot_product_benchmark.py", line 108, in <module>
    result_arr[ii][0]=benchmark_torch_function_in_microseconds(F.scaled_dot_product_attention, query, key, value)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/rocm_sdk_612_gfx906_xnack/benchmarks/../docs/examples/pytorch/flash_attention/flash_attention_dot_product_benchmark.py", line 87, in benchmark_torch_function_in_microseconds
    return t0.blocked_autorange().mean * 1e6
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/rocm_sdk_612_gfx906_xnack/lib/python3.11/site-packages/torch/utils/benchmark/utils/timer.py", line 372, in blocked_autorange
    number = self._estimate_block_size(min_run_time)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/rocm_sdk_612_gfx906_xnack/lib/python3.11/site-packages/torch/utils/benchmark/utils/timer.py", line 319, in _estimate_block_size
    time_taken = self._timeit(number)
                 ^^^^^^^^^^^^^^^^^^^^
  File "/opt/rocm_sdk_612_gfx906_xnack/lib/python3.11/site-packages/torch/utils/benchmark/utils/timer.py", line 264, in _timeit
    return max(self._timer.timeit(number), 1e-9)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/rocm_sdk_612_gfx906_xnack/lib/python3.11/timeit.py", line 180, in timeit
    timing = self.inner(it, self.timer)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<timeit-src>", line 6, in inner
RuntimeError: FlashAttention only supports MI200/MI300X GPUs (gfx90a:sramecc+:xnack- or gfx942:sramecc+:xnack-)
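
One way to flag timings like the ones above automatically is to compare every backend against SDPBackend.MATH and report those that are implausibly faster. The following is only a minimal sketch: the function names and the 50x threshold are hypothetical choices for illustration, not part of the benchmark script, and a large ratio is a hint to investigate rather than proof of an error.

```python
import re

# Hypothetical threshold: a backend more than 50x faster than the plain math
# path on the same inputs is treated as suspicious, not as proven broken.
SUSPICIOUS_SPEEDUP = 50.0


def parse_backend_times(log_text):
    """Return {backend name: microseconds} from the benchmark output.

    Expects pairs of lines: '<name> benchmark:' followed by
    '<value> microseconds, ...'.
    """
    times = {}
    current = None
    for line in log_text.splitlines():
        line = line.strip()
        header = re.match(r"^(.+?) benchmark:$", line)
        if header:
            current = header.group(1)
            continue
        value = re.match(r"^([0-9.]+) microseconds", line)
        if value and current is not None:
            times[current] = float(value.group(1))
            current = None
    return times


def suspicious_backends(times, reference="SDPBackend.MATH",
                        limit=SUSPICIOUS_SPEEDUP):
    """Return sorted names of backends more than `limit` times faster
    than the reference backend."""
    ref = times[reference]
    return sorted(name for name, t in times.items()
                  if name != reference and ref / t > limit)


# The MI50 section of the output posted above.
MI50_LOG = """\
    Default benchmark:
        23.105 microseconds, 2.310515310091432e-05 sec
    SDPBackend.MATH benchmark:
        3332.839 microseconds, 0.003332838990027085 sec
    SDPBackend.FLASH_ATTENTION benchmark:
        22.992 microseconds, 2.299211609933991e-05 sec
    SDPBackend.EFFICIENT_ATTENTION benchmark:
        18.513 microseconds, 1.8513053949573077e-05 sec
"""

if __name__ == "__main__":
    times = parse_backend_times(MI50_LOG)
    for name in suspicious_backends(times):
        ratio = times["SDPBackend.MATH"] / times[name]
        print(f"{name}: {ratio:.0f}x faster than MATH, worth checking")
```

With the MI50 numbers above, all three non-MATH backends exceed the threshold, which matches the suspicion that those paths are not doing the full work on gfx906.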

@lamikr
Owner

lamikr commented Dec 18, 2024

Time for a little rant about this idea of optionally making these xnack and sramecc parameters part of the product name...

It just seems that this combination of "xnack-", "xnack+", "sramecc-:xnack-" and the other possible variants can very easily go south, either in the code or as build parameters. At the code level most GPUs identify themselves with just a simple name like "gfx1010", "gfx1100", etc., and then there are these GCN devices which choose to behave differently.

That would be somewhat manageable if these features were specified only as build-time parameters in a way that all projects then handled properly. In reality there is unfortunately no consistency. Some projects can be built either with the plain name or with a name that has these features as extra parameters.
Some others accept them in theory but in reality fail due to bugs. And some projects do not allow them at all and have instead decided to use plain product names to avoid the mess of handling all of these different combinations of the build target name.

And then there are projects like MIOpen which add these parameters silently at runtime (if name="gfx906", then name="gfx906:xnack-"). And while internally sramecc also seems to be part of the product name, in the style "gfx906:sramecc+:xnack-", for the sake of inconsistency sramecc does not seem to be allowed as part of the name in a build parameter.

I think the idea of embedding the sramecc/xnack parameters in the name may have sounded clever a long time ago, but in reality it seems to cause confusion and bugs.

This just makes it very hard to know whether some bug is caused simply by code that fails to compare GPU names, with or without those extra parameters included, in exactly the same way.
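
To make the comparison problem concrete, here is a minimal sketch of parsing such names into a processor plus feature flags instead of comparing raw strings. The helper names `parse_target` and `same_device` are hypothetical, and treating an omitted feature as "matches either setting" is an assumption made for this sketch, not a statement about how any ROCm project actually behaves.

```python
# Prefix that rocminfo prints in front of the ISA name.
TRIPLE_PREFIX = "amdgcn-amd-amdhsa--"


def parse_target(name):
    """Split e.g. 'gfx906:sramecc+:xnack-' into
    ('gfx906', {'sramecc': True, 'xnack': False})."""
    if name.startswith(TRIPLE_PREFIX):
        name = name[len(TRIPLE_PREFIX):]
    processor, *features = name.split(":")
    parsed = {}
    for feature in features:
        # Every feature must be a name followed by '+' or '-'.
        if len(feature) < 2 or feature[-1] not in "+-":
            raise ValueError(f"malformed feature {feature!r} in {name!r}")
        parsed[feature[:-1]] = feature.endswith("+")
    return processor, parsed


def same_device(a, b):
    """True if both names have the same processor and no feature is set
    to conflicting values. A feature given on only one side is treated
    as 'any' (assumption for this sketch)."""
    proc_a, feat_a = parse_target(a)
    proc_b, feat_b = parse_target(b)
    if proc_a != proc_b:
        return False
    return all(feat_b.get(key, value) == value
               for key, value in feat_a.items())


if __name__ == "__main__":
    runtime_name = "amdgcn-amd-amdhsa--gfx906:sramecc+:xnack-"
    for build_name in ("gfx906", "gfx906:xnack-", "gfx906:xnack+"):
        print(build_name,
              "string-equal:", build_name == runtime_name,
              "compatible:", same_device(build_name, runtime_name))
```

A plain string comparison says that none of the three build names equals the runtime name, while the parsed comparison accepts "gfx906" and "gfx906:xnack-" and rejects only "gfx906:xnack+". That difference is exactly the kind of mismatch that is hard to spot when each project compares names in its own way.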

@cb88

cb88 commented Dec 18, 2024

Why not build with xnack+? It is supported on gfx906 and can be a significant perf improvement... for things that do need it. xnack- is disabling xnack support, which is probably undesirable; if you build it with xnack+, everything should work whether it needs it or not. https://niconiconi.neocities.org/tech-notes/xnack-on-amd-gpus/

See here for the possible definitions... it's sramecc+/- and xnack+/- for gfx906... there is no sram+ option.
https://github.com/ROCm/ROCR-Runtime/blob/e0fadddb3175cb95ce9e9af2ebd2a205045e854e/src/core/runtime/isa.cpp#L258

SRAMECC is probably only desirable if you are running long-running scientific compute tasks... probably irrelevant for AI stuff. But if disabling it gets in the way, it could be left enabled; it will just cause some memory overhead, I think.

@lamikr lamikr reopened this Dec 18, 2024
@lamikr
Owner

lamikr commented Dec 18, 2024

I wonder why the rocBLAS binaries then do not enable xnack by default. In the rocBLAS CMakeLists.txt they have "gfx906:xnack-".

If I build with just "gfx906" as the target and add one of my trace patches to clr, I see this in the output:
unique isa_names.device_name: amdgcn-amd-amdhsa--gfx906:sramecc+:xnack-

cd src_projects/clr
git am ../../patches/rocm-6.1.2/debug/clr/0001-clr-debug.patch
cd ../..
./babs.sh -b binfo/core/009_04_hipcc_clr.binfo
cd benchmarks
./run_and_save_benchmarks.sh

@cb88

cb88 commented Dec 18, 2024

ROCm/ROCm#2358

@Said-Akbar
Author

Thanks for the fix!

./run_llama_benchmark.sh 

llama-cli -ngl 0 -m /opt/rocm_sdk_models/microsoft/Phi-3-mini-4k-instruct-q4.gguf -n 10 -f <(printf 'banana %0.s' {1..50})

llama_perf_sampler_print:    sampling time =       0.41 ms /   112 runs   (    0.00 ms per token, 272506.08 tokens per second)
llama_perf_context_print:        load time =    2974.79 ms
llama_perf_context_print: prompt eval time =    1759.07 ms /   102 tokens (   17.25 ms per token,    57.99 tokens per second)
llama_perf_context_print:        eval time =     847.57 ms /     9 runs   (   94.17 ms per token,    10.62 tokens per second)
llama_perf_context_print:       total time =    2608.11 ms /   111 tokens

Interesting. Phi-3-mini is a 3.8 billion parameter model. You should see at least ~60 tokens per second of generation on an MI50. I see you are using -ngl 0, which might be the reason for this slow ~10 tps response. Can you please use -ngl 99 and see if it loads into the GPU and generates text faster?
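
For comparing runs with different -ngl values, the tokens-per-second figures can be recomputed from the llama_perf_context_print lines as count / (milliseconds / 1000). The sketch below is a hypothetical helper, not part of llama.cpp, and it assumes the log format shown in the output quoted above.

```python
import re

# Matches lines such as:
#   llama_perf_context_print:        eval time =     847.57 ms /     9 runs
# The "load time" line has no "/ <count>" part and is skipped on purpose.
PERF_LINE = re.compile(
    r"llama_perf_context_print:\s+(?P<stage>[a-z ]+?) time ="
    r"\s+(?P<ms>[0-9.]+) ms /\s+(?P<count>\d+) (?:tokens|runs)"
)


def tokens_per_second(log_text):
    """Return {stage: tokens per second} for every stage with a count."""
    rates = {}
    for match in PERF_LINE.finditer(log_text):
        seconds = float(match.group("ms")) / 1000.0
        rates[match.group("stage")] = int(match.group("count")) / seconds
    return rates


# The llama_perf_context_print lines from the run quoted above.
LOG = """\
llama_perf_context_print:        load time =    2974.79 ms
llama_perf_context_print: prompt eval time =    1759.07 ms /   102 tokens (   17.25 ms per token,    57.99 tokens per second)
llama_perf_context_print:        eval time =     847.57 ms /     9 runs   (   94.17 ms per token,    10.62 tokens per second)
llama_perf_context_print:       total time =    2608.11 ms /   111 tokens
"""

if __name__ == "__main__":
    for stage, rate in tokens_per_second(LOG).items():
        print(f"{stage}: {rate:.2f} tokens per second")
```

On the quoted log this reproduces the 57.99 (prompt eval) and 10.62 (eval) tokens per second that llama-cli printed; the eval figure is the one to compare against the ~60 tps expectation.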

Thanks!

@Said-Akbar
Author

Also, as @cb88 mentioned, xnack should increase the GPU's performance, but I am not sure how you can compile the repo with xnack enabled.
I was able to enable xnack in Ubuntu as follows:

sudo nano /etc/default/grub
# edit this line by adding amdgpu arguments.
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amdgpu.noretry=0 amdgpu.xnack=1"
sudo update-grub
export HSA_XNACK=1

But this does not impact my GPU's inference speed since the code compiled for ROCm did not have xnack enabled.
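
To double-check which of these settings are actually in effect after a reboot, something like the following sketch could be used. It only reports what is present on the kernel command line and in the environment; the parameter names are copied from the commands above and are not verified against the driver here, and the helper names are hypothetical. It does not tell you whether the compiled code objects use xnack.

```python
import os


def kernel_params(cmdline):
    """Parse a kernel command line string into {name: value}.
    Bare flags such as 'quiet' map to an empty string."""
    params = {}
    for token in cmdline.split():
        name, _, value = token.partition("=")
        params[name] = value
    return params


def xnack_settings(cmdline, environ):
    """Collect the three settings mentioned above; None means 'not set'."""
    params = kernel_params(cmdline)
    return {
        "amdgpu.noretry": params.get("amdgpu.noretry"),
        "amdgpu.xnack": params.get("amdgpu.xnack"),
        "HSA_XNACK": environ.get("HSA_XNACK"),
    }


if __name__ == "__main__":
    try:
        # /proc/cmdline holds the command line the running kernel booted with.
        with open("/proc/cmdline") as handle:
            cmdline = handle.read()
    except OSError:
        cmdline = ""
    for key, value in xnack_settings(cmdline, os.environ).items():
        print(f"{key} = {value if value is not None else '(not set)'}")
```

Note that `export HSA_XNACK=1` only affects the current shell and its children, so the script has to be started from the same shell (or the variable set persistently) for it to show up.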
