
issue with RTX A6000 execution #55

Open
keithjohngates opened this issue Feb 28, 2025 · 8 comments
Labels
bug Something isn't working

Comments

@keithjohngates

keithjohngates commented Feb 28, 2025

🐛 Describe the bug

I am using:

NVIDIA RTX A6000 48GB

I followed the instructions carefully, and everything seemed to install fine.

CUDA_DEVICE_ORDER=PCI_BUS_ID python -m olmocr.pipeline ./localworkspace --pdfs /media/pop/samsung256/x64_gsqld_report_files/e408bd57-03eb-4d08-b92c-ab7bf632cfca/cr_100468_7.pdf

Any ideas? It might be something to do with sglang: it seems to be installed in the conda env, but it doesn't appear to be running properly.

Running the pipeline gives the following output:

(olmocr) pop@pop-os:~/Documents/olmocr$ CUDA_DEVICE_ORDER=PCI_BUS_ID python -m olmocr.pipeline ./localworkspace --pdfs /media/pop/samsung256/x64_gsqld_report_files/e408bd57-03eb-4d08-b92c-ab7bf632cfca/cr_100468_7.pdf
INFO:olmocr.check:pdftoppm is installed and working.
2025-02-28 13:00:42,979 - main - INFO - Got --pdfs argument, going to add to the work queue
2025-02-28 13:00:42,979 - main - INFO - Loading file at /media/pop/samsung256/x64_gsqld_report_files/e408bd57-03eb-4d08-b92c-ab7bf632cfca/cr_100468_7.pdf as PDF document
2025-02-28 13:00:42,979 - main - INFO - Found 1 total pdf paths to add
Sampling PDFs to calculate optimal length: 100%|███████████████| 1/1 [00:00<00:00, 178.34it/s]
2025-02-28 13:00:42,985 - main - INFO - Calculated items_per_group: 33 based on average pages per PDF: 15.00
INFO:olmocr.work_queue:Found 1 total paths
INFO:olmocr.work_queue:0 new paths to add to the workspace
2025-02-28 13:00:43,106 - main - INFO - Starting pipeline with PID 66979
INFO:olmocr.work_queue:Initialized local queue with 1 work items
2025-02-28 13:00:43,168 - main - WARNING - Attempt 1: All connection attempts failed
2025-02-28 13:00:44,193 - main - WARNING - Attempt 2: All connection attempts failed
2025-02-28 13:00:45,228 - main - WARNING - Attempt 3: All connection attempts failed
2025-02-28 13:00:46,273 - main - WARNING - Attempt 4: All connection attempts failed
2025-02-28 13:00:47,299 - main - WARNING - Attempt 5: All connection attempts failed
2025-02-28 13:00:48,346 - main - WARNING - Attempt 6: All connection attempts failed
2025-02-28 13:00:48,469 - main - INFO - [2025-02-28 13:00:48] server_args=ServerArgs(model_path='allenai/olmOCR-7B-0225-preview', tokenizer_path='allenai/olmOCR-7B-0225-preview', tokenizer_mode='auto', load_format='auto', trust_remote_code=False, dtype='auto', kv_cache_dtype='auto', quantization_param_path=None, quantization=None, context_length=None, device='cuda', served_model_name='allenai/olmOCR-7B-0225-preview', chat_template='qwen2-vl', is_embedding=False, revision=None, skip_tokenizer_init=False, host='127.0.0.1', port=30024, mem_fraction_static=0.8, max_running_requests=None, max_total_tokens=None, chunked_prefill_size=2048, max_prefill_tokens=16384, schedule_policy='lpm', schedule_conservativeness=1.0, cpu_offload_gb=0, prefill_only_one_req=False, tp_size=1, stream_interval=1, stream_output=False, random_seed=136363370, constrained_json_whitespace_pattern=None, watchdog_timeout=300, download_dir=None, base_gpu_id=0, log_level='info', log_level_http='warning', log_requests=False, show_time_cost=False, enable_metrics=False, decode_log_interval=40, api_key=None, file_storage_pth='sglang_storage', enable_cache_report=False, dp_size=1, load_balance_method='round_robin', ep_size=1, dist_init_addr=None, nnodes=1, node_rank=0, json_model_override_args='{}', lora_paths=None, max_loras_per_batch=8, attention_backend='flashinfer', sampling_backend='flashinfer', grammar_backend='outlines', speculative_draft_model_path=None, speculative_algorithm=None, speculative_num_steps=5, speculative_num_draft_tokens=64, speculative_eagle_topk=8, enable_double_sparsity=False, ds_channel_config_path=None, ds_heavy_channel_num=32, ds_heavy_token_num=256, ds_heavy_channel_type='qk', ds_sparse_decode_threshold=4096, disable_radix_cache=False, disable_jump_forward=False, disable_cuda_graph=False, disable_cuda_graph_padding=False, disable_outlines_disk_cache=False, disable_custom_all_reduce=False, disable_mla=False, disable_overlap_schedule=False, enable_mixed_chunk=False, enable_dp_attention=False, enable_ep_moe=False, enable_torch_compile=False, torch_compile_max_bs=32, cuda_graph_max_bs=8, cuda_graph_bs=None, torchao_config='', enable_nan_detection=False, enable_p2p_check=False, triton_attention_reduce_in_fp32=False, triton_attention_num_kv_splits=8, num_continuous_decode_steps=1, delete_ckpt_after_loading=False, enable_memory_saver=False, allow_auto_truncate=False, enable_custom_logit_processor=False, tool_call_parser=None)
2025-02-28 13:00:49,387 - main - WARNING - Attempt 7: All connection attempts failed
2025-02-28 13:00:50,094 - main - INFO - Using a slow image processor as use_fast is unset and a slow processor was saved with this model. use_fast=True will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with use_fast=False.
2025-02-28 13:00:50,544 - main - WARNING - Attempt 8: All connection attempts failed
2025-02-28 13:00:51,567 - main - WARNING - Attempt 9: All connection attempts failed
2025-02-28 13:00:51,900 - main - INFO - [2025-02-28 13:00:51] Use chat template for the OpenAI-compatible API server: qwen2-vl
2025-02-28 13:00:52,614 - main - WARNING - Attempt 10: All connection attempts failed
2025-02-28 13:00:53,637 - main - WARNING - Attempt 11: All connection attempts failed
2025-02-28 13:00:54,660 - main - WARNING - Attempt 12: All connection attempts failed
2025-02-28 13:00:55,683 - main - WARNING - Attempt 13: All connection attempts failed

Versions

Hope this is something obvious!

@keithjohngates keithjohngates added the bug Something isn't working label Feb 28, 2025
@erichan1986

Same issue here:

2025-02-28 14:11:17,944 - main - WARNING - Attempt 66: All connection attempts failed
2025-02-28 14:11:18,978 - main - WARNING - Attempt 67: All connection attempts failed
2025-02-28 14:11:20,038 - main - WARNING - Attempt 68: All connection attempts failed
2025-02-28 14:11:21,094 - main - WARNING - Attempt 69: All connection attempts failed
2025-02-28 14:11:22,155 - main - WARNING - Attempt 70: All connection attempts failed
2025-02-28 14:11:23,190 - main - WARNING - Attempt 71: All connection attempts failed
2025-02-28 14:11:24,224 - main - WARNING - Attempt 72: All connection attempts failed
2025-02-28 14:11:25,259 - main - WARNING - Attempt 73: All connection attempts failed
2025-02-28 14:11:25,980 - main - INFO - Using a slow image processor as use_fast is unset and a slow processor was saved with this model. use_fast=True will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with use_fast=False.
2025-02-28 14:11:26,293 - main - WARNING - Attempt 74: All connection attempts failed
2025-02-28 14:11:27,327 - main - WARNING - Attempt 75: All connection attempts failed
2025-02-28 14:11:28,310 - main - INFO - [2025-02-28 14:11:28 TP0] Overlap scheduler is disabled for multimodal models.

@Akihirudotcom

I have the same issue. Any solution?

@husaynirfan1

Same issue here. Tested with an RTX 3090 24GB.

@haydn-jones

SGLang needs to download the model weights and set them up; that's what's happening in the background. The warnings are from olmocr waiting for that setup to finish. Presumably, if you wait long enough, it will connect.
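
If you want to verify that by hand, you can poll the server directly. A rough sketch below, assuming the default host/port shown in the log above (127.0.0.1:30024) and sglang's /health endpoint; this is just an illustration, not olmocr's internal readiness check:

import time

import requests


def wait_for_sglang(url="http://127.0.0.1:30024/health", timeout_s=600):
    """Poll the local sglang server until it answers 200 or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            if requests.get(url, timeout=2).status_code == 200:
                return True  # server is up and accepting requests
        except requests.RequestException:
            pass  # weights still downloading/loading; keep polling
        time.sleep(5)
    return False


if __name__ == "__main__":
    print("ready" if wait_for_sglang() else "timed out waiting for sglang")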

@jakep-allenai
Collaborator

Yeah, please wait longer for the weights to download and initialize from Hugging Face and for sglang to start up. It can take 2-3 minutes on a cold start.

@husaynirfan1

After waiting, I got this error.

Hardware: RTX 4090 24GB

I just followed the installation setup and ran:

python -m olmocr.pipeline ./localworkspace --pdfs pdff/single.pdf

Error stack:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/workspace/olmocr/olmocr/pipeline.py", line 1064, in <module>
    asyncio.run(main())
  File "/root/anaconda3/envs/olmocr/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/olmocr/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/olmocr/lib/python3.11/asyncio/base_events.py", line 654, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/workspace/olmocr/olmocr/pipeline.py", line 1042, in main
    await sglang_server_ready()
  File "/workspace/olmocr/olmocr/pipeline.py", line 649, in sglang_server_ready
    raise Exception("sglang server did not become ready after waiting.")
Exception: sglang server did not become ready after waiting.

@jakep-allenai
Collaborator

Random question: can you run this Python code to "predownload" the model into your Hugging Face cache, and then restart?

import torch
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration


# Initialize the model; from_pretrained downloads the weights into the
# Hugging Face cache on first use
model = Qwen2VLForConditionalGeneration.from_pretrained("allenai/olmOCR-7B-0225-preview", torch_dtype=torch.bfloat16).eval()
# The processor (tokenizer + image preprocessing) comes from the base Qwen2-VL repo
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")

# Move the model to the GPU if one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
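
If you only want to warm the Hugging Face cache without loading the weights onto the GPU, a lighter-weight sketch using huggingface_hub (already a dependency of transformers) should also work:

# Download the model files into the local Hugging Face cache without
# instantiating the model in memory
from huggingface_hub import snapshot_download

snapshot_download("allenai/olmOCR-7B-0225-preview")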

@keithjohngates
Copy link
Author

Waiting just caused a timeout, so I pre-loaded the model as suggested and then re-ran the pipeline. It worked a charm.

Thanks everyone for the help, especially @jakep-allenai.
