
issue with RTX A6000 execution #55

Open
keithjohngates opened this issue Feb 28, 2025 · 8 comments
Labels
bug Something isn't working

Comments

@keithjohngates

keithjohngates commented Feb 28, 2025

🐛 Describe the bug

I am using:

NVIDIA RTX A6000 48GB

I followed the instructions carefully, and everything seemed to install fine.

CUDA_DEVICE_ORDER=PCI_BUS_ID python -m olmocr.pipeline ./localworkspace --pdfs /media/pop/samsung256/x64_gsqld_report_files/e408bd57-03eb-4d08-b92c-ab7bf632cfca/cr_100468_7.pdf

Any ideas? It might be something to do with sglang: it seems to be installed in the conda env, but it doesn't appear to be running properly.

Running the pipeline gives the following output:

(olmocr) pop@pop-os:~/Documents/olmocr$ CUDA_DEVICE_ORDER=PCI_BUS_ID python -m olmocr.pipeline ./localworkspace --pdfs /media/pop/samsung256/x64_gsqld_report_files/e408bd57-03eb-4d08-b92c-ab7bf632cfca/cr_100468_7.pdf
INFO:olmocr.check:pdftoppm is installed and working.
2025-02-28 13:00:42,979 - main - INFO - Got --pdfs argument, going to add to the work queue
2025-02-28 13:00:42,979 - main - INFO - Loading file at /media/pop/samsung256/x64_gsqld_report_files/e408bd57-03eb-4d08-b92c-ab7bf632cfca/cr_100468_7.pdf as PDF document
2025-02-28 13:00:42,979 - main - INFO - Found 1 total pdf paths to add
Sampling PDFs to calculate optimal length: 100%|███████████████| 1/1 [00:00<00:00, 178.34it/s]
2025-02-28 13:00:42,985 - main - INFO - Calculated items_per_group: 33 based on average pages per PDF: 15.00
INFO:olmocr.work_queue:Found 1 total paths
INFO:olmocr.work_queue:0 new paths to add to the workspace
2025-02-28 13:00:43,106 - main - INFO - Starting pipeline with PID 66979
INFO:olmocr.work_queue:Initialized local queue with 1 work items
2025-02-28 13:00:43,168 - main - WARNING - Attempt 1: All connection attempts failed
2025-02-28 13:00:44,193 - main - WARNING - Attempt 2: All connection attempts failed
2025-02-28 13:00:45,228 - main - WARNING - Attempt 3: All connection attempts failed
2025-02-28 13:00:46,273 - main - WARNING - Attempt 4: All connection attempts failed
2025-02-28 13:00:47,299 - main - WARNING - Attempt 5: All connection attempts failed
2025-02-28 13:00:48,346 - main - WARNING - Attempt 6: All connection attempts failed
2025-02-28 13:00:48,469 - main - INFO - [2025-02-28 13:00:48] server_args=ServerArgs(model_path='allenai/olmOCR-7B-0225-preview', tokenizer_path='allenai/olmOCR-7B-0225-preview', tokenizer_mode='auto', load_format='auto', trust_remote_code=False, dtype='auto', kv_cache_dtype='auto', quantization_param_path=None, quantization=None, context_length=None, device='cuda', served_model_name='allenai/olmOCR-7B-0225-preview', chat_template='qwen2-vl', is_embedding=False, revision=None, skip_tokenizer_init=False, host='127.0.0.1', port=30024, mem_fraction_static=0.8, max_running_requests=None, max_total_tokens=None, chunked_prefill_size=2048, max_prefill_tokens=16384, schedule_policy='lpm', schedule_conservativeness=1.0, cpu_offload_gb=0, prefill_only_one_req=False, tp_size=1, stream_interval=1, stream_output=False, random_seed=136363370, constrained_json_whitespace_pattern=None, watchdog_timeout=300, download_dir=None, base_gpu_id=0, log_level='info', log_level_http='warning', log_requests=False, show_time_cost=False, enable_metrics=False, decode_log_interval=40, api_key=None, file_storage_pth='sglang_storage', enable_cache_report=False, dp_size=1, load_balance_method='round_robin', ep_size=1, dist_init_addr=None, nnodes=1, node_rank=0, json_model_override_args='{}', lora_paths=None, max_loras_per_batch=8, attention_backend='flashinfer', sampling_backend='flashinfer', grammar_backend='outlines', speculative_draft_model_path=None, speculative_algorithm=None, speculative_num_steps=5, speculative_num_draft_tokens=64, speculative_eagle_topk=8, enable_double_sparsity=False, ds_channel_config_path=None, ds_heavy_channel_num=32, ds_heavy_token_num=256, ds_heavy_channel_type='qk', ds_sparse_decode_threshold=4096, disable_radix_cache=False, disable_jump_forward=False, disable_cuda_graph=False, disable_cuda_graph_padding=False, disable_outlines_disk_cache=False, disable_custom_all_reduce=False, disable_mla=False, disable_overlap_schedule=False, enable_mixed_chunk=False, enable_dp_attention=False, enable_ep_moe=False, enable_torch_compile=False, torch_compile_max_bs=32, cuda_graph_max_bs=8, cuda_graph_bs=None, torchao_config='', enable_nan_detection=False, enable_p2p_check=False, triton_attention_reduce_in_fp32=False, triton_attention_num_kv_splits=8, num_continuous_decode_steps=1, delete_ckpt_after_loading=False, enable_memory_saver=False, allow_auto_truncate=False, enable_custom_logit_processor=False, tool_call_parser=None)
2025-02-28 13:00:49,387 - main - WARNING - Attempt 7: All connection attempts failed
2025-02-28 13:00:50,094 - main - INFO - Using a slow image processor as use_fast is unset and a slow processor was saved with this model. use_fast=True will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with use_fast=False.
2025-02-28 13:00:50,544 - main - WARNING - Attempt 8: All connection attempts failed
2025-02-28 13:00:51,567 - main - WARNING - Attempt 9: All connection attempts failed
2025-02-28 13:00:51,900 - main - INFO - [2025-02-28 13:00:51] Use chat template for the OpenAI-compatible API server: qwen2-vl
2025-02-28 13:00:52,614 - main - WARNING - Attempt 10: All connection attempts failed
2025-02-28 13:00:53,637 - main - WARNING - Attempt 11: All connection attempts failed
2025-02-28 13:00:54,660 - main - WARNING - Attempt 12: All connection attempts failed
2025-02-28 13:00:55,683 - main - WARNING - Attempt 13: All connection attempts failed

Versions

Hope this is something obvious!

@keithjohngates keithjohngates added the bug Something isn't working label Feb 28, 2025
@erichan1986

Same issue here:

2025-02-28 14:11:17,944 - main - WARNING - Attempt 66: All connection attempts failed
2025-02-28 14:11:18,978 - main - WARNING - Attempt 67: All connection attempts failed
2025-02-28 14:11:20,038 - main - WARNING - Attempt 68: All connection attempts failed
2025-02-28 14:11:21,094 - main - WARNING - Attempt 69: All connection attempts failed
2025-02-28 14:11:22,155 - main - WARNING - Attempt 70: All connection attempts failed
2025-02-28 14:11:23,190 - main - WARNING - Attempt 71: All connection attempts failed
2025-02-28 14:11:24,224 - main - WARNING - Attempt 72: All connection attempts failed
2025-02-28 14:11:25,259 - main - WARNING - Attempt 73: All connection attempts failed
2025-02-28 14:11:25,980 - main - INFO - Using a slow image processor as use_fast is unset and a slow processor was saved with this model. use_fast=True will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with use_fast=False.
2025-02-28 14:11:26,293 - main - WARNING - Attempt 74: All connection attempts failed
2025-02-28 14:11:27,327 - main - WARNING - Attempt 75: All connection attempts failed
2025-02-28 14:11:28,310 - main - INFO - [2025-02-28 14:11:28 TP0] Overlap scheduler is disabled for multimodal models.

@Akihirudotcom

I have the same issue. Any solution?

@husaynirfan1

Same issue here. Tested with an RTX 3090 24GB.

@haydn-jones

SGLang needs to download the model weights and set them up; that's what's happening in the background. The warnings are from olmocr waiting for that setup to finish. Presumably, if you wait long enough, it will connect.
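
If you want to verify that by hand, you can poll the server directly. A rough sketch below, assuming the default host/port shown in the log above (127.0.0.1:30024) and sglang's /health endpoint; this is just an illustration, not olmocr's internal readiness check:

import time

import requests


def wait_for_sglang(url="http://127.0.0.1:30024/health", timeout_s=600):
    """Poll the local sglang server until it answers 200 or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            if requests.get(url, timeout=2).status_code == 200:
                return True  # server is up and accepting requests
        except requests.RequestException:
            pass  # weights still downloading/loading; keep polling
        time.sleep(5)
    return False


if __name__ == "__main__":
    print("ready" if wait_for_sglang() else "timed out waiting for sglang")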

@jakep-allenai
Collaborator

Yeah, please wait longer for the weights to download and initialize from Hugging Face and for sglang to start up. It can take 2-3 minutes on a cold start.

@husaynirfan1

After waiting, I got this error.

Hardware: RTX 4090 24GB

I just followed the installation setup and ran:

python -m olmocr.pipeline ./localworkspace --pdfs pdff/single.pdf

Error stack:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/workspace/olmocr/olmocr/pipeline.py", line 1064, in <module>
    asyncio.run(main())
  File "/root/anaconda3/envs/olmocr/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/olmocr/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/olmocr/lib/python3.11/asyncio/base_events.py", line 654, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/workspace/olmocr/olmocr/pipeline.py", line 1042, in main
    await sglang_server_ready()
  File "/workspace/olmocr/olmocr/pipeline.py", line 649, in sglang_server_ready
    raise Exception("sglang server did not become ready after waiting.")
Exception: sglang server did not become ready after waiting.

@jakep-allenai
Collaborator

Random question: can you run this Python code to "predownload" the model into your Hugging Face cache, and then restart?

import torch
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration


# Initialize the model; from_pretrained downloads the weights into the
# Hugging Face cache on first use
model = Qwen2VLForConditionalGeneration.from_pretrained("allenai/olmOCR-7B-0225-preview", torch_dtype=torch.bfloat16).eval()
# The processor (tokenizer + image preprocessing) comes from the base Qwen2-VL repo
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")

# Move the model to the GPU if one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
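
If you only want to warm the Hugging Face cache without loading the weights onto the GPU, a lighter-weight sketch using huggingface_hub (already a dependency of transformers) should also work:

# Download the model files into the local Hugging Face cache without
# instantiating the model in memory
from huggingface_hub import snapshot_download

snapshot_download("allenai/olmOCR-7B-0225-preview")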

@keithjohngates
Copy link
Author

Waiting just caused a timeout, so I pre-loaded the model as suggested and then re-ran the pipeline. It worked a charm.

Thanks everyone for the help, especially @jakep-allenai.
