Extraction error #1029
Comments
Does the command-line feature work normally?
It works normally, with a few warnings.

  File "C:\Users\yangt\.conda\envs\MinerU\lib\threading.py", line 973, in _bootstrap
  File "C:\Users\yangt\.conda\envs\MinerU\lib\threading.py", line 1016, in _bootstrap_inner
  File "C:\Users\yangt\.conda\envs\MinerU\lib\threading.py", line 953, in run
  File "C:\Users\yangt\.conda\envs\MinerU\lib\socketserver.py", line 683, in process_request_thread
  File "C:\Users\yangt\.conda\envs\MinerU\lib\socketserver.py", line 360, in finish_request
  File "C:\Users\yangt\.conda\envs\MinerU\lib\socketserver.py", line 747, in __init__
  File "C:\Users\yangt\.conda\envs\MinerU\lib\site-packages\werkzeug\serving.py", line 398, in handle
  File "C:\Users\yangt\.conda\envs\MinerU\lib\http\server.py", line 433, in handle
  File "C:\Users\yangt\.conda\envs\MinerU\lib\http\server.py", line 421, in handle_one_request
  File "C:\Users\yangt\.conda\envs\MinerU\lib\site-packages\werkzeug\serving.py", line 370, in run_wsgi
  File "C:\Users\yangt\.conda\envs\MinerU\lib\site-packages\werkzeug\serving.py", line 331, in execute
  File "C:\Users\yangt\.conda\envs\MinerU\lib\site-packages\flask\app.py", line 1536, in __call__
  File "C:\Users\yangt\.conda\envs\MinerU\lib\site-packages\flask\app.py", line 1511, in wsgi_app
  File "C:\Users\yangt\.conda\envs\MinerU\lib\site-packages\flask\app.py", line 902, in dispatch_request
  File "C:\Users\yangt\.conda\envs\MinerU\lib\site-packages\flask_restful\__init__.py", line 489, in wrapper
  File "C:\Users\yangt\.conda\envs\MinerU\lib\site-packages\flask\views.py", line 110, in view
  File "C:\Users\yangt\.conda\envs\MinerU\lib\site-packages\flask_restful\__init__.py", line 604, in dispatch_request
  File "D:\Yangtze\GitHub\MinerU\projects\web_demo\web_demo\api\analysis\analysis_view.py", line 38, in get
AttributeError: 'NoneType' object has no attribute 'status'

After that I had no way to delete this PDF, and it has stayed like this ever since.
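The `AttributeError` above means that `get` in `analysis_view.py` read `.status` from an object that was `None`, for example a database lookup that found no matching record. The view's source is not shown in this issue, so the sketch below is hypothetical: `AnalysisTask`, `find_task` and `get_task_status` are illustrative names, not the web demo's API, and Flask is left out so the example runs on its own. It shows the usual guard of checking the lookup result before dereferencing it and returning a 404-style `(body, status)` pair, the shape a Flask-RESTful resource method may return, instead of crashing.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class AnalysisTask:
    """Hypothetical stand-in for the web demo's task record."""
    task_id: int
    status: int


def find_task(store: Dict[int, AnalysisTask], task_id: int) -> Optional[AnalysisTask]:
    # Like an ORM ".first()" query, this returns None when nothing matches.
    return store.get(task_id)


def get_task_status(store: Dict[int, AnalysisTask], task_id: int) -> Tuple[dict, int]:
    task = find_task(store, task_id)
    if task is None:
        # Guard: without this check, task.status raises
        # AttributeError: 'NoneType' object has no attribute 'status'
        return {"msg": f"task {task_id} not found"}, 404
    return {"status": task.status}, 200


store = {1: AnalysisTask(task_id=1, status=2)}
print(get_task_status(store, 1))   # ({'status': 2}, 200)
print(get_task_status(store, 99))  # ({'msg': 'task 99 not found'}, 404)
```

Returning an explicit "not found" response keeps a missing record from turning into a 500 error on every later request for it.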
When I ran the tests, the second one showed a warning; I don't know whether it matters.
sys.platform win32
PyTorch built with:
[11/20 23:25:52 detectron2]: Command line arguments: {'config_file': 'C:\Users\yangt\.conda\envs\MinerU\lib\site-packages\magic_pdf\resources\model_config\layoutlmv3\layoutlmv3_base_inference.yaml', 'resume': False, 'eval_only': False, 'num_gpus': 1, 'num_machines': 1, 'machine_rank': 0, 'dist_url': 'tcp://127.0.0.1:57823', 'opts': ['MODEL.WEIGHTS', 'C:\Users\yangt\.cache\modelscope\hub\opendatalab\PDF-Extract-Kit-1___0/models\Layout/LayoutLMv3/model_final.pth']}
[11/20 23:25:53 d2.checkpoint.detection_checkpoint]: [DetectionCheckpointer] Loading from C:\Users\yangt\.cache\modelscope\hub\opendatalab\PDF-Extract-Kit-1___0/models\Layout/LayoutLMv3/model_final.pth ...
Description of the bug
After completing the local deployment, extracting a document fails with the following error:
2024-11-19 19:01:46.298 | INFO | magic_pdf.pdf_parse_union_core_v2:pdf_parse_union:647 - page_id: 0, last_page_cost_time: 0.0
2024-11-19 19:01:48.628 | INFO | magic_pdf.pdf_parse_union_core_v2:pdf_parse_union:647 - page_id: 1, last_page_cost_time: 2.33
2024-11-19 19:01:48.748 | ERROR | api.analysis.pdf_ext:analysis_pdf:50 - Traceback (most recent call last):
  File "D:\Yangtze\GitHub\MinerU\projects\web_demo\web_demo\api\analysis\pdf_ext.py", line 42, in analysis_pdf
    pipe.pipe_parse()
  File "C:\Users\yangt\.conda\envs\MinerU\lib\site-packages\magic_pdf\pipe\UNIPipe.py", line 50, in pipe_parse
    self.pdf_mid_data = parse_ocr_pdf(self.pdf_bytes, self.model_list, self.image_writer,
  File "C:\Users\yangt\.conda\envs\MinerU\lib\site-packages\magic_pdf\user_api.py", line 59, in parse_ocr_pdf
    pdf_info_dict = parse_pdf_by_ocr(
  File "C:\Users\yangt\.conda\envs\MinerU\lib\site-packages\magic_pdf\pdf_parse_by_ocr.py", line 14, in parse_pdf_by_ocr
    return pdf_parse_union(dataset,
  File "C:\Users\yangt\.conda\envs\MinerU\lib\site-packages\magic_pdf\pdf_parse_union_core_v2.py", line 654, in pdf_parse_union
    page_info = parse_page_core(
  File "C:\Users\yangt\.conda\envs\MinerU\lib\site-packages\magic_pdf\pdf_parse_union_core_v2.py", line 541, in parse_page_core
    spans = ocr_cut_image_and_table(
  File "C:\Users\yangt\.conda\envs\MinerU\lib\site-packages\magic_pdf\pre_proc\cut_image.py", line 17, in ocr_cut_image_and_table
    span['image_path'] = cut_image(span['bbox'], page_id, page, return_path=return_path('images'),
  File "C:\Users\yangt\.conda\envs\MinerU\lib\site-packages\magic_pdf\libs\pdf_image_tools.py", line 31, in cut_image
    imageWriter.write(byte_data, img_hash256_path, AbsReaderWriter.MODE_BIN)
  File "C:\Users\yangt\.conda\envs\MinerU\lib\site-packages\magic_pdf\rw\DiskReaderWriter.py", line 41, in write
    with open(abspath, "wb") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'D:\Yangtze\GitHub\MinerU\projects\web_demo\web_demo\static/analysis_pdf/3b547652aeedd05cbcb1249efe2ebcb3405844486675188b4c1ad17f9517536d1732014031_Multipath_chirp_signal_detection_based_on_biorthogonal_fourier_transform/images\43feeaa3f1f8c0a362ccbff693581ce943d1e106eb396cdcea4eeabef0e37f71.jpg'
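`open(abspath, "wb")` raises `FileNotFoundError` (`Errno 2`) when the target's parent directory does not exist. On Windows it can also fail this way when the path is longer than the classic `MAX_PATH` limit of 260 characters (including the terminating null) and long-path support is not enabled; the image path in this traceback, which embeds a long hash prefix and the full document name, is well over that limit. Which of the two applies here is an assumption, not something the log confirms. The sketch below is a hypothetical pre-write check (`exceeds_windows_max_path` and `write_bytes_checked` are illustrative names, not magic_pdf functions) that creates missing parent directories and reports an over-long path explicitly instead of surfacing a bare `Errno 2`.

```python
import os
import sys

# Classic Win32 MAX_PATH: 260 characters including the terminating null,
# so at most 259 usable characters unless long-path support is enabled.
WINDOWS_MAX_PATH = 260


def exceeds_windows_max_path(path: str) -> bool:
    """Return True if `path` would not fit in a classic Win32 MAX_PATH buffer."""
    return len(path) >= WINDOWS_MAX_PATH


def write_bytes_checked(path: str, data: bytes) -> None:
    """Hypothetical helper: write `data` to `path` with clearer failure modes."""
    abspath = os.path.abspath(path)
    if sys.platform == "win32" and exceeds_windows_max_path(abspath):
        raise OSError(
            f"path is {len(abspath)} characters, over the Windows MAX_PATH limit: {abspath}"
        )
    # Create missing parent directories, e.g. a per-PDF "images" folder.
    os.makedirs(os.path.dirname(abspath), exist_ok=True)
    with open(abspath, "wb") as f:
        f.write(data)
```

If the length check is the one that fires, shortening the output directory or file name is the direct fix; alternatively, on Windows 10 version 1607 and later, Python 3.6+ can use longer paths once the `LongPathsEnabled` registry setting is turned on.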
2024-11-19 19:01:48.751 | ERROR | api.analysis.pdf_ext:analysis_pdf_task:134 - Traceback (most recent call last):
  File "D:\Yangtze\GitHub\MinerU\projects\web_demo\web_demo\api\analysis\pdf_ext.py", line 94, in analysis_pdf_task
    md_content, bbox_info = analysis_pdf(image_url_prefix, image_dir, pdf_bytes, is_ocr)
TypeError: cannot unpack non-iterable NoneType object
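The `TypeError` follows from the first error: judging by the log, `analysis_pdf` caught the `FileNotFoundError`, logged it (`pdf_ext.py` line 50), and then returned without a value, so the caller at line 94 tried to unpack `None` into `md_content, bbox_info`. The minimal sketch below reproduces that pattern with hypothetical functions (`parse_swallowing`, `parse_reraising` and `run_task` are illustrative, not the web demo's code) and contrasts it with re-raising, which keeps the original exception visible to the caller.

```python
import logging
import traceback

logger = logging.getLogger("pdf_ext_sketch")


def parse_swallowing(pdf_bytes: bytes):
    """Hypothetical: logs the failure, then falls off the end (returns None)."""
    try:
        raise FileNotFoundError("simulated image write failure")
    except Exception:
        logger.error(traceback.format_exc())
    # No return statement here, so the function implicitly returns None.


def parse_reraising(pdf_bytes: bytes):
    """Hypothetical: logs the failure, then re-raises the original exception."""
    try:
        raise FileNotFoundError("simulated image write failure")
    except Exception:
        logger.error(traceback.format_exc())
        raise


def run_task(parse) -> str:
    try:
        md_content, bbox_info = parse(b"%PDF-")
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return "ok"


print(run_task(parse_swallowing))  # TypeError: cannot unpack non-iterable NoneType object
print(run_task(parse_reraising))   # FileNotFoundError: simulated image write failure
```

With re-raising, the task-level handler sees the real cause (the failed image write) rather than a secondary unpacking error.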
2024-11-19 19:01:48.779 | INFO | api.analysis.pdf_ext:analysis_pdf_task:167 - all task finished!
Exception in thread Thread-10 (analysis_pdf_task):
Traceback (most recent call last):
  File "D:\Yangtze\GitHub\MinerU\projects\web_demo\web_demo\api\analysis\pdf_ext.py", line 94, in analysis_pdf_task
    md_content, bbox_info = analysis_pdf(image_url_prefix, image_dir, pdf_bytes, is_ocr)
TypeError: cannot unpack non-iterable NoneType object

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\yangt\.conda\envs\MinerU\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\Users\yangt\.conda\envs\MinerU\lib\threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "D:\Yangtze\GitHub\MinerU\projects\web_demo\web_demo\api\analysis\pdf_ext.py", line 144, in analysis_pdf_task
    raise ApiException(code=500, msg="PDF parsing failed", msgZH="pdf解析失败")
common.error_types.ApiException: 500 Internal Server Error: PDF parsing failed
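This last traceback is printed by the worker thread itself: `analysis_pdf_task` runs in a background `threading.Thread` (`Thread-10`), and an exception raised in a thread's target is not propagated to the code that started or joined the thread; Python hands it to `threading.excepthook`, which prints the `Exception in thread ...` report. The `ApiException(code=500, ...)` is therefore likely never returned to any HTTP client. How the web demo records task failures is not shown here; the sketch below is one common pattern, assumed rather than taken from the project, for capturing a worker's result or exception so the caller can store a failure status (`ResultThread` is an illustrative name).

```python
import threading
from typing import Any, Callable, Optional


class ResultThread(threading.Thread):
    """Runs `fn(*args)` and keeps its return value or exception for the caller."""

    def __init__(self, fn: Callable[..., Any], *args: Any) -> None:
        super().__init__(daemon=True)
        self._fn = fn
        self._fn_args = args
        self.result: Any = None
        self.error: Optional[Exception] = None

    def run(self) -> None:
        try:
            self.result = self._fn(*self._fn_args)
        except Exception as exc:
            # Stored on the thread object instead of escaping to threading.excepthook.
            self.error = exc


def failing_task() -> None:
    raise RuntimeError("PDF parsing failed")


worker = ResultThread(failing_task)
worker.start()
worker.join()
if worker.error is not None:
    # A real service would mark the task as failed here, e.g. in its database.
    print(f"task failed: {worker.error}")  # task failed: PDF parsing failed
```

Recording the failure somewhere the API can read it lets later requests report "failed" instead of finding an incomplete record.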
How to reproduce the bug
Local environment info:
sys.platform win32
Python 3.10.15 | packaged by Anaconda, Inc. | (main, Oct 3 2024, 07:22:19) [MSC v.1929 64 bit (AMD64)]
numpy 1.26.3
detectron2 0.6 @C:\Users\yangt\.conda\envs\MinerU\lib\site-packages\detectron2
Compiler MSVC 194033811
CUDA compiler not available
DETECTRON2_ENV_MODULE
PyTorch 2.3.1+cu118 @C:\Users\yangt\.conda\envs\MinerU\lib\site-packages\torch
PyTorch debug build False
torch._C._GLIBCXX_USE_CXX11_ABI False
GPU available Yes
GPU 0 NVIDIA GeForce RTX 4060 Laptop GPU (arch=8.9)
Driver version 551.76
CUDA_HOME C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3
Pillow 10.2.0
torchvision 0.18.1+cu118 @C:\Users\yangt\.conda\envs\MinerU\lib\site-packages\torchvision
torchvision arch flags C:\Users\yangt\.conda\envs\MinerU\lib\site-packages\torchvision\_C.pyd; cannot find cuobjdump
fvcore 0.1.5.post20221221
iopath 0.1.9
cv2 4.6.0
PyTorch built with:
[11/19 19:00:52 detectron2]: Command line arguments: {'config_file': 'C:\Users\yangt\.conda\envs\MinerU\lib\site-packages\magic_pdf\resources\model_config\layoutlmv3\layoutlmv3_base_inference.yaml', 'resume': False, 'eval_only': False, 'num_gpus': 1, 'num_machines': 1, 'machine_rank': 0, 'dist_url': 'tcp://127.0.0.1:57823', 'opts': ['MODEL.WEIGHTS', 'C:\Users\yangt\.cache\modelscope\hub\opendatalab\PDF-Extract-Kit-1___0/models\Layout/LayoutLMv3/model_final.pth']}
[11/19 19:00:52 detectron2]: Contents of args.config_file=C:\Users\yangt\.conda\envs\MinerU\lib\site-packages\magic_pdf\resources\model_config\layoutlmv3\layoutlmv3_base_inference.yaml:
AUG:
  DETR: true
CACHE_DIR: ~/cache/huggingface
CUDNN_BENCHMARK: false
DATALOADER:
  ASPECT_RATIO_GROUPING: true
  FILTER_EMPTY_ANNOTATIONS: false
  NUM_WORKERS: 4
  REPEAT_THRESHOLD: 0.0
  SAMPLER_TRAIN: TrainingSampler
DATASETS:
  PRECOMPUTED_PROPOSAL_TOPK_TEST: 1000
  PRECOMPUTED_PROPOSAL_TOPK_TRAIN: 2000
  PROPOSAL_FILES_TEST: []
  PROPOSAL_FILES_TRAIN: []
  TEST:
  TRAIN:
GLOBAL:
  HACK: 1.0
ICDAR_DATA_DIR_TEST: ''
ICDAR_DATA_DIR_TRAIN: ''
INPUT:
  CROP:
    ENABLED: true
    SIZE:
    TYPE: absolute_range
  FORMAT: RGB
  MASK_FORMAT: polygon
  MAX_SIZE_TEST: 1333
  MAX_SIZE_TRAIN: 1333
  MIN_SIZE_TEST: 800
  MIN_SIZE_TRAIN:
  MIN_SIZE_TRAIN_SAMPLING: choice
  RANDOM_FLIP: horizontal
MODEL:
  ANCHOR_GENERATOR:
    ANGLES:
    ASPECT_RATIOS:
    NAME: DefaultAnchorGenerator
    OFFSET: 0.0
    SIZES:
  BACKBONE:
    FREEZE_AT: 2
    NAME: build_vit_fpn_backbone
  CONFIG_PATH: ''
  DEVICE: cuda
  FPN:
    FUSE_TYPE: sum
    IN_FEATURES:
    NORM: ''
    OUT_CHANNELS: 256
  IMAGE_ONLY: true
  KEYPOINT_ON: false
  LOAD_PROPOSALS: false
  MASK_ON: true
  META_ARCHITECTURE: VLGeneralizedRCNN
  PANOPTIC_FPN:
    COMBINE:
      ENABLED: true
      INSTANCES_CONFIDENCE_THRESH: 0.5
      OVERLAP_THRESH: 0.5
      STUFF_AREA_LIMIT: 4096
    INSTANCE_LOSS_WEIGHT: 1.0
  PIXEL_MEAN:
  PIXEL_STD:
  PROPOSAL_GENERATOR:
    MIN_SIZE: 0
    NAME: RPN
  RESNETS:
    DEFORM_MODULATED: false
    DEFORM_NUM_GROUPS: 1
    DEFORM_ON_PER_STAGE:
    DEPTH: 50
    NORM: FrozenBN
    NUM_GROUPS: 1
    OUT_FEATURES:
    RES2_OUT_CHANNELS: 256
    RES5_DILATION: 1
    STEM_OUT_CHANNELS: 64
    STRIDE_IN_1X1: true
    WIDTH_PER_GROUP: 64
  RETINANET:
    BBOX_REG_LOSS_TYPE: smooth_l1
    BBOX_REG_WEIGHTS:
    FOCAL_LOSS_ALPHA: 0.25
    FOCAL_LOSS_GAMMA: 2.0
    IN_FEATURES:
    IOU_LABELS:
    IOU_THRESHOLDS:
    NMS_THRESH_TEST: 0.5
    NORM: ''
    NUM_CLASSES: 10
    NUM_CONVS: 4
    PRIOR_PROB: 0.01
    SCORE_THRESH_TEST: 0.05
    SMOOTH_L1_LOSS_BETA: 0.1
    TOPK_CANDIDATES_TEST: 1000
  ROI_BOX_CASCADE_HEAD:
    BBOX_REG_WEIGHTS:
    IOUS:
  ROI_BOX_HEAD:
    BBOX_REG_LOSS_TYPE: smooth_l1
    BBOX_REG_LOSS_WEIGHT: 1.0
    BBOX_REG_WEIGHTS:
    CLS_AGNOSTIC_BBOX_REG: true
    CONV_DIM: 256
    FC_DIM: 1024
    NAME: FastRCNNConvFCHead
    NORM: ''
    NUM_CONV: 0
    NUM_FC: 2
    POOLER_RESOLUTION: 7
    POOLER_SAMPLING_RATIO: 0
    POOLER_TYPE: ROIAlignV2
    SMOOTH_L1_BETA: 0.0
    TRAIN_ON_PRED_BOXES: false
  ROI_HEADS:
    BATCH_SIZE_PER_IMAGE: 512
    IN_FEATURES:
    IOU_LABELS:
    IOU_THRESHOLDS:
    NAME: CascadeROIHeads
    NMS_THRESH_TEST: 0.5
    NUM_CLASSES: 10
    POSITIVE_FRACTION: 0.25
    PROPOSAL_APPEND_GT: true
    SCORE_THRESH_TEST: 0.05
  ROI_KEYPOINT_HEAD:
    CONV_DIMS:
    LOSS_WEIGHT: 1.0
    MIN_KEYPOINTS_PER_IMAGE: 1
    NAME: KRCNNConvDeconvUpsampleHead
    NORMALIZE_LOSS_BY_VISIBLE_KEYPOINTS: true
    NUM_KEYPOINTS: 17
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 0
    POOLER_TYPE: ROIAlignV2
  ROI_MASK_HEAD:
    CLS_AGNOSTIC_MASK: false
    CONV_DIM: 256
    NAME: MaskRCNNConvUpsampleHead
    NORM: ''
    NUM_CONV: 4
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 0
    POOLER_TYPE: ROIAlignV2
  RPN:
    BATCH_SIZE_PER_IMAGE: 256
    BBOX_REG_LOSS_TYPE: smooth_l1
    BBOX_REG_LOSS_WEIGHT: 1.0
    BBOX_REG_WEIGHTS:
    BOUNDARY_THRESH: -1
    CONV_DIMS:
    HEAD_NAME: StandardRPNHead
    IN_FEATURES:
    IOU_LABELS:
    IOU_THRESHOLDS:
    LOSS_WEIGHT: 1.0
    NMS_THRESH: 0.7
    POSITIVE_FRACTION: 0.5
    POST_NMS_TOPK_TEST: 1000
    POST_NMS_TOPK_TRAIN: 2000
    PRE_NMS_TOPK_TEST: 1000
    PRE_NMS_TOPK_TRAIN: 2000
    SMOOTH_L1_BETA: 0.0
  SEM_SEG_HEAD:
    COMMON_STRIDE: 4
    CONVS_DIM: 128
    IGNORE_VALUE: 255
    IN_FEATURES:
    LOSS_WEIGHT: 1.0
    NAME: SemSegFPNHead
    NORM: GN
    NUM_CLASSES: 10
  VIT:
    DROP_PATH: 0.1
    IMG_SIZE:
    NAME: layoutlmv3_base
    OUT_FEATURES:
    POS_TYPE: abs
  WEIGHTS:
OUTPUT_DIR:
SCIHUB_DATA_DIR_TRAIN: ~/publaynet/layout_scihub/train
SEED: 42
SOLVER:
  AMP:
    ENABLED: true
  BACKBONE_MULTIPLIER: 1.0
  BASE_LR: 0.0002
  BIAS_LR_FACTOR: 1.0
  CHECKPOINT_PERIOD: 2000
  CLIP_GRADIENTS:
    CLIP_TYPE: full_model
    CLIP_VALUE: 1.0
    ENABLED: true
    NORM_TYPE: 2.0
  GAMMA: 0.1
  GRADIENT_ACCUMULATION_STEPS: 1
  IMS_PER_BATCH: 32
  LR_SCHEDULER_NAME: WarmupCosineLR
  MAX_ITER: 20000
  MOMENTUM: 0.9
  NESTEROV: false
  OPTIMIZER: ADAMW
  REFERENCE_WORLD_SIZE: 0
  STEPS:
  WARMUP_FACTOR: 0.01
  WARMUP_ITERS: 333
  WARMUP_METHOD: linear
  WEIGHT_DECAY: 0.05
  WEIGHT_DECAY_BIAS: null
  WEIGHT_DECAY_NORM: 0.0
TEST:
  AUG:
    ENABLED: false
    FLIP: true
    MAX_SIZE: 4000
    MIN_SIZES:
  DETECTIONS_PER_IMAGE: 100
  EVAL_PERIOD: 1000
  EXPECTED_RESULTS: []
  KEYPOINT_OKS_SIGMAS: []
  PRECISE_BN:
    ENABLED: false
    NUM_ITER: 200
VERSION: 2
VIS_PERIOD: 0
Operating system
Windows
Python version
3.10
Software version (magic-pdf --version)
0.9.x
Device mode
cuda