
Extraction error #1029

Open
YANGtzeRi opened this issue Nov 19, 2024 · 3 comments
Labels
bug Something isn't working

@YANGtzeRi

Description of the bug

After finishing the local deployment, extracting a document fails with the following errors:
2024-11-19 19:01:46.298 | INFO | magic_pdf.pdf_parse_union_core_v2:pdf_parse_union:647 - page_id: 0, last_page_cost_time: 0.0
2024-11-19 19:01:48.628 | INFO | magic_pdf.pdf_parse_union_core_v2:pdf_parse_union:647 - page_id: 1, last_page_cost_time: 2.33
2024-11-19 19:01:48.748 | ERROR | api.analysis.pdf_ext:analysis_pdf:50 - Traceback (most recent call last):
  File "D:\Yangtze\GitHub\MinerU\projects\web_demo\web_demo\api\analysis\pdf_ext.py", line 42, in analysis_pdf
    pipe.pipe_parse()
  File "C:\Users\yangt\.conda\envs\MinerU\lib\site-packages\magic_pdf\pipe\UNIPipe.py", line 50, in pipe_parse
    self.pdf_mid_data = parse_ocr_pdf(self.pdf_bytes, self.model_list, self.image_writer,
  File "C:\Users\yangt\.conda\envs\MinerU\lib\site-packages\magic_pdf\user_api.py", line 59, in parse_ocr_pdf
    pdf_info_dict = parse_pdf_by_ocr(
  File "C:\Users\yangt\.conda\envs\MinerU\lib\site-packages\magic_pdf\pdf_parse_by_ocr.py", line 14, in parse_pdf_by_ocr
    return pdf_parse_union(dataset,
  File "C:\Users\yangt\.conda\envs\MinerU\lib\site-packages\magic_pdf\pdf_parse_union_core_v2.py", line 654, in pdf_parse_union
    page_info = parse_page_core(
  File "C:\Users\yangt\.conda\envs\MinerU\lib\site-packages\magic_pdf\pdf_parse_union_core_v2.py", line 541, in parse_page_core
    spans = ocr_cut_image_and_table(
  File "C:\Users\yangt\.conda\envs\MinerU\lib\site-packages\magic_pdf\pre_proc\cut_image.py", line 17, in ocr_cut_image_and_table
    span['image_path'] = cut_image(span['bbox'], page_id, page, return_path=return_path('images'),
  File "C:\Users\yangt\.conda\envs\MinerU\lib\site-packages\magic_pdf\libs\pdf_image_tools.py", line 31, in cut_image
    imageWriter.write(byte_data, img_hash256_path, AbsReaderWriter.MODE_BIN)
  File "C:\Users\yangt\.conda\envs\MinerU\lib\site-packages\magic_pdf\rw\DiskReaderWriter.py", line 41, in write
    with open(abspath, "wb") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'D:\Yangtze\GitHub\MinerU\projects\web_demo\web_demo\static/analysis_pdf/3b547652aeedd05cbcb1249efe2ebcb3405844486675188b4c1ad17f9517536d1732014031_Multipath_chirp_signal_detection_based_on_biorthogonal_fourier_transform/images\43feeaa3f1f8c0a362ccbff693581ce943d1e106eb396cdcea4eeabef0e37f71.jpg'

2024-11-19 19:01:48.751 | ERROR | api.analysis.pdf_ext:analysis_pdf_task:134 - Traceback (most recent call last):
  File "D:\Yangtze\GitHub\MinerU\projects\web_demo\web_demo\api\analysis\pdf_ext.py", line 94, in analysis_pdf_task
    md_content, bbox_info = analysis_pdf(image_url_prefix, image_dir, pdf_bytes, is_ocr)
TypeError: cannot unpack non-iterable NoneType object

2024-11-19 19:01:48.779 | INFO | api.analysis.pdf_ext:analysis_pdf_task:167 - all task finished!
Exception in thread Thread-10 (analysis_pdf_task):
Traceback (most recent call last):
  File "D:\Yangtze\GitHub\MinerU\projects\web_demo\web_demo\api\analysis\pdf_ext.py", line 94, in analysis_pdf_task
    md_content, bbox_info = analysis_pdf(image_url_prefix, image_dir, pdf_bytes, is_ocr)
TypeError: cannot unpack non-iterable NoneType object

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\yangt\.conda\envs\MinerU\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\Users\yangt\.conda\envs\MinerU\lib\threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "D:\Yangtze\GitHub\MinerU\projects\web_demo\web_demo\api\analysis\pdf_ext.py", line 144, in analysis_pdf_task
    raise ApiException(code=500, msg="PDF parsing failed", msgZH="pdf解析失败")
common.error_types.ApiException: 500 Internal Server Error: PDF parsing failed
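
One possible cause, not confirmed anywhere in this thread: the image path in the `FileNotFoundError` is unusually long. It appears to be built from a 64-character hash, a timestamp, the full PDF file name and a second 64-character image hash. On Windows, APIs that are not long-path aware reject paths of 260 characters or more (the legacy `MAX_PATH` limit). The sketch below only measures the path copied from the log; the helper `exceeds_windows_max_path` is hypothetical and not part of MinerU.

```python
import ntpath

# Legacy Windows MAX_PATH is 260 characters *including* the terminating NUL,
# so a path of 260+ characters fails in APIs that are not long-path aware.
WINDOWS_MAX_PATH = 260


def exceeds_windows_max_path(path: str) -> bool:
    """Hypothetical helper: True if `path` is too long for legacy MAX_PATH."""
    return len(path) >= WINDOWS_MAX_PATH


# Path copied verbatim from the FileNotFoundError in the log.
failing_path = (
    r"D:\Yangtze\GitHub\MinerU\projects\web_demo\web_demo\static/analysis_pdf/"
    r"3b547652aeedd05cbcb1249efe2ebcb3405844486675188b4c1ad17f9517536d1732014031"
    r"_Multipath_chirp_signal_detection_based_on_biorthogonal_fourier_transform"
    r"/images\43feeaa3f1f8c0a362ccbff693581ce943d1e106eb396cdcea4eeabef0e37f71.jpg"
)

# ntpath accepts both "/" and "\" as separators, as Windows does.
parent_dir = ntpath.dirname(failing_path)

print(len(failing_path), exceeds_windows_max_path(failing_path))
print(len(parent_dir), exceeds_windows_max_path(parent_dir))
```

Run as-is, this prints `295 True` for the image file and `226 False` for its `images` directory. If length really is the culprit, enabling Win32 long paths (the `LongPathsEnabled` registry value, which Python 3.6+ honors) or moving the project to a shorter directory might avoid the error; neither workaround has been tested in this issue.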

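The second traceback looks like a consequence of the first: `analysis_pdf` logs the `FileNotFoundError` (the ERROR line from `pdf_ext:analysis_pdf:50`) and then apparently returns `None`, so the tuple unpacking at line 94 of `analysis_pdf_task` raises `TypeError`, and the root cause only survives in the log. The sketch below reproduces that pattern with hypothetical stand-ins (`parse_pdf`, `analysis_pdf_swallowing`, `analysis_pdf_reraising`); the real `pdf_ext.py` source is not shown here, so this illustrates the mechanism rather than MinerU's actual code.

```python
import logging
from typing import List, Optional, Tuple

logger = logging.getLogger("pdf_ext_sketch")


def parse_pdf(pdf_bytes: bytes) -> Tuple[str, List[dict]]:
    """Hypothetical stand-in for pipe.pipe_parse(); always fails like the log."""
    raise FileNotFoundError(2, "No such file or directory")


def analysis_pdf_swallowing(pdf_bytes: bytes) -> Optional[Tuple[str, List[dict]]]:
    """Logs the error and gives up, so the caller receives None."""
    try:
        return parse_pdf(pdf_bytes)
    except Exception:
        logger.exception("analysis_pdf failed")
        return None  # an implicit fall-through would behave the same way


def analysis_pdf_reraising(pdf_bytes: bytes) -> Tuple[str, List[dict]]:
    """Logs the error and re-raises it, so the caller sees the root cause."""
    try:
        return parse_pdf(pdf_bytes)
    except Exception:
        logger.exception("analysis_pdf failed")
        raise


# Caller side, mirroring `md_content, bbox_info = analysis_pdf(...)`.
swallowed_error: Optional[Exception] = None
try:
    md_content, bbox_info = analysis_pdf_swallowing(b"")
except TypeError as exc:
    swallowed_error = exc  # secondary error; the FileNotFoundError is gone

reraised_error: Optional[Exception] = None
try:
    md_content, bbox_info = analysis_pdf_reraising(b"")
except FileNotFoundError as exc:
    reraised_error = exc  # the original error reaches the caller

print(type(swallowed_error).__name__, "|", swallowed_error)
print(type(reraised_error).__name__, "|", reraised_error)
```

The first call ends in `TypeError: cannot unpack non-iterable NoneType object`, exactly as in the log; the second surfaces `[Errno 2] No such file or directory`. Re-raising (or returning an explicit error result) keeps the real failure attached to what the caller sees.
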
How to reproduce the bug

Local environment info:


sys.platform win32
Python 3.10.15 | packaged by Anaconda, Inc. | (main, Oct 3 2024, 07:22:19) [MSC v.1929 64 bit (AMD64)]
numpy 1.26.3
detectron2 0.6 @C:\Users\yangt\.conda\envs\MinerU\lib\site-packages\detectron2
Compiler MSVC 194033811
CUDA compiler not available
DETECTRON2_ENV_MODULE <not set>
PyTorch 2.3.1+cu118 @C:\Users\yangt\.conda\envs\MinerU\lib\site-packages\torch
PyTorch debug build False
torch._C._GLIBCXX_USE_CXX11_ABI False
GPU available Yes
GPU 0 NVIDIA GeForce RTX 4060 Laptop GPU (arch=8.9)
Driver version 551.76
CUDA_HOME C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3
Pillow 10.2.0
torchvision 0.18.1+cu118 @C:\Users\yangt\.conda\envs\MinerU\lib\site-packages\torchvision
torchvision arch flags C:\Users\yangt\.conda\envs\MinerU\lib\site-packages\torchvision\_C.pyd; cannot find cuobjdump
fvcore 0.1.5.post20221221
iopath 0.1.9
cv2 4.6.0


PyTorch built with:
  - C++ Version: 201703
  - MSVC 192930154
  - Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.3.6 (Git Hash 86e6af5974177e513fd3fee58425e1063e7f1361)
  - OpenMP 2019
  - LAPACK is enabled (usually provided by MKL)
  - CPU capability usage: AVX2
  - CUDA Runtime 11.8
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.7
  - Magma 2.5.4
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.8, CUDNN_VERSION=8.7.0, CXX_COMPILER=C:/actions-runner/_work/pytorch/pytorch/builder/windows/tmp_bin/sccache-cl.exe, CXX_FLAGS=/DWIN32 /D_WINDOWS /GR /EHsc /Zc:__cplusplus /bigobj /FS /utf-8 -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE /wd4624 /wd4068 /wd4067 /wd4267 /wd4661 /wd4717 /wd4244 /wd4804 /wd4273, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=2.3.1, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=OFF, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=OFF, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF,

[11/19 19:00:52 detectron2]: Command line arguments: {'config_file': 'C:\Users\yangt\.conda\envs\MinerU\lib\site-packages\magic_pdf\resources\model_config\layoutlmv3\layoutlmv3_base_inference.yaml', 'resume': False, 'eval_only': False, 'num_gpus': 1, 'num_machines': 1, 'machine_rank': 0, 'dist_url': 'tcp://127.0.0.1:57823', 'opts': ['MODEL.WEIGHTS', 'C:\Users\yangt\.cache\modelscope\hub\opendatalab\PDF-Extract-Kit-1___0/models\Layout/LayoutLMv3/model_final.pth']}
[11/19 19:00:52 detectron2]: Contents of args.config_file=C:\Users\yangt\.conda\envs\MinerU\lib\site-packages\magic_pdf\resources\model_config\layoutlmv3\layoutlmv3_base_inference.yaml:
AUG:
  DETR: true
CACHE_DIR: ~/cache/huggingface
CUDNN_BENCHMARK: false
DATALOADER:
  ASPECT_RATIO_GROUPING: true
  FILTER_EMPTY_ANNOTATIONS: false
  NUM_WORKERS: 4
  REPEAT_THRESHOLD: 0.0
  SAMPLER_TRAIN: TrainingSampler
DATASETS:
  PRECOMPUTED_PROPOSAL_TOPK_TEST: 1000
  PRECOMPUTED_PROPOSAL_TOPK_TRAIN: 2000
  PROPOSAL_FILES_TEST: []
  PROPOSAL_FILES_TRAIN: []
  TEST:
  - scihub_train
  TRAIN:
  - scihub_train
GLOBAL:
  HACK: 1.0
ICDAR_DATA_DIR_TEST: ''
ICDAR_DATA_DIR_TRAIN: ''
INPUT:
  CROP:
    ENABLED: true
    SIZE:
    - 384
    - 600
    TYPE: absolute_range
  FORMAT: RGB
  MASK_FORMAT: polygon
  MAX_SIZE_TEST: 1333
  MAX_SIZE_TRAIN: 1333
  MIN_SIZE_TEST: 800
  MIN_SIZE_TRAIN:
  - 480
  - 512
  - 544
  - 576
  - 608
  - 640
  - 672
  - 704
  - 736
  - 768
  - 800
  MIN_SIZE_TRAIN_SAMPLING: choice
  RANDOM_FLIP: horizontal
MODEL:
  ANCHOR_GENERATOR:
    ANGLES:
    - - -90
      - 0
      - 90
    ASPECT_RATIOS:
    - - 0.5
      - 1.0
      - 2.0
    NAME: DefaultAnchorGenerator
    OFFSET: 0.0
    SIZES:
    - - 32
    - - 64
    - - 128
    - - 256
    - - 512
  BACKBONE:
    FREEZE_AT: 2
    NAME: build_vit_fpn_backbone
  CONFIG_PATH: ''
  DEVICE: cuda
  FPN:
    FUSE_TYPE: sum
    IN_FEATURES:
    - layer3
    - layer5
    - layer7
    - layer11
    NORM: ''
    OUT_CHANNELS: 256
  IMAGE_ONLY: true
  KEYPOINT_ON: false
  LOAD_PROPOSALS: false
  MASK_ON: true
  META_ARCHITECTURE: VLGeneralizedRCNN
  PANOPTIC_FPN:
    COMBINE:
      ENABLED: true
      INSTANCES_CONFIDENCE_THRESH: 0.5
      OVERLAP_THRESH: 0.5
      STUFF_AREA_LIMIT: 4096
    INSTANCE_LOSS_WEIGHT: 1.0
  PIXEL_MEAN:
  - 127.5
  - 127.5
  - 127.5
  PIXEL_STD:
  - 127.5
  - 127.5
  - 127.5
  PROPOSAL_GENERATOR:
    MIN_SIZE: 0
    NAME: RPN
  RESNETS:
    DEFORM_MODULATED: false
    DEFORM_NUM_GROUPS: 1
    DEFORM_ON_PER_STAGE:
    - false
    - false
    - false
    - false
    DEPTH: 50
    NORM: FrozenBN
    NUM_GROUPS: 1
    OUT_FEATURES:
    - res4
    RES2_OUT_CHANNELS: 256
    RES5_DILATION: 1
    STEM_OUT_CHANNELS: 64
    STRIDE_IN_1X1: true
    WIDTH_PER_GROUP: 64
  RETINANET:
    BBOX_REG_LOSS_TYPE: smooth_l1
    BBOX_REG_WEIGHTS:
    - 1.0
    - 1.0
    - 1.0
    - 1.0
    FOCAL_LOSS_ALPHA: 0.25
    FOCAL_LOSS_GAMMA: 2.0
    IN_FEATURES:
    - p3
    - p4
    - p5
    - p6
    - p7
    IOU_LABELS:
    - 0
    - -1
    - 1
    IOU_THRESHOLDS:
    - 0.4
    - 0.5
    NMS_THRESH_TEST: 0.5
    NORM: ''
    NUM_CLASSES: 10
    NUM_CONVS: 4
    PRIOR_PROB: 0.01
    SCORE_THRESH_TEST: 0.05
    SMOOTH_L1_LOSS_BETA: 0.1
    TOPK_CANDIDATES_TEST: 1000
  ROI_BOX_CASCADE_HEAD:
    BBOX_REG_WEIGHTS:
    - - 10.0
      - 10.0
      - 5.0
      - 5.0
    - - 20.0
      - 20.0
      - 10.0
      - 10.0
    - - 30.0
      - 30.0
      - 15.0
      - 15.0
    IOUS:
    - 0.5
    - 0.6
    - 0.7
  ROI_BOX_HEAD:
    BBOX_REG_LOSS_TYPE: smooth_l1
    BBOX_REG_LOSS_WEIGHT: 1.0
    BBOX_REG_WEIGHTS:
    - 10.0
    - 10.0
    - 5.0
    - 5.0
    CLS_AGNOSTIC_BBOX_REG: true
    CONV_DIM: 256
    FC_DIM: 1024
    NAME: FastRCNNConvFCHead
    NORM: ''
    NUM_CONV: 0
    NUM_FC: 2
    POOLER_RESOLUTION: 7
    POOLER_SAMPLING_RATIO: 0
    POOLER_TYPE: ROIAlignV2
    SMOOTH_L1_BETA: 0.0
    TRAIN_ON_PRED_BOXES: false
  ROI_HEADS:
    BATCH_SIZE_PER_IMAGE: 512
    IN_FEATURES:
    - p2
    - p3
    - p4
    - p5
    IOU_LABELS:
    - 0
    - 1
    IOU_THRESHOLDS:
    - 0.5
    NAME: CascadeROIHeads
    NMS_THRESH_TEST: 0.5
    NUM_CLASSES: 10
    POSITIVE_FRACTION: 0.25
    PROPOSAL_APPEND_GT: true
    SCORE_THRESH_TEST: 0.05
  ROI_KEYPOINT_HEAD:
    CONV_DIMS:
    - 512
    - 512
    - 512
    - 512
    - 512
    - 512
    - 512
    - 512
    LOSS_WEIGHT: 1.0
    MIN_KEYPOINTS_PER_IMAGE: 1
    NAME: KRCNNConvDeconvUpsampleHead
    NORMALIZE_LOSS_BY_VISIBLE_KEYPOINTS: true
    NUM_KEYPOINTS: 17
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 0
    POOLER_TYPE: ROIAlignV2
  ROI_MASK_HEAD:
    CLS_AGNOSTIC_MASK: false
    CONV_DIM: 256
    NAME: MaskRCNNConvUpsampleHead
    NORM: ''
    NUM_CONV: 4
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 0
    POOLER_TYPE: ROIAlignV2
  RPN:
    BATCH_SIZE_PER_IMAGE: 256
    BBOX_REG_LOSS_TYPE: smooth_l1
    BBOX_REG_LOSS_WEIGHT: 1.0
    BBOX_REG_WEIGHTS:
    - 1.0
    - 1.0
    - 1.0
    - 1.0
    BOUNDARY_THRESH: -1
    CONV_DIMS:
    - -1
    HEAD_NAME: StandardRPNHead
    IN_FEATURES:
    - p2
    - p3
    - p4
    - p5
    - p6
    IOU_LABELS:
    - 0
    - -1
    - 1
    IOU_THRESHOLDS:
    - 0.3
    - 0.7
    LOSS_WEIGHT: 1.0
    NMS_THRESH: 0.7
    POSITIVE_FRACTION: 0.5
    POST_NMS_TOPK_TEST: 1000
    POST_NMS_TOPK_TRAIN: 2000
    PRE_NMS_TOPK_TEST: 1000
    PRE_NMS_TOPK_TRAIN: 2000
    SMOOTH_L1_BETA: 0.0
  SEM_SEG_HEAD:
    COMMON_STRIDE: 4
    CONVS_DIM: 128
    IGNORE_VALUE: 255
    IN_FEATURES:
    - p2
    - p3
    - p4
    - p5
    LOSS_WEIGHT: 1.0
    NAME: SemSegFPNHead
    NORM: GN
    NUM_CLASSES: 10
  VIT:
    DROP_PATH: 0.1
    IMG_SIZE:
    - 224
    - 224
    NAME: layoutlmv3_base
    OUT_FEATURES:
    - layer3
    - layer5
    - layer7
    - layer11
    POS_TYPE: abs
  WEIGHTS:
OUTPUT_DIR:
SCIHUB_DATA_DIR_TRAIN: ~/publaynet/layout_scihub/train
SEED: 42
SOLVER:
  AMP:
    ENABLED: true
  BACKBONE_MULTIPLIER: 1.0
  BASE_LR: 0.0002
  BIAS_LR_FACTOR: 1.0
  CHECKPOINT_PERIOD: 2000
  CLIP_GRADIENTS:
    CLIP_TYPE: full_model
    CLIP_VALUE: 1.0
    ENABLED: true
    NORM_TYPE: 2.0
  GAMMA: 0.1
  GRADIENT_ACCUMULATION_STEPS: 1
  IMS_PER_BATCH: 32
  LR_SCHEDULER_NAME: WarmupCosineLR
  MAX_ITER: 20000
  MOMENTUM: 0.9
  NESTEROV: false
  OPTIMIZER: ADAMW
  REFERENCE_WORLD_SIZE: 0
  STEPS:
  - 10000
  WARMUP_FACTOR: 0.01
  WARMUP_ITERS: 333
  WARMUP_METHOD: linear
  WEIGHT_DECAY: 0.05
  WEIGHT_DECAY_BIAS: null
  WEIGHT_DECAY_NORM: 0.0
TEST:
  AUG:
    ENABLED: false
    FLIP: true
    MAX_SIZE: 4000
    MIN_SIZES:
    - 400
    - 500
    - 600
    - 700
    - 800
    - 900
    - 1000
    - 1100
    - 1200
  DETECTIONS_PER_IMAGE: 100
  EVAL_PERIOD: 1000
  EXPECTED_RESULTS: []
  KEYPOINT_OKS_SIGMAS: []
  PRECISE_BN:
    ENABLED: false
    NUM_ITER: 200
VERSION: 2
VIS_PERIOD: 0

Operating system

Windows

Python version

3.10

Software version (magic-pdf --version)

0.9.x

Device mode

cuda

@YANGtzeRi YANGtzeRi added the bug Something isn't working label Nov 19, 2024
@myhloli
Collaborator

myhloli commented Nov 19, 2024

Does the command-line tool work normally?

@YANGtzeRi
Author

It works normally, apart from a few warnings.
But running an extraction in the web demo produces these errors:
Press CTRL+C to quit
127.0.0.1 - - [20/Nov/2024 23:17:22] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [20/Nov/2024 23:17:22] "GET /assets/index-DTW6q3XL.css HTTP/1.1" 304 -
127.0.0.1 - - [20/Nov/2024 23:17:22] "GET /assets/index-B7sj4yiQ.js HTTP/1.1" 304 -
127.0.0.1 - - [20/Nov/2024 23:17:22] "GET /iconfont.js HTTP/1.1" 304 -
127.0.0.1 - - [20/Nov/2024 23:17:23] "GET /assets/logo-o6JtYmb9.svg HTTP/1.1" 304 -
127.0.0.1 - - [20/Nov/2024 23:17:23] "GET /api/v2/extract/list?pageNo=1&pageSize=100 HTTP/1.1" 200 -
127.0.0.1 - - [20/Nov/2024 23:17:23] "GET /assets/pdf-upload-V5jR-ID_.png HTTP/1.1" 304 -
127.0.0.1 - - [20/Nov/2024 23:17:23] "GET /logo.svg HTTP/1.1" 200 -
2024-11-20 23:17:26.605 | ERROR | api.extentions:handle_error:33 - An error occurred: 'NoneType' object has no attribute 'status'
Traceback (most recent call last):

File "C:\Users\yangt\.conda\envs\MinerU\lib\threading.py", line 973, in _bootstrap
self._bootstrap_inner()
│ └ <function Thread._bootstrap_inner at 0x00000233F4D983A0>
└ <Thread(Thread-9 (process_request_thread), started daemon 11200)>

File "C:\Users\yangt\.conda\envs\MinerU\lib\threading.py", line 1016, in _bootstrap_inner
self.run()
│ └ <function Thread.run at 0x00000233F4D980D0>
└ <Thread(Thread-9 (process_request_thread), started daemon 11200)>

File "C:\Users\yangt\.conda\envs\MinerU\lib\threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └ <Thread(Thread-9 (process_request_thread), started daemon 11200)>
│ │ │ └ (<socket.socket fd=1604, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.1', 5559), raddr...
│ │ └ <Thread(Thread-9 (process_request_thread), started daemon 11200)>
│ └ <bound method ThreadingMixIn.process_request_thread of <werkzeug.serving.ThreadedWSGIServer object at 0x00000233ACB77AF0>>
└ <Thread(Thread-9 (process_request_thread), started daemon 11200)>

File "C:\Users\yangt\.conda\envs\MinerU\lib\socketserver.py", line 683, in process_request_thread
self.finish_request(request, client_address)
│ │ │ └ ('127.0.0.1', 11836)
│ │ └ <socket.socket fd=1604, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.1', 5559), raddr=...
│ └ <function BaseServer.finish_request at 0x00000233F4D99D80>
└ <werkzeug.serving.ThreadedWSGIServer object at 0x00000233ACB77AF0>

File "C:\Users\yangt\.conda\envs\MinerU\lib\socketserver.py", line 360, in finish_request
self.RequestHandlerClass(request, client_address, self)
│ │ │ │ └ <werkzeug.serving.ThreadedWSGIServer object at 0x00000233ACB77AF0>
│ │ │ └ ('127.0.0.1', 11836)
│ │ └ <socket.socket fd=1604, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.1', 5559), raddr=...
│ └ <class 'werkzeug.serving.WSGIRequestHandler'>
└ <werkzeug.serving.ThreadedWSGIServer object at 0x00000233ACB77AF0>

File "C:\Users\yangt\.conda\envs\MinerU\lib\socketserver.py", line 747, in __init__
self.handle()
│ └ <function WSGIRequestHandler.handle at 0x00000233F6BD49D0>
└ <werkzeug.serving.WSGIRequestHandler object at 0x00000233ACBAF6A0>

File "C:\Users\yangt\.conda\envs\MinerU\lib\site-packages\werkzeug\serving.py", line 398, in handle
super().handle()

File "C:\Users\yangt\.conda\envs\MinerU\lib\http\server.py", line 433, in handle
self.handle_one_request()
│ └ <function BaseHTTPRequestHandler.handle_one_request at 0x00000233F6962290>
└ <werkzeug.serving.WSGIRequestHandler object at 0x00000233ACBAF6A0>

File "C:\Users\yangt\.conda\envs\MinerU\lib\http\server.py", line 421, in handle_one_request
method()
└ <bound method WSGIRequestHandler.run_wsgi of <werkzeug.serving.WSGIRequestHandler object at 0x00000233ACBAF6A0>>

File "C:\Users\yangt\.conda\envs\MinerU\lib\site-packages\werkzeug\serving.py", line 370, in run_wsgi
execute(self.server.app)
│ │ │ └ <Flask 'api.extentions'>
│ │ └ <werkzeug.serving.ThreadedWSGIServer object at 0x00000233ACB77AF0>
│ └ <werkzeug.serving.WSGIRequestHandler object at 0x00000233ACBAF6A0>
└ <function WSGIRequestHandler.run_wsgi.<locals>.execute at 0x00000233ACB4BD00>

File "C:\Users\yangt\.conda\envs\MinerU\lib\site-packages\werkzeug\serving.py", line 331, in execute
application_iter = app(environ, start_response)
│ │ └ <function WSGIRequestHandler.run_wsgi.<locals>.start_response at 0x00000233ACB4BC70>
│ └ {'wsgi.version': (1, 0), 'wsgi.url_scheme': 'http', 'wsgi.input': <_io.BufferedReader name=1604>, 'wsgi.errors': <_io.TextIOW...
└ <Flask 'api.extentions'>

File "C:\Users\yangt\.conda\envs\MinerU\lib\site-packages\flask\app.py", line 1536, in __call__
return self.wsgi_app(environ, start_response)
│ │ │ └ <function WSGIRequestHandler.run_wsgi.<locals>.start_response at 0x00000233ACB4BC70>
│ │ └ {'wsgi.version': (1, 0), 'wsgi.url_scheme': 'http', 'wsgi.input': <_io.BufferedReader name=1604>, 'wsgi.errors': <_io.TextIOW...
│ └ <function Flask.wsgi_app at 0x00000233F7486E60>
└ <Flask 'api.extentions'>

File "C:\Users\yangt\.conda\envs\MinerU\lib\site-packages\flask\app.py", line 1511, in wsgi_app
response = self.full_dispatch_request()
│ └ <function Flask.full_dispatch_request at 0x00000233F7486680>
└ <Flask 'api.extentions'>

File "C:\Users\yangt\.conda\envs\MinerU\lib\site-packages\flask\app.py", line 917, in full_dispatch_request
rv = self.dispatch_request()
│ └ <function Flask.dispatch_request at 0x00000233F74865F0>
└ <Flask 'api.extentions'>

File "C:\Users\yangt\.conda\envs\MinerU\lib\site-packages\flask\app.py", line 902, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args) # type: ignore[no-any-return]
│ │ │ │ │ │ └ {}
│ │ │ │ │ └ 'analysis.analysistaskprogressview'
│ │ │ │ └ <Rule '/api/v2/extract/task/progress' (GET, OPTIONS, HEAD) -> analysis.analysistaskprogressview>
│ │ │ └ {'static': <function Flask.__init__.<locals>.<lambda> at 0x00000233F90440D0>, 'analysis.uploadpdfview': <function View.as_vie...
│ │ └ <Flask 'api.extentions'>
│ └ <function Flask.ensure_sync at 0x00000233F7486830>
└ <Flask 'api.extentions'>

File "C:\Users\yangt\.conda\envs\MinerU\lib\site-packages\flask_restful\__init__.py", line 489, in wrapper
resp = resource(*args, **kwargs)
│ │ └ {}
│ └ ()
└ <function View.as_view.<locals>.view at 0x00000233ABFB7130>

File "C:\Users\yangt\.conda\envs\MinerU\lib\site-packages\flask\views.py", line 110, in view
return current_app.ensure_sync(self.dispatch_request)(**kwargs) # type: ignore[no-any-return]
│ │ │ └ {}
│ │ └ <function Resource.dispatch_request at 0x00000233F749A560>
│ └ <api.analysis.analysis_view.AnalysisTaskProgressView object at 0x00000233ACBAEC50>
└ <Flask 'api.extentions'>

File "C:\Users\yangt\.conda\envs\MinerU\lib\site-packages\flask_restful\__init__.py", line 604, in dispatch_request
resp = meth(*args, **kwargs)
│ │ └ {}
│ └ ()
└ <bound method AnalysisTaskProgressView.get of <api.analysis.analysis_view.AnalysisTaskProgressView object at 0x00000233ACBAEC...

File "D:\Yangtze\GitHub\MinerU\projects\web_demo\web_demo\api\analysis\analysis_view.py", line 38, in get
"status": analysis_pdf.status,
└ None

AttributeError: 'NoneType' object has no attribute 'status'
127.0.0.1 - - [20/Nov/2024 23:17:26] "GET /api/v2/extract/task/progress?id=2 HTTP/1.1" 500 -

After that I can no longer delete this PDF, and it has stayed stuck like this ever since.
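
The 500 on `GET /api/v2/extract/task/progress?id=2` comes from `analysis_view.py` line 38 reading `analysis_pdf.status` while `analysis_pdf` is `None`, i.e. the lookup for task 2 returned nothing. Why that record is missing is not established in this thread. A minimal sketch of guarding such a lookup so the endpoint answers 404 instead of crashing; the `AnalysisTask` record, the `TASKS` dict and the helper names are hypothetical stand-ins for the web demo's real storage, which is not shown here.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class AnalysisTask:
    """Hypothetical task record; the real model behind analysis_view.py is not shown."""
    id: int
    status: int


# Hypothetical in-memory store standing in for the real database query.
TASKS: Dict[int, AnalysisTask] = {1: AnalysisTask(id=1, status=1)}


def find_task(task_id: int) -> Optional[AnalysisTask]:
    # Returns None when no record matches, like the lookup that failed for id=2.
    return TASKS.get(task_id)


def task_progress(task_id: int) -> Tuple[dict, int]:
    """Return (payload, HTTP status) instead of raising AttributeError."""
    task = find_task(task_id)
    if task is None:
        return {"msg": f"task {task_id} not found"}, 404
    return {"id": task.id, "status": task.status}, 200


print(task_progress(1))  # ({'id': 1, 'status': 1}, 200)
print(task_progress(2))  # ({'msg': 'task 2 not found'}, 404)
```

Flask-RESTful resource methods may return a `(data, status_code)` tuple like this, so a similar `None` check in `AnalysisTaskProgressView.get` would likely stop the 500; whether it also makes the stuck PDF deletable depends on code not shown in this issue.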

@YANGtzeRi
Author

When running the test, the second one shows a warning; I don't know whether it has any effect:
2024-11-20 23:25:29.926 | INFO | magic_pdf.libs.pdf_check:detect_invalid_chars:57 - cid_count: 0, text_len: 8, cid_chars_radio: 0.0
2024-11-20 23:25:29.929 | WARNING | magic_pdf.filter.pdf_classify_by_type:classify:334 - pdf is not classified by area and text_len, by_image_area: False, by_text: False, by_avg_words: False, by_img_num: True, by_text_layout: False, by_img_narrow_strips: False, by_invalid_chars: True
import tensorrt_llm failed, if do not use tensorrt, ignore this message
import lmdeploy failed, if do not use lmdeploy, ignore this message
2024-11-20 23:25:44.665 | INFO | magic_pdf.model.pdf_extract_kit:__init__:68 - DocAnalysis init, this may take some times, layout_model: layoutlmv3, apply_formula: True, apply_ocr: True, apply_table: False, table_model: rapid_table, lang: None
2024-11-20 23:25:44.665 | INFO | magic_pdf.model.pdf_extract_kit:__init__:77 - using device: cuda
2024-11-20 23:25:44.665 | INFO | magic_pdf.model.pdf_extract_kit:__init__:79 - using models_dir: C:\Users\yangt\.cache\modelscope\hub\opendatalab\PDF-Extract-Kit-1___0/models
CustomVisionEncoderDecoderModel init
VariableUnimerNetModel init
VariableUnimerNetPatchEmbeddings init
VariableUnimerNetModel init
VariableUnimerNetPatchEmbeddings init
CustomMBartForCausalLM init
CustomMBartDecoder init
[11/20 23:25:51 detectron2]: Rank of current process: 0. World size: 1
[11/20 23:25:52 detectron2]: Environment info:


sys.platform win32
Python 3.10.15 | packaged by Anaconda, Inc. | (main, Oct 3 2024, 07:22:19) [MSC v.1929 64 bit (AMD64)]
numpy 1.26.3
detectron2 0.6 @C:\Users\yangt\.conda\envs\MinerU\lib\site-packages\detectron2
Compiler MSVC 194033811
CUDA compiler not available
DETECTRON2_ENV_MODULE <not set>
PyTorch 2.3.1+cu118 @C:\Users\yangt\.conda\envs\MinerU\lib\site-packages\torch
PyTorch debug build False
torch._C._GLIBCXX_USE_CXX11_ABI False
GPU available Yes
GPU 0 NVIDIA GeForce RTX 4060 Laptop GPU (arch=8.9)
Driver version 551.76
CUDA_HOME C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8
Pillow 10.2.0
torchvision 0.18.1+cu118 @C:\Users\yangt\.conda\envs\MinerU\lib\site-packages\torchvision
torchvision arch flags C:\Users\yangt\.conda\envs\MinerU\lib\site-packages\torchvision\_C.pyd; cannot find cuobjdump
fvcore 0.1.5.post20221221
iopath 0.1.9
cv2 4.6.0


PyTorch built with:
  - C++ Version: 201703
  - MSVC 192930154
  - Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.3.6 (Git Hash 86e6af5974177e513fd3fee58425e1063e7f1361)
  - OpenMP 2019
  - LAPACK is enabled (usually provided by MKL)
  - CPU capability usage: AVX2
  - CUDA Runtime 11.8
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.7
  - Magma 2.5.4
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.8, CUDNN_VERSION=8.7.0, CXX_COMPILER=C:/actions-runner/_work/pytorch/pytorch/builder/windows/tmp_bin/sccache-cl.exe, CXX_FLAGS=/DWIN32 /D_WINDOWS /GR /EHsc /Zc:__cplusplus /bigobj /FS /utf-8 -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE /wd4624 /wd4068 /wd4067 /wd4267 /wd4661 /wd4717 /wd4244 /wd4804 /wd4273, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=2.3.1, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=OFF, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=OFF, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF,

[11/20 23:25:52 detectron2]: Command line arguments: {'config_file': 'C:\Users\yangt\.conda\envs\MinerU\lib\site-packages\magic_pdf\resources\model_config\layoutlmv3\layoutlmv3_base_inference.yaml', 'resume': False, 'eval_only': False, 'num_gpus': 1, 'num_machines': 1, 'machine_rank': 0, 'dist_url': 'tcp://127.0.0.1:57823', 'opts': ['MODEL.WEIGHTS', 'C:\Users\yangt\.cache\modelscope\hub\opendatalab\PDF-Extract-Kit-1___0/models\Layout/LayoutLMv3/model_final.pth']}
[11/20 23:25:52 detectron2]: Contents of args.config_file=C:\Users\yangt\.conda\envs\MinerU\lib\site-packages\magic_pdf\resources\model_config\layoutlmv3\layoutlmv3_base_inference.yaml:

[11/20 23:25:53 d2.checkpoint.detection_checkpoint]: [DetectionCheckpointer] Loading from C:\Users\yangt\.cache\modelscope\hub\opendatalab\PDF-Extract-Kit-1___0/models\Layout/LayoutLMv3/model_final.pth ...
[11/20 23:25:53 fvcore.common.checkpoint]: [Checkpointer] Loading from c:\Users\yangt\.cache\modelscope\hub\opendatalab\PDF-Extract-Kit-1___0/models\Layout/LayoutLMv3/model_final.pth ...
2024-11-20 23:25:58.307 | INFO | magic_pdf.model.pdf_extract_kit:init:137 - DocAnalysis init done!
2024-11-20 23:25:58.307 | INFO | magic_pdf.model.doc_analyze_by_custom_model:custom_model_init:131 - model init cost: 28.377976417541504
2024-11-20 23:26:01.428 | INFO | magic_pdf.model.pdf_extract_kit:call:153 - layout detection time: 2.27
2024-11-20 23:26:02.194 | INFO | magic_pdf.model.pdf_extract_kit:call:161 - mfd time: 0.74
2024-11-20 23:26:02.198 | INFO | magic_pdf.model.pdf_extract_kit:call:168 - formula nums: 0, mfr time: 0.0
2024-11-20 23:26:02.370 | INFO | magic_pdf.model.sub_modules.model_utils:clean_vram:51 - gc time: 0.17
2024-11-20 23:26:03.631 | INFO | magic_pdf.model.pdf_extract_kit:call:194 - ocr time: 1.26
2024-11-20 23:26:03.631 | INFO | magic_pdf.model.pdf_extract_kit:call:226 - -----page total time: 4.47-----
2024-11-20 23:26:04.894 | INFO | magic_pdf.model.pdf_extract_kit:call:153 - layout detection time: 1.26
2024-11-20 23:26:05.080 | INFO | magic_pdf.model.pdf_extract_kit:call:161 - mfd time: 0.16
2024-11-20 23:26:05.817 | INFO | magic_pdf.model.pdf_extract_kit:call:168 - formula nums: 4, mfr time: 0.74
2024-11-20 23:26:06.012 | INFO | magic_pdf.model.sub_modules.model_utils:clean_vram:51 - gc time: 0.19
2024-11-20 23:26:06.796 | INFO | magic_pdf.model.pdf_extract_kit:call:194 - ocr time: 0.78
2024-11-20 23:26:06.796 | INFO | magic_pdf.model.pdf_extract_kit:call:226 - -----page total time: 3.16-----
2024-11-20 23:26:07.810 | INFO | magic_pdf.model.pdf_extract_kit:call:153 - layout detection time: 1.01
2024-11-20 23:26:07.968 | INFO | magic_pdf.model.pdf_extract_kit:call:161 - mfd time: 0.14
2024-11-20 23:26:07.968 | INFO | magic_pdf.model.pdf_extract_kit:call:168 - formula nums: 0, mfr time: 0.0
2024-11-20 23:26:08.147 | INFO | magic_pdf.model.sub_modules.model_utils:clean_vram:51 - gc time: 0.18
2024-11-20 23:26:08.771 | INFO | magic_pdf.model.pdf_extract_kit:call:194 - ocr time: 0.62
2024-11-20 23:26:08.771 | INFO | magic_pdf.model.pdf_extract_kit:call:226 - -----page total time: 1.97-----
2024-11-20 23:26:09.806 | INFO | magic_pdf.model.pdf_extract_kit:call:153 - layout detection time: 1.03
2024-11-20 23:26:09.961 | INFO | magic_pdf.model.pdf_extract_kit:call:161 - mfd time: 0.13
2024-11-20 23:26:09.962 | INFO | magic_pdf.model.pdf_extract_kit:call:168 - formula nums: 0, mfr time: 0.0
2024-11-20 23:26:10.138 | INFO | magic_pdf.model.sub_modules.model_utils:clean_vram:51 - gc time: 0.18
2024-11-20 23:26:10.660 | INFO | magic_pdf.model.pdf_extract_kit:call:194 - ocr time: 0.52
2024-11-20 23:26:10.661 | INFO | magic_pdf.model.pdf_extract_kit:call:226 - -----page total time: 1.89-----
2024-11-20 23:26:11.651 | INFO | magic_pdf.model.pdf_extract_kit:call:153 - layout detection time: 0.99
2024-11-20 23:26:11.804 | INFO | magic_pdf.model.pdf_extract_kit:call:161 - mfd time: 0.13
2024-11-20 23:26:11.804 | INFO | magic_pdf.model.pdf_extract_kit:call:168 - formula nums: 0, mfr time: 0.0
2024-11-20 23:26:11.972 | INFO | magic_pdf.model.sub_modules.model_utils:clean_vram:51 - gc time: 0.17
2024-11-20 23:26:12.560 | INFO | magic_pdf.model.pdf_extract_kit:call:194 - ocr time: 0.59
2024-11-20 23:26:12.561 | INFO | magic_pdf.model.pdf_extract_kit:call:226 - -----page total time: 1.9-----
2024-11-20 23:26:13.589 | INFO | magic_pdf.model.pdf_extract_kit:call:153 - layout detection time: 1.02
2024-11-20 23:26:13.741 | INFO | magic_pdf.model.pdf_extract_kit:call:161 - mfd time: 0.13
2024-11-20 23:26:13.742 | INFO | magic_pdf.model.pdf_extract_kit:call:168 - formula nums: 0, mfr time: 0.0
2024-11-20 23:26:13.922 | INFO | magic_pdf.model.sub_modules.model_utils:clean_vram:51 - gc time: 0.18
2024-11-20 23:26:14.475 | INFO | magic_pdf.model.pdf_extract_kit:call:194 - ocr time: 0.55
2024-11-20 23:26:14.475 | INFO | magic_pdf.model.pdf_extract_kit:call:226 - -----page total time: 1.91-----
2024-11-20 23:26:15.492 | INFO | magic_pdf.model.pdf_extract_kit:call:153 - layout detection time: 1.01
2024-11-20 23:26:15.648 | INFO | magic_pdf.model.pdf_extract_kit:call:161 - mfd time: 0.13
2024-11-20 23:26:16.220 | INFO | magic_pdf.model.pdf_extract_kit:call:168 - formula nums: 3, mfr time: 0.57
2024-11-20 23:26:16.391 | INFO | magic_pdf.model.sub_modules.model_utils:clean_vram:51 - gc time: 0.17
2024-11-20 23:26:16.930 | INFO | magic_pdf.model.pdf_extract_kit:call:194 - ocr time: 0.54
2024-11-20 23:26:16.931 | INFO | magic_pdf.model.pdf_extract_kit:call:226 - -----page total time: 2.45-----
2024-11-20 23:26:17.969 | INFO | magic_pdf.model.pdf_extract_kit:call:153 - layout detection time: 1.03
2024-11-20 23:26:18.125 | INFO | magic_pdf.model.pdf_extract_kit:call:161 - mfd time: 0.13
2024-11-20 23:26:18.126 | INFO | magic_pdf.model.pdf_extract_kit:call:168 - formula nums: 0, mfr time: 0.0
2024-11-20 23:26:18.300 | INFO | magic_pdf.model.sub_modules.model_utils:clean_vram:51 - gc time: 0.17
2024-11-20 23:26:18.872 | INFO | magic_pdf.model.pdf_extract_kit:call:194 - ocr time: 0.57
2024-11-20 23:26:18.873 | INFO | magic_pdf.model.pdf_extract_kit:call:226 - -----page total time: 1.94-----
2024-11-20 23:26:19.025 | INFO | magic_pdf.model.doc_analyze_by_custom_model:doc_analyze:176 - gc time: 0.15
2024-11-20 23:26:19.026 | INFO | magic_pdf.model.doc_analyze_by_custom_model:doc_analyze:180 - doc analyze time: 19.86, speed: 0.4 pages/second
2024-11-20 23:26:19.187 | INFO | magic_pdf.pdf_parse_union_core_v2:pdf_parse_union:647 - page_id: 0, last_page_cost_time: 0.0
2024-11-20 23:26:20.664 | INFO | magic_pdf.pdf_parse_union_core_v2:pdf_parse_union:647 - page_id: 1, last_page_cost_time: 1.48
2024-11-20 23:26:20.705 | INFO | magic_pdf.pdf_parse_union_core_v2:pdf_parse_union:647 - page_id: 2, last_page_cost_time: 0.04
2024-11-20 23:26:20.737 | INFO | magic_pdf.pdf_parse_union_core_v2:pdf_parse_union:647 - page_id: 3, last_page_cost_time: 0.03
2024-11-20 23:26:20.764 | INFO | magic_pdf.pdf_parse_union_core_v2:pdf_parse_union:647 - page_id: 4, last_page_cost_time: 0.03
2024-11-20 23:26:20.791 | INFO | magic_pdf.pdf_parse_union_core_v2:pdf_parse_union:647 - page_id: 5, last_page_cost_time: 0.03
2024-11-20 23:26:20.818 | INFO | magic_pdf.pdf_parse_union_core_v2:pdf_parse_union:647 - page_id: 6, last_page_cost_time: 0.03
2024-11-20 23:26:20.846 | INFO | magic_pdf.pdf_parse_union_core_v2:pdf_parse_union:647 - page_id: 7, last_page_cost_time: 0.03
2024-11-20 23:26:21.226 | INFO | magic_pdf.pipe.UNIPipe:pipe_mk_markdown:62 - uni_pipe mk mm_markdown finished
2024-11-20 23:26:21.259 | INFO | magic_pdf.pipe.UNIPipe:pipe_mk_uni_format:57 - uni_pipe mk content list finished
2024-11-20 23:26:21.260 | INFO | magic_pdf.tools.common:do_parse:193 - local output dir is ./output\small_ocr\auto
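
To sanity-check the throughput line in the log above (`doc analyze time: 19.86, speed: 0.4 pages/second`), here is a small, hypothetical helper. It is not part of MinerU; it only assumes that each page emits a `-----page total time: X-----` line in the format shown in this log, and it sums those values.

```python
import re

# Matches the per-page summary lines printed by magic_pdf.model.pdf_extract_kit,
# e.g. "... - -----page total time: 4.47-----" (format assumed from the log above).
PAGE_TIME_RE = re.compile(r"page total time: ([0-9.]+)")


def summarize_page_times(log_text: str) -> tuple[int, float, float]:
    """Return (page_count, summed_page_seconds, pages_per_second)."""
    times = [float(m.group(1)) for m in PAGE_TIME_RE.finditer(log_text)]
    total = sum(times)
    # Guard against an empty log so we never divide by zero.
    speed = len(times) / total if total else 0.0
    return len(times), total, speed


if __name__ == "__main__":
    # The eight per-page totals copied from the log above.
    sample = "\n".join(
        f"-----page total time: {t}-----"
        for t in (4.47, 3.16, 1.97, 1.89, 1.9, 1.91, 2.45, 1.94)
    )
    pages, total, speed = summarize_page_times(sample)
    print(f"{pages} pages, {total:.2f} s, {speed:.2f} pages/s")
```

Run on this log's eight pages, it prints `8 pages, 19.69 s, 0.41 pages/s`. The summed per-page time is slightly below the reported 19.86 s, presumably because `doc_analyze` also counts work outside the per-page loop, such as the final `gc time: 0.15`.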
