KIE training on a custom dataset: the pretrained model specified in the config file does not take effect #13627

Open
3 of 4 tasks
freezehe opened this issue Aug 9, 2024 · 12 comments
Labels
bug Something isn't working

Comments

freezehe commented Aug 9, 2024

Search before asking

  • I have searched the PaddleOCR Docs and found no similar bug report.

  • I have searched the PaddleOCR Issues and found no similar bug report.

  • I have searched the PaddleOCR Discussions and found no similar bug report.

Bug

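As the title says: I am training on Baidu AI Studio, following the official documentation at https://github.com/PaddlePaddle/PaddleOCR/blob/main/doc/doc_ch/kie.md.
I trained the SER model first, with the following configuration: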

Global:
  use_gpu: True
  epoch_num: &epoch_num 20
  log_smooth_window: 10
  print_batch_step: 10
  save_model_dir: ./output/ccic/ser_vi_layoutxlm_xfund_zh
  save_epoch_step: 2000
  # evaluation is run every 19 iterations after the 0th iteration
  eval_batch_step: [ 0, 19 ]
  cal_metric_during_train: False
  pretrained_model: ./pretrained_model/ser_vi_layoutxlm_xfund_pretrained
  save_inference_dir:
  use_visualdl: False
  seed: 2022
  infer_img: ppstructure/docs/kie/input/zh_val_42.jpg
  d2s_train_image_shape: [3, 224, 224]
  # if you want to predict using the groundtruth ocr info,
  # you can use the following config
  # infer_img: train_data/XFUND/zh_val/val.json
  # infer_mode: False

  save_res_path: ./output/ccic/ser/xfund_zh/res
  kie_rec_model_dir: 
  kie_det_model_dir:
  amp_custom_white_list: ['scale', 'concat', 'elementwise_add']

Architecture:
  model_type: kie
  algorithm: &algorithm "LayoutXLM"
  Transform:
  Backbone:
    name: LayoutXLMForSer
    pretrained: True
    checkpoints:
    # one of base or vi
    mode: vi
    num_classes: &num_classes 7

Loss:
  name: VQASerTokenLayoutLMLoss
  num_classes: *num_classes
  key: "backbone_out"

Optimizer:
  name: AdamW
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Linear
    learning_rate: 0.00001
    epochs: *epoch_num
    warmup_epoch: 2
  regularizer:
    name: L2
    factor: 0.00000
    
PostProcess:
  name: VQASerTokenLayoutLMPostProcess
  class_path: &class_path train_data/XCCIC_8020/class_list_xfun.txt

Metric:
  name: VQASerTokenMetric
  main_indicator: hmean

Train:
  dataset:
    name: SimpleDataSet
    data_dir: train_data/XCCIC_8020/zh_train/image
    label_file_list: 
      - train_data/XCCIC_8020/zh_train/train.json
    ratio_list: [ 1.0 ]
    transforms:
      - DecodeImage: # load image
          img_mode: RGB
          channel_first: False
      - VQATokenLabelEncode: # Class handling label
          contains_re: False
          algorithm: *algorithm
          class_path: *class_path
          use_textline_bbox_info: &use_textline_bbox_info True
          # one of [None, "tb-yx"]
          order_method: &order_method "tb-yx"
      - VQATokenPad:
          max_seq_len: &max_seq_len 512
          return_attention_mask: True
      - VQASerTokenChunk:
          max_seq_len: *max_seq_len
      - Resize:
          size: [224,224]
      - NormalizeImage:
          scale: 1
          mean: [ 123.675, 116.28, 103.53 ]
          std: [ 58.395, 57.12, 57.375 ]
          order: 'hwc'
      - ToCHWImage:
      - KeepKeys:
          keep_keys: [ 'input_ids', 'bbox', 'attention_mask', 'token_type_ids', 'image', 'labels'] # dataloader will return list in this order
  loader:
    shuffle: True
    drop_last: False
    batch_size_per_card: 8
    num_workers: 4

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: train_data/XCCIC_8020/zh_val/image
    label_file_list:
      - train_data/XCCIC_8020/zh_val/val.json
    transforms:
      - DecodeImage: # load image
          img_mode: RGB
          channel_first: False
      - VQATokenLabelEncode: # Class handling label
          contains_re: False
          algorithm: *algorithm
          class_path: *class_path
          use_textline_bbox_info: *use_textline_bbox_info
          order_method: *order_method
      - VQATokenPad:
          max_seq_len: *max_seq_len
          return_attention_mask: True
      - VQASerTokenChunk:
          max_seq_len: *max_seq_len
      - Resize:
          size: [224,224]
      - NormalizeImage:
          scale: 1
          mean: [ 123.675, 116.28, 103.53 ]
          std: [ 58.395, 57.12, 57.375 ]
          order: 'hwc'
      - ToCHWImage:
      - KeepKeys:
          keep_keys: [ 'input_ids', 'bbox', 'attention_mask', 'token_type_ids', 'image', 'labels'] # dataloader will return list in this order
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 8
    num_workers: 4

The line `pretrained_model: ./pretrained_model/ser_vi_layoutxlm_xfund_pretrained` under Global is the one I added. I then run the training command:

%cd /home/aistudio/PaddleOCR
!python3 tools/train.py -c configs/kie/vi_layoutxlm/ser_vi_layoutxlm_xfund_zh.yml

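The log shows that the default model is still downloaded and the pretrained model I configured is not used.
What I need: I want to train on my custom data starting from the pretrained model provided in the official documentation.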

/home/aistudio/PaddleOCR
[2024/08/09 10:41:58] ppocr INFO: Architecture : 
[2024/08/09 10:41:58] ppocr INFO:     Backbone : 
[2024/08/09 10:41:58] ppocr INFO:         checkpoints : None
[2024/08/09 10:41:58] ppocr INFO:         mode : vi
[2024/08/09 10:41:58] ppocr INFO:         name : LayoutXLMForSer
[2024/08/09 10:41:58] ppocr INFO:         num_classes : 7
[2024/08/09 10:41:58] ppocr INFO:         pretrained : True
[2024/08/09 10:41:58] ppocr INFO:     Transform : None
[2024/08/09 10:41:58] ppocr INFO:     algorithm : LayoutXLM
[2024/08/09 10:41:58] ppocr INFO:     model_type : kie
[2024/08/09 10:41:58] ppocr INFO: Eval : 
[2024/08/09 10:41:58] ppocr INFO:     dataset : 
[2024/08/09 10:41:58] ppocr INFO:         data_dir : train_data/XCCIC_8020/zh_val/image
[2024/08/09 10:41:58] ppocr INFO:         label_file_list : ['train_data/XCCIC_8020/zh_val/val.json']
[2024/08/09 10:41:58] ppocr INFO:         name : SimpleDataSet
[2024/08/09 10:41:58] ppocr INFO:         transforms : 
[2024/08/09 10:41:58] ppocr INFO:             DecodeImage : 
[2024/08/09 10:41:58] ppocr INFO:                 channel_first : False
[2024/08/09 10:41:58] ppocr INFO:                 img_mode : RGB
[2024/08/09 10:41:58] ppocr INFO:             VQATokenLabelEncode : 
[2024/08/09 10:41:58] ppocr INFO:                 algorithm : LayoutXLM
[2024/08/09 10:41:58] ppocr INFO:                 class_path : train_data/XCCIC_8020/class_list_xfun.txt
[2024/08/09 10:41:58] ppocr INFO:                 contains_re : False
[2024/08/09 10:41:58] ppocr INFO:                 order_method : tb-yx
[2024/08/09 10:41:58] ppocr INFO:                 use_textline_bbox_info : True
[2024/08/09 10:41:58] ppocr INFO:             VQATokenPad : 
[2024/08/09 10:41:58] ppocr INFO:                 max_seq_len : 512
[2024/08/09 10:41:58] ppocr INFO:                 return_attention_mask : True
[2024/08/09 10:41:58] ppocr INFO:             VQASerTokenChunk : 
[2024/08/09 10:41:58] ppocr INFO:                 max_seq_len : 512
[2024/08/09 10:41:58] ppocr INFO:             Resize : 
[2024/08/09 10:41:58] ppocr INFO:                 size : [224, 224]
[2024/08/09 10:41:58] ppocr INFO:             NormalizeImage : 
[2024/08/09 10:41:58] ppocr INFO:                 mean : [123.675, 116.28, 103.53]
[2024/08/09 10:41:58] ppocr INFO:                 order : hwc
[2024/08/09 10:41:58] ppocr INFO:                 scale : 1
[2024/08/09 10:41:58] ppocr INFO:                 std : [58.395, 57.12, 57.375]
[2024/08/09 10:41:58] ppocr INFO:             ToCHWImage : None
[2024/08/09 10:41:58] ppocr INFO:             KeepKeys : 
[2024/08/09 10:41:58] ppocr INFO:                 keep_keys : ['input_ids', 'bbox', 'attention_mask', 'token_type_ids', 'image', 'labels']
[2024/08/09 10:41:58] ppocr INFO:     loader : 
[2024/08/09 10:41:58] ppocr INFO:         batch_size_per_card : 8
[2024/08/09 10:41:58] ppocr INFO:         drop_last : False
[2024/08/09 10:41:58] ppocr INFO:         num_workers : 4
[2024/08/09 10:41:58] ppocr INFO:         shuffle : False
[2024/08/09 10:41:58] ppocr INFO: Global : 
[2024/08/09 10:41:58] ppocr INFO:     amp_custom_white_list : ['scale', 'concat', 'elementwise_add']
[2024/08/09 10:41:58] ppocr INFO:     cal_metric_during_train : False
[2024/08/09 10:41:58] ppocr INFO:     d2s_train_image_shape : [3, 224, 224]
[2024/08/09 10:41:58] ppocr INFO:     distributed : False
[2024/08/09 10:41:58] ppocr INFO:     epoch_num : 20
[2024/08/09 10:41:58] ppocr INFO:     eval_batch_step : [0, 19]
[2024/08/09 10:41:58] ppocr INFO:     infer_img : ppstructure/docs/kie/input/zh_val_42.jpg
[2024/08/09 10:41:58] ppocr INFO:     kie_det_model_dir : None
[2024/08/09 10:41:58] ppocr INFO:     kie_rec_model_dir : None
[2024/08/09 10:41:58] ppocr INFO:     log_smooth_window : 10
[2024/08/09 10:41:58] ppocr INFO:     pretrained_model : ./pretrained_model/ser_vi_layoutxlm_xfund_pretrained
[2024/08/09 10:41:58] ppocr INFO:     print_batch_step : 10
[2024/08/09 10:41:58] ppocr INFO:     save_epoch_step : 2000
[2024/08/09 10:41:58] ppocr INFO:     save_inference_dir : None
[2024/08/09 10:41:58] ppocr INFO:     save_model_dir : ./output/ccic/ser_vi_layoutxlm_xfund_zh
[2024/08/09 10:41:58] ppocr INFO:     save_res_path : ./output/ccic/ser/xfund_zh/res
[2024/08/09 10:41:58] ppocr INFO:     seed : 2022
[2024/08/09 10:41:58] ppocr INFO:     use_gpu : True
[2024/08/09 10:41:58] ppocr INFO:     use_visualdl : False
[2024/08/09 10:41:58] ppocr INFO: Loss : 
[2024/08/09 10:41:58] ppocr INFO:     key : backbone_out
[2024/08/09 10:41:58] ppocr INFO:     name : VQASerTokenLayoutLMLoss
[2024/08/09 10:41:58] ppocr INFO:     num_classes : 7
[2024/08/09 10:41:58] ppocr INFO: Metric : 
[2024/08/09 10:41:58] ppocr INFO:     main_indicator : hmean
[2024/08/09 10:41:58] ppocr INFO:     name : VQASerTokenMetric
[2024/08/09 10:41:58] ppocr INFO: Optimizer : 
[2024/08/09 10:41:58] ppocr INFO:     beta1 : 0.9
[2024/08/09 10:41:58] ppocr INFO:     beta2 : 0.999
[2024/08/09 10:41:58] ppocr INFO:     lr : 
[2024/08/09 10:41:58] ppocr INFO:         epochs : 20
[2024/08/09 10:41:58] ppocr INFO:         learning_rate : 1e-05
[2024/08/09 10:41:58] ppocr INFO:         name : Linear
[2024/08/09 10:41:58] ppocr INFO:         warmup_epoch : 2
[2024/08/09 10:41:58] ppocr INFO:     name : AdamW
[2024/08/09 10:41:58] ppocr INFO:     regularizer : 
[2024/08/09 10:41:58] ppocr INFO:         factor : 0.0
[2024/08/09 10:41:58] ppocr INFO:         name : L2
[2024/08/09 10:41:58] ppocr INFO: PostProcess : 
[2024/08/09 10:41:58] ppocr INFO:     class_path : train_data/XCCIC_8020/class_list_xfun.txt
[2024/08/09 10:41:58] ppocr INFO:     name : VQASerTokenLayoutLMPostProcess
[2024/08/09 10:41:58] ppocr INFO: Train : 
[2024/08/09 10:41:58] ppocr INFO:     dataset : 
[2024/08/09 10:41:58] ppocr INFO:         data_dir : train_data/XCCIC_8020/zh_train/image
[2024/08/09 10:41:58] ppocr INFO:         label_file_list : ['train_data/XCCIC_8020/zh_train/train.json']
[2024/08/09 10:41:58] ppocr INFO:         name : SimpleDataSet
[2024/08/09 10:41:58] ppocr INFO:         ratio_list : [1.0]
[2024/08/09 10:41:58] ppocr INFO:         transforms : 
[2024/08/09 10:41:58] ppocr INFO:             DecodeImage : 
[2024/08/09 10:41:58] ppocr INFO:                 channel_first : False
[2024/08/09 10:41:58] ppocr INFO:                 img_mode : RGB
[2024/08/09 10:41:58] ppocr INFO:             VQATokenLabelEncode : 
[2024/08/09 10:41:58] ppocr INFO:                 algorithm : LayoutXLM
[2024/08/09 10:41:58] ppocr INFO:                 class_path : train_data/XCCIC_8020/class_list_xfun.txt
[2024/08/09 10:41:58] ppocr INFO:                 contains_re : False
[2024/08/09 10:41:58] ppocr INFO:                 order_method : tb-yx
[2024/08/09 10:41:58] ppocr INFO:                 use_textline_bbox_info : True
[2024/08/09 10:41:58] ppocr INFO:             VQATokenPad : 
[2024/08/09 10:41:58] ppocr INFO:                 max_seq_len : 512
[2024/08/09 10:41:58] ppocr INFO:                 return_attention_mask : True
[2024/08/09 10:41:58] ppocr INFO:             VQASerTokenChunk : 
[2024/08/09 10:41:58] ppocr INFO:                 max_seq_len : 512
[2024/08/09 10:41:58] ppocr INFO:             Resize : 
[2024/08/09 10:41:58] ppocr INFO:                 size : [224, 224]
[2024/08/09 10:41:58] ppocr INFO:             NormalizeImage : 
[2024/08/09 10:41:58] ppocr INFO:                 mean : [123.675, 116.28, 103.53]
[2024/08/09 10:41:58] ppocr INFO:                 order : hwc
[2024/08/09 10:41:58] ppocr INFO:                 scale : 1
[2024/08/09 10:41:58] ppocr INFO:                 std : [58.395, 57.12, 57.375]
[2024/08/09 10:41:58] ppocr INFO:             ToCHWImage : None
[2024/08/09 10:41:58] ppocr INFO:             KeepKeys : 
[2024/08/09 10:41:58] ppocr INFO:                 keep_keys : ['input_ids', 'bbox', 'attention_mask', 'token_type_ids', 'image', 'labels']
[2024/08/09 10:41:58] ppocr INFO:     loader : 
[2024/08/09 10:41:58] ppocr INFO:         batch_size_per_card : 8
[2024/08/09 10:41:58] ppocr INFO:         drop_last : False
[2024/08/09 10:41:58] ppocr INFO:         num_workers : 4
[2024/08/09 10:41:58] ppocr INFO:         shuffle : True
[2024/08/09 10:41:58] ppocr INFO: profiler_options : None
[2024/08/09 10:41:58] ppocr INFO: train with paddle 2.5.2 and device Place(gpu:0)
[2024/08/09 10:41:58] ppocr INFO: Initialize indexs of datasets:['train_data/XCCIC_8020/zh_train/train.json']
list index out of range
[2024-08-09 10:41:59,583] [    INFO] - Downloading https://bj.bcebos.com/paddlenlp/models/transformers/layoutxlm_base/sentencepiece.bpe.model and saved to /home/aistudio/.paddlenlp/models/layoutxlm-base-uncased
[2024-08-09 10:41:59,640] [    INFO] - Downloading sentencepiece.bpe.model from https://bj.bcebos.com/paddlenlp/models/transformers/layoutxlm_base/sentencepiece.bpe.model
100%|██████████████████████████████████████| 4.83M/4.83M [00:00<00:00, 5.25MB/s]
[2024-08-09 10:42:01,488] [    INFO] - tokenizer config file saved in /home/aistudio/.paddlenlp/models/layoutxlm-base-uncased/tokenizer_config.json
[2024-08-09 10:42:01,488] [    INFO] - Special tokens file saved in /home/aistudio/.paddlenlp/models/layoutxlm-base-uncased/special_tokens_map.json
[2024/08/09 10:42:01] ppocr INFO: Initialize indexs of datasets:['train_data/XCCIC_8020/zh_val/val.json']
[2024-08-09 10:42:01,490] [    INFO] - Already cached /home/aistudio/.paddlenlp/models/layoutxlm-base-uncased/sentencepiece.bpe.model
[2024-08-09 10:42:02,249] [    INFO] - tokenizer config file saved in /home/aistudio/.paddlenlp/models/layoutxlm-base-uncased/tokenizer_config.json
[2024-08-09 10:42:02,249] [    INFO] - Special tokens file saved in /home/aistudio/.paddlenlp/models/layoutxlm-base-uncased/special_tokens_map.json
[2024-08-09 10:42:02,252] [    INFO] - Downloading https://bj.bcebos.com/paddlenlp/models/transformers/vi-layoutxlm-base-uncased/model_state.pdparams and saved to /home/aistudio/.paddlenlp/models/vi-layoutxlm-base-uncased
[2024-08-09 10:42:02,252] [    INFO] - Downloading model_state.pdparams from https://bj.bcebos.com/paddlenlp/models/transformers/vi-layoutxlm-base-uncased/model_state.pdparams
100%|██████████████████████████████████████| 1.04G/1.04G [00:13<00:00, 80.3MB/s]
W0809 10:42:16.289948 80856 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 12.0, Runtime API Version: 11.8
W0809 10:42:16.291229 80856 gpu_resources.cc:149] device: 0, cuDNN Version: 8.9.
[2024-08-09 10:42:19,987] [    INFO] - Weights of LayoutXLMForTokenClassification not initialized from pretrained model: ['classifier.weight', 'classifier.bias']
[2024/08/09 10:42:20] ppocr INFO: train dataloader has 18 iters
[2024/08/09 10:42:20] ppocr INFO: valid dataloader has 5 iters
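
The log above shows `Global.pretrained_model` being parsed, yet the backbone weights are still downloaded from the PaddleNLP model hub. Later comments in this thread pass weights through `Architecture.Backbone.checkpoints` with the `-o` flag, so one thing to try is setting that key instead of `Global.pretrained_model`. Whether that resolves this issue is an assumption, not something confirmed in the thread. The sketch below only illustrates how a dotted `-o key=value` override maps onto the nested config; `apply_override` is a hypothetical helper, not part of PaddleOCR.

```python
from typing import Any, Dict


def apply_override(config: Dict[str, Any], dotted_key: str, value: Any) -> Dict[str, Any]:
    """Set config[a][b][c] = value for dotted_key 'a.b.c', creating levels as needed."""
    keys = dotted_key.split(".")
    node = config
    for key in keys[:-1]:
        # Descend into each intermediate mapping, creating it if absent or not a dict.
        if not isinstance(node.get(key), dict):
            node[key] = {}
        node = node[key]
    node[keys[-1]] = value
    return config


# A trimmed copy of the config from this issue (only the keys relevant here).
config = {
    "Global": {"pretrained_model": "./pretrained_model/ser_vi_layoutxlm_xfund_pretrained"},
    "Architecture": {
        "Backbone": {"name": "LayoutXLMForSer", "pretrained": True, "checkpoints": None}
    },
}

# Hypothetical fix attempt: point the backbone checkpoint at the downloaded weights.
apply_override(
    config,
    "Architecture.Backbone.checkpoints",
    "./pretrained_model/ser_vi_layoutxlm_xfund_pretrained",
)
print(config["Architecture"]["Backbone"]["checkpoints"])
```

On the command line the same idea would be written as `-o Architecture.Backbone.checkpoints=./pretrained_model/ser_vi_layoutxlm_xfund_pretrained` appended to the `tools/train.py` call; this has not been verified against the setup in this issue.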

Environment

Baidu AI Studio
aiofiles==23.2.1
aiohttp==3.9.5
aiosignal==1.3.1
aistudio-sdk @ file:///home/aistudio/aistudio_sdk-0.2.4-py3-none-any.whl#sha256=d93411cc8764e465860cbf2f97f787dddd1548595d4776c97ddf0ea787dedd81
albucore==0.0.13
albumentations==1.4.10
altair==4.2.2
annotated-types==0.6.0
anyio==4.3.0
astor==0.8.1
asttokens==2.4.1
async-timeout==4.0.3
attrdict3==2.0.2
attrs==23.2.0
Babel==2.14.0
bce-python-sdk==0.9.6
beautifulsoup4==4.12.3
blinker==1.7.0
cachetools==5.3.3
certifi==2024.2.2
charset-normalizer==3.3.2
click==8.1.7
colorama==0.4.6
coloredlogs==15.0.1
colorlog==6.8.2
comm==0.2.2
contourpy==1.2.1
cycler==0.12.1
Cython==3.0.11
datasets==2.19.0
debugpy==1.8.1
decorator==5.1.1
dill==0.3.4
easydict==1.13
entrypoints==0.4
exceptiongroup==1.2.1
executing==2.0.1
fastapi==0.110.2
ffmpy==0.3.2
filelock==3.13.4
fire==0.6.0
Flask==3.0.3
Flask-Babel==2.0.0
flatbuffers==24.3.25
fonttools==4.51.0
frozenlist==1.4.1
fsspec==2024.3.1
future==1.0.0
gitdb==4.0.11
GitPython==3.1.43
gradio==3.40.0
gradio_client==0.15.1
gunicorn==22.0.0
h11==0.14.0
httpcore==1.0.5
httpx==0.27.0
huggingface-hub==0.22.2
humanfriendly==10.0
idna==3.7
imageio==2.34.2
imgaug==0.4.0
importlib_metadata==7.1.0
importlib_resources==6.4.0
ipykernel==6.29.4
ipython==8.23.0
itsdangerous==2.2.0
jedi==0.19.1
jieba==0.42.1
Jinja2==3.1.3
joblib==1.4.0
jsonschema==4.21.1
jsonschema-specifications==2023.12.1
jupyter_client==8.6.1
jupyter_core==5.7.2
kiwisolver==1.4.5
lazy_loader==0.4
linkify-it-py==2.0.3
lmdb==1.5.1
lxml==5.2.2
markdown-it-py==2.2.0
MarkupSafe==2.1.5
matplotlib==3.8.4
matplotlib-inline==0.1.7
mdit-py-plugins==0.3.3
mdurl==0.1.1
mpmath==1.3.0
multidict==6.0.5
multiprocess==0.70.12.2
nest-asyncio==1.6.0
networkx==3.3
numpy==1.26.4
onnx==1.16.0
onnxruntime==1.17.3
opencv-contrib-python==4.10.0.84
opencv-python==4.9.0.80
opencv-python-headless==4.10.0.84
opt-einsum==3.3.0
orjson==3.10.1
packaging==24.0
paddle2onnx==1.2.1
paddlefsl==1.1.0
paddlehub==2.4.0
paddlenlp==2.5.2
paddleocr==2.8.1
paddlepaddle-gpu @ file:///tmp/paddlepaddle_gpu-2.5.2-cp310-cp310-linux_x86_64.whl#sha256=2b4a84c853c7c88ddf4984c667bfcb824cc8a28a674448099452f50c686cc1bb
pandas==2.2.2
parso==0.8.4
pexpect==4.9.0
pickleshare==0.7.5
pillow==10.3.0
platformdirs==4.2.0
prettytable==3.10.0
prompt-toolkit==3.0.43
protobuf==3.20.3
psutil==5.9.8
ptyprocess==0.7.0
pure-eval==0.2.2
pyarrow==16.0.0
pyarrow-hotfix==0.6
pybind11==2.12.0
pyclipper==1.3.0.post5
pycryptodome==3.20.0
pydantic==2.7.0
pydantic_core==2.18.1
pydeck==0.9.1
pydub==0.25.1
Pygments==2.17.2
Pympler==1.0.1
pypandoc==1.13
pyparsing==3.1.2
python-dateutil==2.9.0.post0
python-docx==1.1.2
python-multipart==0.0.9
pytz==2024.1
PyYAML==6.0.1
pyzmq==26.0.2
rapidfuzz==3.9.6
rarfile==4.2
referencing==0.34.0
requests==2.31.0
rich==13.7.1
rpds-py==0.18.0
ruff==0.4.1
safetensors==0.4.3
scikit-image==0.24.0
scikit-learn==1.4.2
scipy==1.13.0
semantic-version==2.10.0
semver==3.0.2
sentencepiece==0.2.0
seqeval==1.2.2
shapely==2.0.5
shellingham==1.5.4
six==1.16.0
smmap==5.0.1
sniffio==1.3.1
soupsieve==2.5
stack-data==0.6.3
starlette==0.37.2
streamlit==1.13.0
streamlit-image-comparison==0.0.4
sympy==1.12
termcolor==2.4.0
threadpoolctl==3.4.0
tifffile==2024.7.24
toml==0.10.2
tomli==2.0.1
tomlkit==0.12.0
tool-helpers==0.1.1
toolz==0.12.1
tornado==6.4
tqdm==4.66.2
traitlets==5.14.3
typer==0.12.3
typing_extensions==4.11.0
tzdata==2024.1
tzlocal==5.2
uc-micro-py==1.0.3
urllib3==2.2.1
uvicorn==0.29.0
validators==0.28.3
visualdl==2.4.2
watchdog==4.0.1
wcwidth==0.2.13
websockets==11.0.3
Werkzeug==3.0.2
xxhash==3.4.1
yacs==0.1.8
yarl==1.9.4
zipp==3.19.2

Minimal Reproducible Example

The output is identical to the training log shown in the Bug section above.

Additional

No response

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!
freezehe added the bug label Aug 9, 2024
@kingleft

I have the same problem. Is there a solution yet?

zyk0901 commented Sep 3, 2024

How can this be solved?

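metoogo commented Oct 26, 2024

What does your pretrained model look like after downloading? When I extracted the tar file I downloaded, I got another archive without a file extension. After adding an extension and extracting again I got 3 files, but the .pdopt file is missing, so it still cannot be used as a pretrained model.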
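
A quick way to see what an extracted model directory actually contains is to group its files by suffix. This is a minimal sketch: the suffixes checked (`.pdparams` and `.pdopt`) are an assumption based on the file mentioned in the comment above, and `summarize_checkpoint_dir` and `missing_suffixes` are hypothetical helpers, not PaddleOCR functions.

```python
from pathlib import Path
from typing import Dict, Iterable, List


def summarize_checkpoint_dir(directory: str, expected_suffixes: Iterable[str]) -> Dict[str, List[str]]:
    """Map each expected suffix to the matching files found under directory (recursively)."""
    root = Path(directory)
    found: Dict[str, List[str]] = {suffix: [] for suffix in expected_suffixes}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in found:
            # Store paths relative to the root, with forward slashes on every platform.
            found[path.suffix].append(path.relative_to(root).as_posix())
    return found


def missing_suffixes(summary: Dict[str, List[str]]) -> List[str]:
    """Return the suffixes for which no file was found."""
    return [suffix for suffix, files in summary.items() if not files]


if __name__ == "__main__":
    # Path taken from the config in this issue; adjust to wherever the archive was extracted.
    model_dir = "./pretrained_model/ser_vi_layoutxlm_xfund_pretrained"
    if Path(model_dir).is_dir():
        summary = summarize_checkpoint_dir(model_dir, [".pdparams", ".pdopt"])
        print("found:", summary)
        print("missing:", missing_suffixes(summary))
    else:
        print(f"{model_dir} does not exist")
```

Optimizer state is normally only needed to resume a run exactly, so a missing `.pdopt` may not prevent the weights from being used as a starting point; treat that as something to verify rather than a guarantee.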

Contributor

This issue is stale because it has been open for 90 days with no activity.

github-actions bot added the stale label Jan 25, 2025
@callme13yeye

I trained SER+RE on CPU. After exporting them to inference models, running inference with the official predict_kie_token_ser_re.py script raises this error:
ValueError: (InvalidArgument) The dims of Input(X) should be greater than 0
[Hint: Expected x_dims[i] > 0, but received x_dims[i]:0 <= 0:0.] (at ..\paddle\phi\kernels\cpu\stack_kernel.cc:34)
If I run SER only, that is, with the official predict_kie_token_ser.py script, there is no problem at all. RE feels like a half-finished feature.

@Lucknicking

Replying to @callme13yeye's comment above about the ValueError from predict_kie_token_ser_re.py:

I trained an RE model on GPU but have not exported it to an inference model yet. When I run inference with the trained RE model, the results are very poor. Could we compare notes?

callme13yeye commented Feb 10, 2025 via email

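@Lucknicking

Quoting @callme13yeye's email reply:

Sorry, I am still trying the fixes other people suggested in Issues to see whether the error I mentioned still occurs during inference. Since I have not managed to run inference successfully, I cannot tell whether my predictions are good or bad. If you no longer get an error and the predictions are merely unsatisfactory, first check whether you have more than 50 images of the same document type in your dataset, then tune the learning rate and similar settings.

You do not need to export to an inference model first. You can run inference directly with the following command: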
python3 ./tools/infer_kie_token_ser_re.py -c configs/kie/vi_layoutxlm/re_vi_layoutxlm_hukoubu_zh.yml -o Architecture.Backbone.checkpoints=/root/project/PaddleOCR/output/re_vi_layoutxlm_xfund_zh/latest Global.infer_img=./images/hukoubu_huangxinghua.jpg -c_ser configs/kie/vi_layoutxlm/ser_vi_layoutxlm_hukoubu_zh.yml -o_ser Architecture.Backbone.checkpoints=/root/project/PaddleOCR/output/ser_vi_layoutxlm_hukoubu_zh/best_accuracy
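
The command above carries two separate groups of settings: the RE config and its overrides after `-c`/`-o`, and the SER config and its overrides after `-c_ser`/`-o_ser`. As a minimal sketch, the same argument list can be assembled in Python so the two groups are harder to mix up. `build_ser_re_infer_command` is a hypothetical helper; the script name, flags, and paths are copied from the command above.

```python
import shlex
from typing import Dict, List


def build_ser_re_infer_command(
    re_config: str,
    re_overrides: Dict[str, str],
    ser_config: str,
    ser_overrides: Dict[str, str],
) -> List[str]:
    """Assemble the argument list: RE options follow -c/-o, SER options follow -c_ser/-o_ser."""
    command = ["python3", "./tools/infer_kie_token_ser_re.py", "-c", re_config, "-o"]
    command += [f"{key}={value}" for key, value in re_overrides.items()]
    command += ["-c_ser", ser_config, "-o_ser"]
    command += [f"{key}={value}" for key, value in ser_overrides.items()]
    return command


command = build_ser_re_infer_command(
    re_config="configs/kie/vi_layoutxlm/re_vi_layoutxlm_hukoubu_zh.yml",
    re_overrides={
        "Architecture.Backbone.checkpoints": "/root/project/PaddleOCR/output/re_vi_layoutxlm_xfund_zh/latest",
        "Global.infer_img": "./images/hukoubu_huangxinghua.jpg",
    },
    ser_config="configs/kie/vi_layoutxlm/ser_vi_layoutxlm_hukoubu_zh.yml",
    ser_overrides={
        "Architecture.Backbone.checkpoints": "/root/project/PaddleOCR/output/ser_vi_layoutxlm_hukoubu_zh/best_accuracy",
    },
)

# Print a shell-ready string; pass `command` to subprocess.run to execute it instead.
print(shlex.join(command))
```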

callme13yeye commented Feb 10, 2025 via email

@callme13yeye

Replying to @Lucknicking's suggestion above to run infer_kie_token_ser_re.py directly on the checkpoints:

Yesterday I retrained the det model and then trained both SER and RE with the det+rec models included. I found that the RE learning rate is extremely low. I ran inference right after training and the results were terrible. The optimizer and learning rate may need changing. Alternatively, it may be necessary to give up on SER+RE and switch to semantic matching with an embedding model, combined with prompt engineering, and let a large language model do the structuring.

@laoding1974

Replying to @callme13yeye's comment above about SER+RE trained on CPU:

How did you configure the training parameters for SER+RE? Could you share them?

@callme13yeye

Replying to @laoding1974's request for the SER+RE training parameters:

The only parameters worth changing are the Optimizer section of the config file and epoch_num under Global; these affect how the model trains. I am still adjusting them and retraining to see which combination works better. Below are the parameters I am currently training with, though I cannot guarantee they give good results.
The parameters are:
Optimizer:
  name: AdamW
  beta1: 0.9
  beta2: 0.999
  clip_norm: 10
  lr:
    learning_rate: 0.001 # 0.00005
    warmup_epoch: 10
  regularizer:
    name: L2
    factor: 5.0e-05 # 0.00000
github-actions bot removed the stale label Feb 14, 2025