Training PP-OCRv4 raises ValueError: (InvalidArgument) The input of Op(Conv) should be a 4-D or 5-D Tensor. #11200

@wangyang581

Please provide the following information to quickly locate the problem

  • System Environment: Ubuntu 20.04, running in the official Paddle Docker image
  • Version: paddlepaddle/paddle:2.5.2-gpu-cuda11.2-cudnn8.2-trt8.0
  • Paddle: paddlepaddle-gpu:2.5.2.post112
  • PaddleOCR: release/2.7
  • Related components: tools/train.py
  • Command Code: python tools/train.py -c configs/rec/PP-OCRv4/ch_PP-OCRv4_rec_distill.yml
  • Complete Error Message:
    Traceback (most recent call last):
    File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
    File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
    File "/root/.vscode-server/extensions/ms-python.python-2023.4.1/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 39, in <module>
    cli.main()
    File "/root/.vscode-server/extensions/ms-python.python-2023.4.1/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main
    run()
    File "/root/.vscode-server/extensions/ms-python.python-2023.4.1/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 284, in run_file
    runpy.run_path(target, run_name="__main__")
    File "/root/.vscode-server/extensions/ms-python.python-2023.4.1/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 322, in run_path
    pkg_name=pkg_name, script_name=fname)
    File "/root/.vscode-server/extensions/ms-python.python-2023.4.1/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 136, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
    File "/root/.vscode-server/extensions/ms-python.python-2023.4.1/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
    exec(code, run_globals)
    File "/paddle/tools/train.py", line 227, in <module>
    main(config, device, logger, vdl_writer)
    File "/paddle/tools/train.py", line 202, in main
    amp_dtype)
    File "/paddle/tools/program.py", line 301, in train
    preds = model(images, data=batch[1:])
    File "/usr/local/lib/python3.7/dist-packages/paddle/nn/layer/layers.py", line 1254, in __call__
    return self.forward(*inputs, **kwargs)
    File "/paddle/ppocr/modeling/architectures/distillation_model.py", line 59, in forward
    result_dict[model_name] = self.model_list[idx](x, data)
    File "/usr/local/lib/python3.7/dist-packages/paddle/nn/layer/layers.py", line 1254, in __call__
    return self.forward(*inputs, **kwargs)
    File "/paddle/ppocr/modeling/architectures/base_model.py", line 100, in forward
    x = self.head(x, targets=data)
    File "/usr/local/lib/python3.7/dist-packages/paddle/nn/layer/layers.py", line 1254, in __call__
    return self.forward(*inputs, **kwargs)
    File "/paddle/ppocr/modeling/heads/rec_multi_head.py", line 92, in forward
    ctc_encoder = self.ctc_encoder(x)
    File "/usr/local/lib/python3.7/dist-packages/paddle/nn/layer/layers.py", line 1254, in __call__
    return self.forward(*inputs, **kwargs)
    File "/paddle/ppocr/modeling/necks/rnn.py", line 261, in forward
    x = self.encoder(x)
    File "/usr/local/lib/python3.7/dist-packages/paddle/nn/layer/layers.py", line 1254, in __call__
    return self.forward(*inputs, **kwargs)
    File "/paddle/ppocr/modeling/necks/rnn.py", line 208, in forward
    z = self.conv1(z)
    File "/usr/local/lib/python3.7/dist-packages/paddle/nn/layer/layers.py", line 1254, in __call__
    return self.forward(*inputs, **kwargs)
    File "/paddle/ppocr/modeling/backbones/rec_svtrnet.py", line 68, in forward
    out = self.conv(inputs)
    File "/usr/local/lib/python3.7/dist-packages/paddle/nn/layer/layers.py", line 1254, in __call__
    return self.forward(*inputs, **kwargs)
    File "/usr/local/lib/python3.7/dist-packages/paddle/nn/layer/conv.py", line 722, in forward
    use_cudnn=self._use_cudnn,
    File "/usr/local/lib/python3.7/dist-packages/paddle/nn/functional/conv.py", line 141, in _conv_nd
    data_format,
    ValueError: (InvalidArgument) The input of Op(Conv) should be a 4-D or 5-D Tensor. But received: input's dimension is 3, input's shape is [8, 240, 256].
    [Hint: Expected in_dims.size() == 4 || in_dims.size() == 5 == true, but received in_dims.size() == 4 || in_dims.size() == 5:0 != true:1.] (at ../paddle/phi/infermeta/binary.cc:475)
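
For reference, the failing check amounts to a rank test on the input shape. The sketch below mimics it in plain Python; `check_conv_input` is a hypothetical helper written for illustration, not Paddle's implementation, and the 4-D example shape is simply the training batch implied by `batch_size_per_card: 8` and `image_shape: [3, 48, 320]` in the config used here.

```python
def check_conv_input(shape):
    """Mimic the rank check from the error message: conv inputs must be 4-D or 5-D.

    Hypothetical helper for illustration only, not Paddle's implementation.
    """
    rank = len(shape)
    if rank not in (4, 5):
        raise ValueError(
            f"conv input should be 4-D or 5-D, got {rank}-D with shape {list(shape)}"
        )
    return rank


reported = [8, 240, 256]  # shape copied from the error message

try:
    check_conv_input(reported)
    accepted = True
except ValueError as err:
    accepted = False
    message = str(err)

# A 4-D layout such as [batch, channels, height, width] passes the same check.
rank_4d = check_conv_input([8, 3, 48, 320])
```

So the tensor that reaches `self.conv(inputs)` is missing one dimension compared with the `[batch, channels, height, width]` layout a 2-D convolution expects.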

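In the same image and with the same data, training works with the ch_PP-OCRv4_rec_hgnet.yml config file and also with the v3 config file; only the ch_PP-OCRv4_rec_distill.yml config file raises this error.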

The contents of the ch_PP-OCRv4_rec_distill.yml config file I am using are as follows:

    Global:
      debug: false
      use_gpu: true
      epoch_num: 200
      log_smooth_window: 20
      print_batch_step: 10
      save_model_dir: ./output/rec_dkd_400w_svtr_ctc_lcnet_blank_dkd0.1/
      save_epoch_step: 40
      eval_batch_step:
      - 0
      - 2000
      cal_metric_during_train: true
      pretrained_model: ./pre_train/rec/ch_PP-OCRv4_rec_train/student.pdparams
      checkpoints:
      save_inference_dir: doc/imgs_words/ch/
      use_visualdl: false
      infer_img: doc/imgs_words/ch/word_1.jpg
      character_dict_path: ppocr/utils/ppocr_keys_v1.txt
      max_text_length: &max_text_length 25
      infer_mode: false
      use_space_char: true
      distributed: true
      save_res_path: ./output/rec/predicts_ppocrv3.txt
    Optimizer:
      name: Adam
      beta1: 0.9
      beta2: 0.999
      lr:
        name: Cosine
        learning_rate: 0.001
        warmup_epoch: 2
      regularizer:
        name: L2
        factor: 3.0e-05
    Architecture:
      model_type: rec
      name: DistillationModel
      algorithm: Distillation
      Models:
        Teacher:
          pretrained:
          freeze_params: true
          return_all_feats: true
          model_type: rec
          algorithm: SVTR
          Transform: null
          Backbone:
            name: SVTRNet
            img_size:
            - 48
            - 320
            out_char_num: 40
            out_channels: 192
            patch_merging: Conv
            embed_dim:
            - 64
            - 128
            - 256
            depth:
            - 3
            - 6
            - 3
            num_heads:
            - 2
            - 4
            - 8
            mixer:
            - Conv
            - Conv
            - Conv
            - Conv
            - Conv
            - Conv
            - Global
            - Global
            - Global
            - Global
            - Global
            - Global
            local_mixer:
            - - 5
              - 5
            - - 5
              - 5
            - - 5
              - 5
            last_stage: false
            prenorm: true
          Head:
            name: MultiHead
            head_list:
              - CTCHead:
                  Neck:
                    name: svtr
                    dims: 120
                    depth: 2
                    hidden_dims: 120
                    kernel_size: [1, 3]
                    use_guide: True
                  Head:
                    fc_decay: 0.00001
              - NRTRHead:
                  nrtr_dim: 384
                  max_text_length: *max_text_length
        Student:
          pretrained:
          freeze_params: false
          return_all_feats: true
          model_type: rec
          algorithm: SVTR
          Transform: null
          Backbone:
            name: PPLCNetV3
            scale: 0.95
          Head:
            name: MultiHead
            head_list:
              - CTCHead:
                  Neck:
                    name: svtr
                    dims: 120
                    depth: 2
                    hidden_dims: 120
                    kernel_size: [1, 3]
                    use_guide: True
                  Head:
                    fc_decay: 0.00001
              - NRTRHead:
                  nrtr_dim: 384
                  max_text_length: *max_text_length
    Loss:
      name: CombinedLoss
      loss_config_list:
      - DistillationDKDLoss:
          weight: 0.1
          model_name_pairs:
          - - Student
            - Teacher
          key: head_out
          multi_head: true
          alpha: 1.0
          beta: 2.0
          dis_head: gtc
          name: dkd
      - DistillationCTCLoss:
          weight: 1.0
          model_name_list:
          - Student
          key: head_out
          multi_head: true
      - DistillationNRTRLoss:
          weight: 1.0
          smoothing: false
          model_name_list:
          - Student
          key: head_out
          multi_head: true
      - DistillCTCLogits:
          weight: 1.0
          reduction: mean
          model_name_pairs:
          - - Student
            - Teacher
          key: head_out
    PostProcess:
      name: DistillationCTCLabelDecode
      model_name:
      - Student
      key: head_out
      multi_head: true
    Metric:
      name: DistillationMetric
      base_metric_name: RecMetric
      main_indicator: acc
      key: Student
      ignore_space: false
    Train:
      dataset:
        name: SimpleDataSet
        data_dir: ./train_data/lpd_rec
        label_file_list:
        - ./train_data/lpd_rec/train.txt
        ratio_list:
        - 1.0
        transforms:
        - DecodeImage:
            img_mode: BGR
            channel_first: false
        - RecAug:
        - MultiLabelEncode:
            gtc_encode: NRTRLabelEncode
        - RecResizeImg:
            image_shape: [3, 48, 320]
        - KeepKeys:
            keep_keys:
            - image
            - label_ctc
            - label_gtc
            - length
            - valid_ratio
      loader:
        shuffle: true
        batch_size_per_card: 8
        drop_last: true
        num_workers: 2
        use_shared_memory: true
    Eval:
      dataset:
        name: SimpleDataSet
        data_dir: ./train_data/lpd_rec
        label_file_list:
        - ./train_data/lpd_rec/test.txt
        transforms:
        - DecodeImage:
            img_mode: BGR
            channel_first: false
        - MultiLabelEncode:
            gtc_encode: NRTRLabelEncode
        - RecResizeImg:
            image_shape: [3, 48, 320]
        - KeepKeys:
            keep_keys:
            - image
            - label_ctc
            - label_gtc
            - length
            - valid_ratio
      loader:
        shuffle: false
        drop_last: false
        batch_size_per_card: 8
        num_workers: 2
    profiler_options: null
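
As a sanity check on where `[8, 240, 256]` might come from, the numbers can be compared with the Teacher backbone settings in this config (`img_size: [48, 320]`, last `embed_dim` of 256). The sketch below is shape arithmetic only and rests on two assumptions that have not been verified against the PaddleOCR source: that the SVTRNet patch embedding divides both sides of the image by 4, and that each of the two `Conv` patch-merging stages halves the height while keeping the width. `svtr_token_grid` and `as_4d_shape` are hypothetical helpers.

```python
def svtr_token_grid(img_size, patch_stride=4, height_merges=2):
    """Estimate the (height, width) token grid at the end of the backbone.

    Assumptions (not confirmed by the issue): the patch embedding divides both
    sides by `patch_stride`, and each patch-merging stage halves the height
    while leaving the width unchanged.
    """
    height, width = img_size
    grid_h = height // patch_stride
    grid_w = width // patch_stride
    for _ in range(height_merges):
        grid_h //= 2
    return grid_h, grid_w


def as_4d_shape(shape_3d, grid):
    """Map a [batch, tokens, channels] shape to [batch, channels, height, width]."""
    batch, tokens, channels = shape_3d
    grid_h, grid_w = grid
    if grid_h * grid_w != tokens:
        raise ValueError(f"{tokens} tokens do not fit a {grid_h}x{grid_w} grid")
    return [batch, channels, grid_h, grid_w]


grid = svtr_token_grid((48, 320))            # img_size from the Teacher backbone
shape_4d = as_4d_shape([8, 240, 256], grid)  # shape from the error message
print(grid, shape_4d)
```

Under those assumptions the grid is 3 x 80 = 240 tokens with 256 channels, which matches the reported shape at batch size 8. That would suggest the tensor is the Teacher backbone's token sequence reaching the CTC neck without being reshaped into a feature map, rather than a problem in the data pipeline, but this is an inference from the shapes alone.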

How should I modify the config to fix this?
