Training PP-OCRv4 fails with ValueError: (InvalidArgument) The input of Op(Conv) should be a 4-D or 5-D Tensor. #11200

Please provide the following information to quickly locate the problem

- System Environment: Ubuntu 20.04, running in the official Paddle Docker image
- Version: paddlepaddle/paddle:2.5.2-gpu-cuda11.2-cudnn8.2-trt8.0
- Paddle: paddlepaddle-gpu 2.5.2.post112
- PaddleOCR: release/2.7
- Related components: tools/train.py
- Command Code: python tools/train.py -c configs/rec/PP-OCRv4/ch_PP-OCRv4_rec_distill.yml
- Complete Error Message:
Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/.vscode-server/extensions/ms-python.python-2023.4.1/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 39, in <module>
    cli.main()
  File "/root/.vscode-server/extensions/ms-python.python-2023.4.1/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main
    run()
  File "/root/.vscode-server/extensions/ms-python.python-2023.4.1/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 284, in run_file
    runpy.run_path(target, run_name="__main__")
  File "/root/.vscode-server/extensions/ms-python.python-2023.4.1/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 322, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/root/.vscode-server/extensions/ms-python.python-2023.4.1/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 136, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/root/.vscode-server/extensions/ms-python.python-2023.4.1/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
    exec(code, run_globals)
  File "/paddle/tools/train.py", line 227, in <module>
    main(config, device, logger, vdl_writer)
  File "/paddle/tools/train.py", line 202, in main
    amp_dtype)
  File "/paddle/tools/program.py", line 301, in train
    preds = model(images, data=batch[1:])
  File "/usr/local/lib/python3.7/dist-packages/paddle/nn/layer/layers.py", line 1254, in __call__
    return self.forward(*inputs, **kwargs)
  File "/paddle/ppocr/modeling/architectures/distillation_model.py", line 59, in forward
    result_dict[model_name] = self.model_list[idx](x, data)
  File "/usr/local/lib/python3.7/dist-packages/paddle/nn/layer/layers.py", line 1254, in __call__
    return self.forward(*inputs, **kwargs)
  File "/paddle/ppocr/modeling/architectures/base_model.py", line 100, in forward
    x = self.head(x, targets=data)
  File "/usr/local/lib/python3.7/dist-packages/paddle/nn/layer/layers.py", line 1254, in __call__
    return self.forward(*inputs, **kwargs)
  File "/paddle/ppocr/modeling/heads/rec_multi_head.py", line 92, in forward
    ctc_encoder = self.ctc_encoder(x)
  File "/usr/local/lib/python3.7/dist-packages/paddle/nn/layer/layers.py", line 1254, in __call__
    return self.forward(*inputs, **kwargs)
  File "/paddle/ppocr/modeling/necks/rnn.py", line 261, in forward
    x = self.encoder(x)
  File "/usr/local/lib/python3.7/dist-packages/paddle/nn/layer/layers.py", line 1254, in __call__
    return self.forward(*inputs, **kwargs)
  File "/paddle/ppocr/modeling/necks/rnn.py", line 208, in forward
    z = self.conv1(z)
  File "/usr/local/lib/python3.7/dist-packages/paddle/nn/layer/layers.py", line 1254, in __call__
    return self.forward(*inputs, **kwargs)
  File "/paddle/ppocr/modeling/backbones/rec_svtrnet.py", line 68, in forward
    out = self.conv(inputs)
  File "/usr/local/lib/python3.7/dist-packages/paddle/nn/layer/layers.py", line 1254, in __call__
    return self.forward(*inputs, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/paddle/nn/layer/conv.py", line 722, in forward
    use_cudnn=self._use_cudnn,
  File "/usr/local/lib/python3.7/dist-packages/paddle/nn/functional/conv.py", line 141, in _conv_nd
    data_format,
ValueError: (InvalidArgument) The input of Op(Conv) should be a 4-D or 5-D Tensor. But received: input's dimension is 3, input's shape is [8, 240, 256].
  [Hint: Expected in_dims.size() == 4 || in_dims.size() == 5 == true, but received in_dims.size() == 4 || in_dims.size() == 5:0 != true:1.] (at ../paddle/phi/infermeta/binary.cc:475)

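# With the same data in the same image, I can train with the ch_PP-OCRv4_rec_hgnet.yml config and also with the v3 config; only ch_PP-OCRv4_rec_distill.yml raises this error.
# The contents of the ch_PP-OCRv4_rec_distill.yml config I am using are as follows: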
Global:
  debug: false
  use_gpu: true
  epoch_num: 200
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/rec_dkd_400w_svtr_ctc_lcnet_blank_dkd0.1/
  save_epoch_step: 40
  eval_batch_step:
  - 0
  - 2000
  cal_metric_during_train: true
  pretrained_model: ./pre_train/rec/ch_PP-OCRv4_rec_train/student.pdparams
  checkpoints: 
  save_inference_dir: doc/imgs_words/ch/
  use_visualdl: false
  infer_img: doc/imgs_words/ch/word_1.jpg
  character_dict_path: ppocr/utils/ppocr_keys_v1.txt
  max_text_length: &max_text_length 25
  infer_mode: false
  use_space_char: true
  distributed: true
  save_res_path: ./output/rec/predicts_ppocrv3.txt
Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.001
    warmup_epoch: 2
  regularizer:
    name: L2
    factor: 3.0e-05
Architecture:
  model_type: rec
  name: DistillationModel
  algorithm: Distillation
  Models:
    Teacher:
      pretrained: 
      freeze_params: true
      return_all_feats: true
      model_type: rec
      algorithm: SVTR
      Transform: null
      Backbone:
        name: SVTRNet
        img_size:
        - 48
        - 320
        out_char_num: 40
        out_channels: 192
        patch_merging: Conv
        embed_dim:
        - 64
        - 128
        - 256
        depth:
        - 3
        - 6
        - 3
        num_heads:
        - 2
        - 4
        - 8
        mixer:
        - Conv
        - Conv
        - Conv
        - Conv
        - Conv
        - Conv
        - Global
        - Global
        - Global
        - Global
        - Global
        - Global
        local_mixer:
        - - 5
          - 5
        - - 5
          - 5
        - - 5
          - 5
        last_stage: false
        prenorm: true
      Head:
        name: MultiHead
        head_list:
          - CTCHead:
              Neck:
                name: svtr
                dims: 120
                depth: 2
                hidden_dims: 120
                kernel_size: [1, 3]
                use_guide: True
              Head:
                fc_decay: 0.00001
          - NRTRHead:
              nrtr_dim: 384
              max_text_length: *max_text_length
    Student:
      pretrained: 
      freeze_params: false
      return_all_feats: true
      model_type: rec
      algorithm: SVTR
      Transform: null
      Backbone:
        name: PPLCNetV3
        scale: 0.95
      Head:
        name: MultiHead
        head_list:
          - CTCHead:
              Neck:
                name: svtr
                dims: 120
                depth: 2
                hidden_dims: 120
                kernel_size: [1, 3]
                use_guide: True
              Head:
                fc_decay: 0.00001
          - NRTRHead:
              nrtr_dim: 384
              max_text_length: *max_text_length
Loss:
  name: CombinedLoss
  loss_config_list:
  - DistillationDKDLoss:
      weight: 0.1
      model_name_pairs:
      - - Student
        - Teacher
      key: head_out
      multi_head: true
      alpha: 1.0
      beta: 2.0
      dis_head: gtc
      name: dkd
  - DistillationCTCLoss:
      weight: 1.0
      model_name_list:
      - Student
      key: head_out
      multi_head: true
  - DistillationNRTRLoss:
      weight: 1.0
      smoothing: false
      model_name_list:
      - Student
      key: head_out
      multi_head: true
  - DistillCTCLogits:
      weight: 1.0
      reduction: mean
      model_name_pairs:
      - - Student
        - Teacher
      key: head_out
PostProcess:
  name: DistillationCTCLabelDecode
  model_name:
  - Student
  key: head_out
  multi_head: true
Metric:
  name: DistillationMetric
  base_metric_name: RecMetric
  main_indicator: acc
  key: Student
  ignore_space: false
Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/lpd_rec
    label_file_list:
    - ./train_data/lpd_rec/train.txt
    ratio_list:
    - 1.0
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - RecAug:
    - MultiLabelEncode:
        gtc_encode: NRTRLabelEncode
    - RecResizeImg:
        image_shape: [3, 48, 320]
      
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_gtc
        - length
        - valid_ratio
  loader:
    shuffle: true
    batch_size_per_card: 8
    drop_last: true
    num_workers: 2
    use_shared_memory: true
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/lpd_rec
    label_file_list:
    - ./train_data/lpd_rec/test.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - MultiLabelEncode:
        gtc_encode: NRTRLabelEncode
    - RecResizeImg:
        image_shape: [3, 48, 320]
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_gtc
        - length
        - valid_ratio
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 8
    num_workers: 2
profiler_options: null
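
A rough way to relate the shape in the error above to the Teacher backbone settings, written as a plain-Python sketch. It assumes (this is not stated in the config or the traceback) that SVTRNet's patch embedding downsamples height and width by 4, and that each of the two `patch_merging: Conv` steps halves the height only. The helper names `svtr_token_grid` and `conv_rank_ok` are hypothetical and not part of PaddleOCR.

```python
def svtr_token_grid(img_h, img_w, patch_stride=4, height_merges=2):
    """Return the (h, w) token grid under the assumed SVTRNet downsampling.

    Assumption: patch embedding divides H and W by `patch_stride`, and each
    merge step halves the height while leaving the width unchanged.
    """
    h = img_h // patch_stride // (2 ** height_merges)
    w = img_w // patch_stride
    return h, w


def conv_rank_ok(shape):
    """Mirror the check in the error message: Conv needs a 4-D or 5-D input."""
    return len(shape) in (4, 5)


batch = 8        # batch_size_per_card in the config
channels = 256   # last entry of embed_dim in the Teacher backbone
h, w = svtr_token_grid(48, 320)  # img_size in the Teacher backbone

# Flattened token sequence: [batch, tokens, channels]
seq_shape = [batch, h * w, channels]
# Feature-map layout a 2-D conv would accept: [batch, channels, h, w]
map_shape = [batch, channels, h, w]

print(seq_shape, conv_rank_ok(seq_shape))
print(map_shape, conv_rank_ok(map_shape))
```

Under those assumptions the 240 in `[8, 240, 256]` would be 3 x 80 flattened tokens, which suggests the Teacher backbone (with `last_stage: false`) hands a [batch, tokens, channels] sequence to the `svtr` neck, whose first conv expects a 4-D feature map. The sketch does not establish whether reshaping the tensor or changing the config is the right fix.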

How should I modify the config or code to fix this?

