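Hello author, I am also hitting the deterministic-algorithms error that stops the model from training, the same problem reported in the comments. I am training on four RTX 3090s with device set to 0,1,2,3. If I specify a single GPU instead, it immediately reports a CUDA error. I would appreciate any advice. (#16)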

segment/train: weights=/media/dell/lhx/yolo/ASF-YOLO/yolov5l-seg.pt, cfg=/media/dell/lhx/yolo/ASF-YOLO/models/segment/asf-yolo.yaml, data=/media/dell/lhx/yolo/ASF-YOLO/data/bcc.yaml, hyp=/media/dell/lhx/yolo/ASF-YOLO/data/hyps/hyp.scratch-low.yaml, epochs=100, batch_size=8, imgsz=640, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, noplots=False, evolve=None, bucket=, cache=None, image_weights=False, device=0,1,2,3, multi_scale=False, single_cls=False, optimizer=SGD, sync_bn=False, workers=8, project=../runs_2/train-seg, name=improve, exist_ok=False, quad=False, cos_lr=False, label_smoothing=0.0, patience=100, freeze=[0], save_period=-1, seed=0, local_rank=-1, mask_ratio=4, no_overlap=False
YOLOv5  2024-5-30 Python-3.8.0 torch-2.3.1+cu121 CUDA:0 (NVIDIA GeForce RTX 3090, 24260MiB)
                                                   CUDA:1 (NVIDIA GeForce RTX 3090, 24260MiB)
                                                   CUDA:2 (NVIDIA GeForce RTX 3090, 24260MiB)
                                                   CUDA:3 (NVIDIA GeForce RTX 3090, 24260MiB)

hyperparameters: lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0
TensorBoard: Start with 'tensorboard --logdir ../runs_2/train-seg', view at http://localhost:6006/
Overriding model.yaml nc=80 with nc=1

                 from  n    params  module                                  arguments                     
  0                -1  1      7040  models.common.Conv                      [3, 64, 6, 2, 2]              
  1                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]               
  2                -1  3    156928  models.common.C3                        [128, 128, 3]                 
  3                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]              
  4                -1  6   1118208  models.common.C3                        [256, 256, 6]                 
  5                -1  1   1180672  models.common.Conv                      [256, 512, 3, 2]              
  6                -1  9   6433792  models.common.C3                        [512, 512, 9]                 
  7                -1  1   4720640  models.common.Conv                      [512, 1024, 3, 2]             
  8                -1  3   9971712  models.common.C3                        [1024, 1024, 3]               
  9                -1  1   2624512  models.common.SPPF                      [1024, 1024, 5]               
 10                -1  1    525312  models.common.Conv                      [1024, 512, 1, 1]             
 11                 4  1    132096  models.common.Conv                      [256, 512, 1, 1]              
 12       [-1, 6, -2]  1         0  models.common.Zoom_cat                  [512]                         
 13                -1  3   3019776  models.common.C3                        [1536, 512, 3, False]         
 14                -1  1    131584  models.common.Conv                      [512, 256, 1, 1]              
 15                 2  1     33280  models.common.Conv                      [128, 256, 1, 1]              
 16       [-1, 4, -2]  1         0  models.common.Zoom_cat                  [256]                         
 17                -1  3    756224  models.common.C3                        [768, 256, 3, False]          
 18                -1  1    590336  models.common.Conv                      [256, 256, 3, 2]              
 19          [-1, 14]  1         0  models.common.Concat                    [1]                           
 20                -1  3   2495488  models.common.C3                        [512, 512, 3, False]          
 21                -1  1   2360320  models.common.Conv                      [512, 512, 3, 2]              
 22          [-1, 10]  1         0  models.common.Concat                    [1]                           
 23                -1  3   9971712  models.common.C3                        [1024, 1024, 3, False]        
 24         [4, 6, 8]  1    460544  models.common.ScalSeq                   [256]                         
 25          [17, -1]  1     12325  models.common.attention_model           [256]                         
 26      [-1, 20, 23]  1   1393558  models.yolo.Segment                     [1, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], 32, 256, [256, 512, 1024]]
asf-yolo summary: 407 layers, 48465467 parameters, 48465467 gradients, 155.4 GFLOPs

Transferred 602/671 items from /media/dell/lhx/yolo/ASF-YOLO/yolov5l-seg.pt
AMP: checks passed ✅
optimizer: SGD(lr=0.01) with parameter groups 110 weight(decay=0.0), 116 weight(decay=0.0005), 114 bias
WARNING ⚠️ DP not recommended, use torch.distributed.run for best DDP Multi-GPU results.
See Multi-GPU Tutorial at https://github.com/ultralytics/yolov5/issues/475 to get started.
train: Scanning /media/dell/lhx/yolo/ASF-YOLO/datasets/BCC/labels/train.cache... 128 images, 0 backgrounds, 0 corrupt: 100%|██████████| 128/128 00:00
val: Scanning /media/dell/lhx/yolo/ASF-YOLO/datasets/BCC/labels/val.cache... 32 images, 0 backgrounds, 0 corrupt: 100%|██████████| 32/32 00:00

AutoAnchor: 4.36 anchors/target, 0.970 Best Possible Recall (BPR). Anchors are a poor fit to dataset ⚠️, attempting to improve...
AutoAnchor: WARNING ⚠️ Extremely small objects found: 47 of 1235 labels are <3 pixels in size
AutoAnchor: Running kmeans for 9 anchors on 1235 points...
AutoAnchor: Evolving anchors with Genetic Algorithm: fitness = 0.7403: 100%|██████████| 1000/1000 00:00
AutoAnchor: thr=0.25: 0.9571 best possible recall, 6.31 anchors past thr
AutoAnchor: n=9, img_size=640, metric_all=0.391/0.743-mean/best, past_thr=0.495-mean: 25,43, 88,52, 51,155, 92,121, 163,129, 116,183, 236,232, 160,418, 350,452
AutoAnchor: Done ⚠️ (original anchors better than new anchors, proceeding with original anchors)
Plotting labels to ../runs_2/train-seg/improve3/labels.jpg... 
Image sizes 640 train, 640 val
Using 8 dataloader workers
Logging results to ../runs_2/train-seg/improve3
Starting training for 100 epochs...

      Epoch    GPU_mem   box_loss   seg_loss   obj_loss   cls_loss  Instances       Size
  0%|          | 0/16 00:03
Traceback (most recent call last):
  File "/media/dell/lhx/yolo/ASF-YOLO/segment/train.py", line 658, in <module>
    main(opt)
  File "/media/dell/lhx/yolo/ASF-YOLO/segment/train.py", line 554, in main
    train(opt.hyp, opt, device, callbacks)
  File "/media/dell/lhx/yolo/ASF-YOLO/segment/train.py", line 317, in train
    scaler.scale(loss).backward()
  File "/home/leihaoxiang/.conda/envs/yolo/lib/python3.8/site-packages/torch/_tensor.py", line 525, in backward
    torch.autograd.backward(
  File "/home/leihaoxiang/.conda/envs/yolo/lib/python3.8/site-packages/torch/autograd/__init__.py", line 267, in backward
    _engine_run_backward(
  File "/home/leihaoxiang/.conda/envs/yolo/lib/python3.8/site-packages/torch/autograd/graph.py", line 744, in _engine_run_backward
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: max_pool3d_with_indices_backward_cuda does not have a deterministic implementation, but you set 'torch.use_deterministic_algorithms(True)'. You can turn off determinism just for this operation, or you can use the 'warn_only=True' option, if that's acceptable for your application. You can also file an issue at https://github.com/pytorch/pytorch/issues to help us prioritize adding deterministic support for this operation.

Process finished with exit code 1
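
The RuntimeError above itself names `warn_only=True` as a way out. Below is a minimal sketch of that option, assuming the training code calls `torch.use_deterministic_algorithms(True)` somewhere before the backward pass (in upstream YOLOv5 I believe this happens in `init_seeds` in `utils/general.py`; I have not verified that ASF-YOLO does it in the same place). `relax_determinism` is a hypothetical helper name, not a function from either repository.

```python
import importlib.util


def relax_determinism(torch_module):
    """Keep deterministic mode on, but let ops without a deterministic
    implementation (e.g. max_pool3d backward on CUDA) warn instead of raise.

    `relax_determinism` is a hypothetical helper name used for illustration.
    """
    # warn_only=True is the option named in the RuntimeError text.
    torch_module.use_deterministic_algorithms(True, warn_only=True)
    # Report whether warn-only mode is now active.
    return torch_module.is_deterministic_algorithms_warn_only_enabled()


if __name__ == "__main__":
    # Only touch the real library when it is installed.
    if importlib.util.find_spec("torch") is not None:
        import torch

        print("warn-only determinism:", relax_determinism(torch))
```

With this setting the backward pass of `max_pool3d` should emit a warning rather than abort, at the cost that runs are no longer guaranteed to be bit-for-bit reproducible. It does not address the separate CUDA error seen when a single GPU is selected.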

