forked from facebookresearch/ijepa
slurm-2570.out
INFO:root:called-params configs/in1k_vitL14_ep300.yaml
INFO:root:loaded params...
[W socket.cpp:464] [c10d] The server socket has failed to bind to [::]:40112 (errno: 98 - Address already in use).
[W socket.cpp:464] [c10d] The server socket has failed to bind to 0.0.0.0:40112 (errno: 98 - Address already in use).
[E socket.cpp:500] [c10d] The server socket has failed to listen on any local network address.
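The three socket messages above mean the c10d store could not listen on port 40112. Another process already held that port (errno 98 is EADDRINUSE on Linux), and the run later falls back to a single process.

The sketch below shows one way to check a port before launching, or to ask the OS for a free one. Some details are assumptions:

- The helper names are hypothetical.
- How a chosen port would reach the training script (for example through MASTER_PORT) needs to be checked against the repository's distributed setup code.
- The check covers only the IPv4 wildcard address, not the `[::]` attempt.

```python
import errno
import socket


def port_is_free(port: int, host: str = "") -> bool:
    """Return True if a TCP socket can bind to (host, port) right now.

    host="" binds the IPv4 wildcard address, like the 0.0.0.0 attempt in the log.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:  # errno 98 on Linux
                return False
            raise  # any other bind error is unexpected; surface it
    return True


def find_free_port(host: str = "") -> int:
    """Ask the OS for an unused port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))
        return s.getsockname()[1]


if __name__ == "__main__":
    port = 40112  # the port that failed in this log
    if port_is_free(port):
        print(f"port {port} is free")
    else:
        print(f"port {port} is in use; try {find_free_port()}")
```

The check is advisory only. Another process can still grab the port between the check and the moment the training script binds it.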
{ 'data': { 'batch_size': 128,
'color_jitter_strength': 0.0,
'crop_scale': [0.3, 1.0],
'crop_size': 224,
'image_folder': 'inet-1k/',
'num_workers': 16,
'pin_mem': True,
'root_path': '/home/rtcalumby/adam/luciano/',
'use_color_distortion': False,
'use_gaussian_blur': False,
'use_horizontal_flip': False},
'logging': { 'folder': '/home/rtcalumby/adam/luciano/LifeCLEFPlant2022/logs/imagenet_vit_L',
'write_tag': 'jepa'},
'mask': { 'allow_overlap': False,
'aspect_ratio': [0.75, 1.5],
'enc_mask_scale': [0.85, 1.0],
'min_keep': 10,
'num_enc_masks': 1,
'num_pred_masks': 4,
'patch_size': 14,
'pred_mask_scale': [0.15, 0.2]},
'meta': { 'checkpoint_file': None,
'copy_data': False,
'load_checkpoint': True,
'model_name': 'vit_large',
'pred_depth': 12,
'pred_emb_dim': 384,
'read_checkpoint': None,
'use_bfloat16': True},
'optimization': { 'ema': [0.996, 1.0],
'epochs': 300,
'final_lr': 1e-06,
'final_weight_decay': 0.4,
'ipe_scale': 1.0,
'lr': 0.001,
'start_lr': 0.0001,
'warmup': 15,
'weight_decay': 0.04}}
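The `optimization` block above gives the learning-rate, weight-decay and EMA-momentum schedules only by their endpoints. The sketch below shows one common reading of those fields:

- **Learning rate:** linear warmup from `start_lr` to `lr` over `warmup` epochs, then cosine decay to `final_lr`.
- **Weight decay:** a cosine ramp from `weight_decay` to `final_weight_decay`.
- **EMA momentum:** a linear ramp across the two values in `ema`.

Some details are assumptions:

- Steps are counted per iteration, and the total number of steps is `ipe_scale * epochs * ipe`.
- `ipe` (iterations per epoch) does not appear in the log, so the value used here is hypothetical.
- The function names are illustrative, not the repository's scheduler classes.

```python
import math

# Values copied from the 'optimization' block of the logged config.
CFG = {
    "epochs": 300,
    "warmup": 15,
    "ipe_scale": 1.0,
    "start_lr": 1e-4,
    "lr": 1e-3,
    "final_lr": 1e-6,
    "weight_decay": 0.04,
    "final_weight_decay": 0.4,
    "ema": (0.996, 1.0),
}

IPE = 1000  # hypothetical iterations per epoch; depends on dataset size and batch size


def cosine(start: float, end: float, progress: float) -> float:
    """Cosine interpolation from start (progress=0) to end (progress=1)."""
    progress = min(max(progress, 0.0), 1.0)
    return end + (start - end) * 0.5 * (1.0 + math.cos(math.pi * progress))


def schedules_at(step: int, cfg=CFG, ipe=IPE):
    """Return (lr, weight_decay, ema_momentum) at a given optimizer step."""
    total = int(cfg["ipe_scale"] * cfg["epochs"] * ipe)
    warmup = cfg["warmup"] * ipe
    if step < warmup:
        # linear warmup: start_lr -> lr
        lr = cfg["start_lr"] + (step / warmup) * (cfg["lr"] - cfg["start_lr"])
    else:
        # cosine decay: lr -> final_lr over the remaining steps
        lr = cosine(cfg["lr"], cfg["final_lr"], (step - warmup) / max(1, total - warmup))
    # cosine ramp: weight_decay -> final_weight_decay over the whole run
    wd = cosine(cfg["weight_decay"], cfg["final_weight_decay"], step / total)
    # linear ramp: ema[0] -> ema[1] over the whole run
    m0, m1 = cfg["ema"]
    momentum = m0 + (m1 - m0) * min(step / total, 1.0)
    return lr, wd, momentum


if __name__ == "__main__":
    for epoch in (0, 15, 150, 300):
        lr, wd, m = schedules_at(epoch * IPE)
        print(f"epoch {epoch:3d}: lr={lr:.2e} wd={wd:.3f} ema={m:.4f}")
```

Under this reading the weight decay rises over training rather than falling, because `final_weight_decay` (0.4) is larger than `weight_decay` (0.04).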
INFO:root:distributed training not available The server socket has failed to listen on any local network address. The server socket has failed to bind to [::]:40112 (errno: 98 - Address already in use). The server socket has failed to bind to 0.0.0.0:40112 (errno: 98 - Address already in use).
INFO:root:Running... (rank: 0/1)
INFO:root:SLURM vars not set (distributed training not available)
INFO:root:Initialized (rank/world-size) 0/1
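The `SLURM vars not set` and `Initialized (rank/world-size) 0/1` lines show that rank and world size fell back to a single process. Below is a minimal sketch of that kind of fallback. It assumes rank and world size come from the standard Slurm task variables `SLURM_PROCID` and `SLURM_NTASKS`; which variables the repository actually reads is an assumption, and the function name is hypothetical.

```python
import os
from typing import Mapping, Tuple


def rank_and_world_size(env: Mapping[str, str] = os.environ) -> Tuple[int, int]:
    """Read (rank, world_size) from Slurm task variables, else run as one process."""
    try:
        return int(env["SLURM_PROCID"]), int(env["SLURM_NTASKS"])
    except (KeyError, ValueError):
        # Missing or malformed variables: rank 0 of a world of size 1,
        # the same outcome reported in the log above.
        return 0, 1


if __name__ == "__main__":
    rank, world = rank_and_world_size()
    print(f"Initialized (rank/world-size) {rank}/{world}")
```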
slurmstepd: error: *** JOB 2570 ON hgx CANCELLED AT 2024-07-09T14:28:36 ***