torch problem #32

tianqibucuo0 · 2023-05-18T03:24:45Z

My CUDA version is 11.7, but DeepRule.txt specifies CUDA 8.0. Can I install 11.7 instead?

soap117 · 2023-05-18T06:55:06Z

I have added a new environment file (see the updates), and with it I am able to compile the cpools layers.

tianqibucuo0 · 2023-05-18T07:08:16Z

Thank you very much!

tianqibucuo0 · 2023-05-18T07:18:00Z

Hello, requirement-2023.txt lists 33 packages, but DeepRule.txt lists 96. Do the other packages not need to be installed?

soap117 · 2023-06-16T21:33:26Z

Generally not; I have tested it. If you find that something is missing, just install it.

tianqibucuo0 · 2023-06-21T02:02:28Z

Hello, I am training a model using "linedata(1028)" and encountered two errors. Could you please help me?
1. DeepRule-master/models/py_utils/kp_utils.py:592: UserWarning: masked_fill_ received a mask with dtype torch.uint8, this behavior is now deprecated,please use a mask with dtype torch.bool instead. (Triggered internally at ../aten/src/ATen/native/cuda/Indexing.cu:1239.) tag_full[1-mask_full] = 0
2. python3.9/site-packages/torch/nn/_reduction.py:42: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead. warnings.warn(warning.format(ret))
Segmentation fault (core dumped)

soap117 · 2023-06-21T22:33:13Z

For the first one, I think you can use type_as to convert to torch.float32 before the masked_fill_ command.
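
The warning text itself asks for a mask of dtype torch.bool, so another option is to convert the mask once and invert it with `~`. This matters because `1 - mask` is not supported for bool tensors. The following is only a minimal sketch, assuming `mask_full` holds 0/1 values as uint8. The function name `zero_unmasked` is hypothetical; `tag_full` and `mask_full` are the names shown in the warning.

```python
import torch


def zero_unmasked(tag_full, mask_full):
    """Set entries of tag_full to 0 wherever mask_full is 0 (in place)."""
    # Convert the legacy uint8 mask to bool once; nonzero values become True.
    keep = mask_full.bool()
    # `~keep` replaces the old `1 - mask_full` inversion.
    tag_full[~keep] = 0
    return tag_full
```

In kp_utils.py this would correspond to replacing `tag_full[1-mask_full] = 0` with `tag_full[~mask_full.bool()] = 0`, assuming the surrounding code matches the line quoted in the warning.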

tianqibucuo0 · 2023-06-27T07:42:40Z

Thank you. After fixing all the UserWarning messages, I now get "Segmentation fault (core dumped)" during execution. Here is the full output. Can you please explain why this is happening?

(DeepRule) sun@sun:~/DeepRule-master$ python train_chart.py --cfg_file CornerNetLine --data_dir "/home/sun/data/linedata(1028)" --cache_path "/home/sun/data/linedata(1028)/cache/"
:228: RuntimeWarning: compiletime version 3.6 of module 'pycocotools._mask' does not match runtime version 3.9
:228: RuntimeWarning: builtins.type size changed, may indicate binary incompatibility. Expected 864 from C header, got 880 from PyObject
./config/CornerNetLine.json
['cache', 'line']
loading all datasets...
using 1 threads
loading from cache file: /home/sun/data/linedata(1028)/cache/line_train2019.pkl
loading annotations into memory...
/home/sun/data/linedata(1028)/line/annotations/instancesLine(1023)_train2019.json
Done (t=2.72s)
creating index...
index created!
loading from cache file: /home/sun/data/linedata(1028)/cache/line_val2019.pkl
loading annotations into memory...
/home/sun/data/linedata(1028)/line/annotations/instancesLine(1023)_val2019.json
Done (t=0.05s)
creating index...
index created!
system config...
{'batch_size': 5,
'cache_dir': '/home/sun/yangshaohan/618/data/linedata(1028)/cache/',
'chunk_sizes': [5, 7, 7, 7],
'config_dir': './config',
'data_dir': '/home/sun/yangshaohan/618/data/linedata(1028)',
'data_rng': RandomState(MT19937) at 0x7FE69C7CB340,
'dataset': 'Line',
'decay_rate': 10,
'display': 5,
'learning_rate': 0.00025,
'max_iter': 50000,
'nnet_rng': RandomState(MT19937) at 0x7FE69C7CB440,
'opt_algo': 'adam',
'prefetch_size': 5,
'pretrain': None,
'result_dir': './results',
'sampling_function': 'kp_detection',
'snapshot': 5000,
'snapshot_name': 'CornerNetLine',
'stepsize': 45000,
'tar_data_dir': 'cls',
'test_split': 'testchart',
'train_split': 'trainchart',
'val_iter': 100,
'val_split': 'valchart',
'weight_decay': False,
'weight_decay_rate': 1e-05,
'weight_decay_type': 'l2'}
db config...
{'ae_threshold': 0.5,
'border': 128,
'categories': 1,
'data_aug': True,
'gaussian_bump': True,
'gaussian_iou': 0.3,
'gaussian_radius': -1,
'input_size': [511, 511],
'lighting': True,
'max_per_image': 100,
'merge_bbox': False,
'nms_algorithm': 'exp_soft_nms',
'nms_kernel': 3,
'nms_threshold': 0.5,
'output_sizes': [[128, 128]],
'rand_color': True,
'rand_crop': True,
'rand_pushes': False,
'rand_samples': False,
'rand_scale_max': 1.4,
'rand_scale_min': 0.6,
'rand_scale_step': 0.1,
'rand_scales': array([0.6, 0.7, 0.8, 0.9, 1. , 1.1, 1.2, 1.3]),
'special_crop': False,
'test_scales': [1],
'top_k': 200,
'weight_exp': 8}
len of db: 116745
building model...
module_file: models.CornerNetLine
use kp
total parameters: 198592138
setting learning rate to: 0.00025
training start...
start prefetching data...
shuffling indices...
['read.txt']
0%| | 0/50000 [00:00<?, ?it/s]
Segmentation fault (core dumped)
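
One way to narrow down where a crash like this happens is Python's built-in faulthandler module, which dumps the Python traceback of each thread when the process receives SIGSEGV. The sketch below assumes it is called near the top of train_chart.py; the helper name `enable_crash_traceback` and the log file name are hypothetical. The RuntimeWarning at the start of the log ("compiletime version 3.6 of module 'pycocotools._mask' does not match runtime version 3.9") suggests that a compiled extension built for another Python version is being loaded, which is a plausible but unconfirmed cause.

```python
import faulthandler


def enable_crash_traceback(log_path="crash_traceback.log"):
    """Write the Python stack of every thread to log_path on a fatal signal."""
    # The file must stay open for as long as faulthandler is enabled,
    # so the caller keeps the returned handle alive.
    log_file = open(log_path, "w")
    # all_threads=True also covers the data-prefetching threads.
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file
```

The same effect is available without code changes by starting the script with `python -X faulthandler train_chart.py ...`, which prints the traceback to stderr.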

soap117 · 2023-06-27T17:08:29Z

This sounds like a problem with the CornerNet package. Follow the instructions to compile it.

tianqibucuo0 · 2023-07-01T01:42:06Z

Hello, after recompiling, the same problem persists. Could you please tell me which versions of Python, CUDA, and GCC you use with the requirements-2023.txt file? Additionally, I would like to know how much GPU memory is required to train the "line" model.

soap117 · 2023-07-01T03:27:41Z

Package Version

adal 1.2.7
argcomplete 2.1.2
azure-common 1.1.28
azure-core 1.27.1
azure-graphrbac 0.61.1
azure-mgmt-authorization 3.0.0
azure-mgmt-containerregistry 10.1.0
azure-mgmt-core 1.4.0
azure-mgmt-keyvault 10.2.2
azure-mgmt-resource 22.0.0
azure-mgmt-storage 21.0.0
azureml 0.2.7
azureml-core 1.52.0
backports.tempfile 1.0
backports.weakref 1.0.post1
bcrypt 4.0.1
certifi 2023.5.7
cffi 1.15.1
charset-normalizer 3.1.0
contextlib2 21.6.0
contourpy 1.0.5
cryptography 41.0.1
cycler 0.11.0
docker 6.1.3
fonttools 4.25.0
h5py 3.8.0
humanfriendly 10.0
idna 3.4
importlib-resources 5.2.0
isodate 0.6.1
jeepney 0.8.0
jmespath 1.0.1
jsonpickle 3.0.1
kiwisolver 1.4.4
knack 0.10.1
matplotlib 3.7.1
mkl-fft 1.3.6
mkl-random 1.2.2
mkl-service 2.4.0
msal 1.22.0
msal-extensions 1.0.0
msrest 0.7.1
msrestazure 0.6.4
munkres 1.1.4
ndg-httpsclient 0.5.1
numpy 1.24.3
oauthlib 3.2.2
opencv-python 4.7.0.72
packaging 23.0
pandas 2.0.3
paramiko 3.2.0
pathspec 0.11.1
Pillow 9.4.0
pip 23.0.1
pkginfo 1.9.6
ply 3.11
portalocker 2.7.0
pyasn1 0.5.0
pycparser 2.21
Pygments 2.15.1
PyJWT 2.7.0
PyNaCl 1.5.0
pyOpenSSL 23.2.0
pyparsing 3.0.9
PyQt5-sip 12.11.0
PySocks 1.7.1
python-dateutil 2.8.2
pytz 2023.3
PyYAML 6.0
requests 2.30.0
requests-oauthlib 1.3.1
SecretStorage 3.3.3
setuptools 66.0.0
sip 6.6.2
six 1.16.0
tabulate 0.9.0
toml 0.10.2
torch 1.7.1+cu110
torchaudio 0.7.2
torchvision 0.8.2+cu110
tornado 6.2
typing_extensions 4.5.0
tzdata 2023.3
urllib3 1.26.16
websocket-client 1.6.1
wheel 0.38.4

I am able to run the training code.

tianqibucuo0 · 2023-07-01T04:34:19Z

Thank you for your response. The list does not include the Python, CUDA, or GCC versions, and my problem could be caused by a version mismatch. Could you please provide this information?
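
The versions asked about here can be collected on either machine with a short script, so that the two environments can be compared side by side. This is a sketch only: the function name `collect_versions` is hypothetical, and it assumes `gcc` is on the PATH. The `+cu110` suffix on the torch entry in the package list above indicates a wheel built against CUDA 11.0, which is the value `torch.version.cuda` reports.

```python
import platform
import shutil
import subprocess
import sys


def collect_versions():
    """Return a dict describing the interpreter, compiler and CUDA build."""
    info = {"python": platform.python_version(), "executable": sys.executable}

    gcc = shutil.which("gcc")
    if gcc:
        # PIPE + universal_newlines keeps this compatible with Python 3.6.
        out = subprocess.run([gcc, "--version"], stdout=subprocess.PIPE,
                             stderr=subprocess.PIPE, universal_newlines=True)
        info["gcc"] = out.stdout.splitlines()[0] if out.stdout else "unknown"
    else:
        info["gcc"] = "not found"

    try:
        import torch
        info["torch"] = torch.__version__
        # CUDA version the installed torch build was compiled against.
        info["torch_cuda"] = torch.version.cuda
        info["cuda_available"] = torch.cuda.is_available()
    except ImportError:
        info["torch"] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
```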

tianqibucuo0 · 2023-07-14T08:18:58Z

Hello, my GPU is relatively small, so I reduced the train.json and val.json files to only 10 data entries for testing. However, when execution reaches the line "training = pinned_training_queue.get(block=True)", it hangs and does not proceed. Below is my output. Can you please tell me the reason for this?

/home/ubuntu/anaconda3/envs/myenv/bin/python /home/ubuntu/download/pycharm-community-2023.1.4/plugins/python-ce/helpers/pydev/pydevd.py --multiprocess --qt-support=auto --client 127.0.0.1 --port 44227 --file /media/ubuntu/A4823F1E823EF480/2023/env/python/DeepRule-master-weixiugai/DeepRule-master/train_chart.py
Connected to pydev debugger (build 231.9225.15)
/home/ubuntu/anaconda3/envs/myenv/lib/python3.6/site-packages/OpenSSL/_util.py:6: CryptographyDeprecationWarning: Python 3.6 is no longer supported by the Python core team. Therefore, support for it is deprecated in cryptography. The next release of cryptography will remove support for Python 3.6.
from cryptography.hazmat.bindings.openssl.binding import Binding
Failure while loading azureml_run_type_providers. Failed to load entrypoint azureml.scriptrun = azureml.core.script_run:ScriptRun._from_run_dto with exception (pyOpenSSL 23.2.0 (/home/ubuntu/anaconda3/envs/myenv/lib/python3.6/site-packages), Requirement.parse('pyopenssl<23.0.0')).
/media/ubuntu/A4823F1E823EF480/2023/env/python/DeepRule-master-weixiugai/DeepRule-master/train_chart.py:22: FutureWarning: azureml.core: AzureML support for Python 3.6 is deprecated and will be dropped in an upcoming release. At that point, existing Python 3.6 workflows that use AzureML will continue to work without modification, but Python 3.6 users will no longer get access to the latest AzureML features and bugfixes. We recommend that you upgrade to Python 3.7 or newer. To disable SDK V1 deprecation warning set the environment variable AZUREML_DEPRECATE_WARNING to 'False'
from azureml.core.run import Run
['line']
loading all datasets...
using 1 threads
loading from cache file: /media/ubuntu/A4823F1E823EF480/2023/env/python/linedata(1028)/line/line_train2019.pkl
loading annotations into memory...
/media/ubuntu/A4823F1E823EF480/2023/env/python/linedata(1028)/line/annotations/instancesLine(1023)_train2019.json
Done (t=0.00s)
creating index...
index created!
loading from cache file: /media/ubuntu/A4823F1E823EF480/2023/env/python/linedata(1028)/line/line_val2019.pkl
loading annotations into memory...
/media/ubuntu/A4823F1E823EF480/2023/env/python/linedata(1028)/line/annotations/instancesLine(1023)_val2019.json
Done (t=0.00s)
creating index...
index created!
system config...
{'batch_size': 5,
'cache_dir': '/media/ubuntu/A4823F1E823EF480/2023/env/python/linedata(1028)/line',
'chunk_sizes': [5, 7, 7, 7],
'config_dir': './config',
'data_dir': '/media/ubuntu/A4823F1E823EF480/2023/env/python/linedata(1028)',
'data_rng': RandomState(MT19937) at 0x7FCC248FF258,
'dataset': 'Line',
'decay_rate': 10,
'display': 5,
'learning_rate': 0.01,
'max_iter': 50000,
'nnet_rng': RandomState(MT19937) at 0x7FCC248FF570,
'opt_algo': 'adam',
'prefetch_size': 5,
'pretrain': None,
'result_dir': './results',
'sampling_function': 'kp_detection',
'snapshot': 5000,
'snapshot_name': 'CornerNetLine',
'stepsize': 45000,
'tar_data_dir': 'cls',
'test_split': 'testchart',
'train_split': 'trainchart',
'val_iter': 100,
'val_split': 'valchart',
'weight_decay': False,
'weight_decay_rate': 1e-05,
'weight_decay_type': 'l2'}
db config...
{'ae_threshold': 0.5,
'border': 128,
'categories': 1,
'data_aug': True,
'gaussian_bump': True,
'gaussian_iou': 0.3,
'gaussian_radius': -1,
'input_size': [511, 511],
'lighting': True,
'max_per_image': 100,
'merge_bbox': False,
'nms_algorithm': 'exp_soft_nms',
'nms_kernel': 3,
'nms_threshold': 0.5,
'output_sizes': [[128, 128]],
'rand_color': True,
'rand_crop': True,
'rand_pushes': False,
'rand_samples': False,
'rand_scale_max': 1.4,
'rand_scale_min': 0.6,
'rand_scale_step': 0.1,
'rand_scales': array([0.6, 0.7, 0.8, 0.9, 1. , 1.1, 1.2, 1.3]),
'special_crop': False,
'test_scales': [1],
'top_k': 200,
'weight_exp': 8}
len of db: 11
building model...
module_file: models.CornerNetLine
use kp
total parameters: 198592138
setting learning rate to: 0.01
training start...
start prefetching data...
['read.txt']
0%| | 0/50000 [00:00<?, ?it/s]
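
A `get(block=True)` call with no timeout waits forever if the producer side never puts a batch on the queue, for example because a prefetching worker died or is itself stuck. A hedged way to make that visible is to fetch with a timeout and report whether the producer is still alive. The helper name `get_with_timeout` and its parameters are hypothetical and not part of DeepRule.

```python
import queue
import threading


def get_with_timeout(q, timeout_s=30.0, worker=None):
    """Fetch one item, but fail loudly instead of blocking forever."""
    try:
        return q.get(block=True, timeout=timeout_s)
    except queue.Empty:
        # worker may be a threading.Thread or a multiprocessing.Process;
        # both provide is_alive().
        alive = worker.is_alive() if worker is not None else "unknown"
        raise RuntimeError(
            f"no batch arrived within {timeout_s}s (producer alive: {alive})"
        ) from None
```

Temporarily swapping the blocking call for something like `get_with_timeout(pinned_training_queue, 60)` would show whether the queue is simply empty. If the producer has died, the underlying error is usually found in the prefetching code rather than in the training loop.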

LouisPouliot · 2023-09-12T12:18:03Z

I am currently facing a similar issue.
Did you manage to find a solution to this?
