
Is jetpack 6.0 for jetson agx orin supported? #3049

Open
dhruvmsheth opened this issue Jul 31, 2024 · 5 comments

Labels: question (Further information is requested)

Comments

@dhruvmsheth

I tried installing torch_tensorrt using the JetPack 5.0 WORKSPACE script, but it did not work on my system, which is currently running JetPack 6.0 on a Jetson AGX Orin.

@narendasan
Collaborator

IIRC JetPack 6.0 is still on TensorRT 8.6, so you may be able to build an older version of Torch-TensorRT (like 2.2) or the NGC iGPU branch shipped in the NGC containers: https://github.com/pytorch/TensorRT/tree/release/ngc/24.07_igpu. You can also use the containers for Jetson directly, which already have Torch-TensorRT installed: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch
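
For reference, a quick way to confirm which TensorRT version a given JetPack install or container actually ships, and that CUDA sees the Orin iGPU (a minimal sketch, assuming the tensorrt and torch Python packages are importable in the environment being checked):

# Minimal environment check: print the TensorRT version and the visible CUDA device.
# Assumes the `tensorrt` and `torch` Python bindings are installed in this environment.
import tensorrt
import torch

print("TensorRT:", tensorrt.__version__)              # expected to be 8.6.x on JetPack 6.0
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))   # should report the Orin integrated GPU

If the reported TensorRT version is 8.6.x, the older Torch-TensorRT release (2.2) or the NGC iGPU branch mentioned above are the builds most likely to link against it.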

@areddy2022

areddy2022 commented Aug 17, 2024

It would appear that the PyTorch NGC containers do not work on Jetson. Using Dusty's Docker image dustynv/torch_tensorrt:r35.4.1 results in this error when importing torch_tensorrt:

Python 3.8.10 (default, Jul 29 2024, 17:02:10) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch_tensorrt
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/torch_tensorrt/__init__.py", line 39, in <module>
    import tensorrt
  File "/usr/lib/python3.8/dist-packages/tensorrt/__init__.py", line 68, in <module>
    from .tensorrt import *
ImportError: /lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by /usr/lib/aarch64-linux-gnu/nvidia/libnvos.so)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/dist-packages/torch_tensorrt/__init__.py", line 82, in <module>
    ctypes.CDLL(_find_lib(lib, LINUX_PATHS))
  File "/usr/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by /usr/lib/aarch64-linux-gnu/nvidia/libnvos.so)
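
The GLIBC_2.33 failure above likely means the container's libc is older than what the host-mounted NVIDIA libraries were built against: the r35.x images target JetPack 5 / L4T r35 (Ubuntu 20.04, glibc 2.31), while a JetPack 6.0 host mounts newer driver libraries such as libnvos.so into the container. A rough diagnostic sketch (standalone Python, not part of torch_tensorrt) to confirm the container-side glibc version:

# Print the glibc version visible inside the container; if it is below 2.33,
# the host-mounted libnvos.so from a JetPack 6.0 host cannot be loaded.
import ctypes

libc = ctypes.CDLL("libc.so.6")
libc.gnu_get_libc_version.restype = ctypes.c_char_p
print("container glibc:", libc.gnu_get_libc_version().decode())   # e.g. 2.31 in an r35.x image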

@narendasan
Collaborator

IIRC you can use this container: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch/tags. The iGPU tag, I believe, is targeted at Jetson. cc: @apbose

@areddy2022

areddy2022 commented Aug 21, 2024

EDIT:
The iGPU container from NGC is viable and has a working Torch-TensorRT, but flash attention seems to fail to compile correctly.


When attempting to run a model from the HuggingFace Transformers library (OpenVLA), this error occurs. It appears to be a flash-attention-related issue, although I also compiled flash attention from source:

RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Here is the entire stack trace:
INFO:__main__:Loading processor and model from openvla/openvla-7b
The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use attn_implementation="flash_attention_2" instead.
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with model.to('cuda').
WARNING:transformers_modules.openvla.openvla-7b.e5822cc24559b04e532f49b1c1ddb64376c1a485.modeling_prismatic:Expected transformers==4.40.1 and tokenizers==0.19.1 but got transformers==4.44.1 and tokenizers==0.19.1; there might be inference-time regressions due to dependency changes. If in doubt, please use the above versions.
You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 4.42it/s]
generation_config.json: 100%|██████████████████████████████████████████████████████████████████| 136/136 [00:00<00:00, 716kB/s]
INFO:__main__:Model loaded on device: cuda:0
INFO:__main__:Starting benchmark
We detected that you are passing past_key_values as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate Cache class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache)
Traceback (most recent call last):
  File "/workspace/bit-VLA/inferenceV2.py", line 79, in <module>
    main()
  File "/workspace/bit-VLA/inferenceV2.py", line 63, in main
    generation_times = run_benchmark(processor, vla, image, prompt)
  File "/workspace/bit-VLA/inferenceV2.py", line 44, in run_benchmark
    action = vla.predict_action(**inputs, unnorm_key="bridge_orig", do_sample=False)
  File "/root/.cache/huggingface/modules/transformers_modules/openvla/openvla-7b/e5822cc24559b04e532f49b1c1ddb64376c1a485/modeling_prismatic.py", line 517, in predict_action
    generated_ids = self.generate(input_ids, max_new_tokens=self.get_action_dim(unnorm_key), **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 2024, in generate
    result = self._sample(
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 2982, in _sample
    outputs = self(**model_inputs, return_dict=True)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1552, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1561, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/openvla/openvla-7b/e5822cc24559b04e532f49b1c1ddb64376c1a485/modeling_prismatic.py", line 404, in forward
    language_model_output = self.language_model(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1552, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1561, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 1189, in forward
    outputs = self.model(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1552, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1561, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 1001, in forward
    layer_outputs = decoder_layer(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1552, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1561, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 734, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1552, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1561, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 556, in forward
    attn_output = _flash_attention_forward(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_flash_attention_utils.py", line 296, in _flash_attention_forward
    attn_output = flash_attn_func(
  File "/usr/local/lib/python3.10/dist-packages/flash_attn/flash_attn_interface.py", line 880, in flash_attn_func
    return FlashAttnFunc.apply(
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 573, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/usr/local/lib/python3.10/dist-packages/flash_attn/flash_attn_interface.py", line 546, in forward
    out, q, k, v, out_padded, softmax_lse, S_dmask, rng_state = _flash_attn_forward(
  File "/usr/local/lib/python3.10/dist-packages/flash_attn/flash_attn_interface.py", line 52, in _flash_attn_forward
    out, q, k, v, out_padded, softmax_lse, S_dmask, rng_state = flash_attn_cuda.fwd(
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

@narendasan
Collaborator

Does flash attention have any standalone tests you can run to verify your build?
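
For example, a minimal standalone smoke test (a sketch, not taken from the flash-attn test suite; it assumes flash_attn 2.x and a visible CUDA device) that exercises the same flash_attn_func path as the traceback above:

# Run one flash-attn forward pass and compare it against PyTorch's reference
# scaled_dot_product_attention. If the build is broken for this GPU, the call
# should fail with the same "no kernel image" error as above.
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 128, 8, 64
q, k, v = (torch.randn(batch, seqlen, nheads, headdim,
                       device="cuda", dtype=torch.float16) for _ in range(3))

out = flash_attn_func(q, k, v, causal=True)        # (batch, seqlen, nheads, headdim)

ref = torch.nn.functional.scaled_dot_product_attention(
    q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2), is_causal=True
).transpose(1, 2)

print("max abs diff vs. SDPA reference:", (out - ref).abs().max().item())

If this fails with the same RuntimeError, the flash-attn build itself does not target the Orin GPU and would need to be rebuilt with sm_87 included.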
