
Is jetpack 6.0 for jetson agx orin supported? #3049

Open
dhruvmsheth opened this issue Jul 31, 2024 · 5 comments

Labels: question (Further information is requested)

Comments

@dhruvmsheth

I tried installing torch_tensorrt using the JetPack 5.0 WORKSPACE script, but it did not work on my system, which is currently running JetPack 6.0 on a Jetson AGX Orin.

@narendasan
Collaborator

IIRC JetPack 6.0 is still on TensorRT 8.6, so you may be able to build an older version of Torch-TensorRT (like 2.2) or the NGC iGPU branch shipped in the NGC containers: https://github.com/pytorch/TensorRT/tree/release/ngc/24.07_igpu. You can also use the containers for Jetson directly, which already have Torch-TensorRT installed: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch
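
For reference, a quick way to confirm which TensorRT version a given JetPack install or container actually ships, and that CUDA sees the Orin iGPU (a minimal sketch, assuming the tensorrt and torch Python packages are importable in the environment being checked):

# Minimal environment check: print the TensorRT version and the visible CUDA device.
# Assumes the `tensorrt` and `torch` Python bindings are installed in this environment.
import tensorrt
import torch

print("TensorRT:", tensorrt.__version__)              # expected to be 8.6.x on JetPack 6.0
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))   # should report the Orin integrated GPU

If the reported TensorRT version is 8.6.x, the older Torch-TensorRT release (2.2) or the NGC iGPU branch mentioned above are the builds most likely to link against it.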

@areddy2022

areddy2022 commented Aug 17, 2024

It would appear that the PyTorch NGC containers do not work on Jetson. Using Dusty's Docker image dustynv/torch_tensorrt:r35.4.1 results in this error when importing torch_tensorrt:

Python 3.8.10 (default, Jul 29 2024, 17:02:10) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch_tensorrt
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/torch_tensorrt/__init__.py", line 39, in <module>
    import tensorrt
  File "/usr/lib/python3.8/dist-packages/tensorrt/__init__.py", line 68, in <module>
    from .tensorrt import *
ImportError: /lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by /usr/lib/aarch64-linux-gnu/nvidia/libnvos.so)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/dist-packages/torch_tensorrt/__init__.py", line 82, in <module>
    ctypes.CDLL(_find_lib(lib, LINUX_PATHS))
  File "/usr/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by /usr/lib/aarch64-linux-gnu/nvidia/libnvos.so)
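
The GLIBC_2.33 failure above likely means the container's libc is older than what the host-mounted NVIDIA libraries were built against: the r35.x images target JetPack 5 / L4T r35 (Ubuntu 20.04, glibc 2.31), while a JetPack 6.0 host mounts newer driver libraries such as libnvos.so into the container. A rough diagnostic sketch (standalone Python, not part of torch_tensorrt) to confirm the container-side glibc version:

# Print the glibc version visible inside the container; if it is below 2.33,
# the host-mounted libnvos.so from a JetPack 6.0 host cannot be loaded.
import ctypes

libc = ctypes.CDLL("libc.so.6")
libc.gnu_get_libc_version.restype = ctypes.c_char_p
print("container glibc:", libc.gnu_get_libc_version().decode())   # e.g. 2.31 in an r35.x image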

@narendasan
Collaborator

IIRC you can use this container: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch/tags. The iGPU tag, I believe, is targeted at Jetson. cc: @apbose

@areddy2022

areddy2022 commented Aug 21, 2024

EDIT:
The iGPU container from NGC is viable and has a working Torch-TensorRT, but flash attention seems to fail to compile correctly.


When attempting to run a model from the HuggingFace Transformers library (OpenVLA), this error occurs. It appears to be a flash-attention-related issue, although I also compiled flash attention from source:

RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Here is the entire stack trace:
INFO:__main__:Loading processor and model from openvla/openvla-7b
The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use attn_implementation="flash_attention_2" instead.
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with model.to('cuda').
WARNING:transformers_modules.openvla.openvla-7b.e5822cc24559b04e532f49b1c1ddb64376c1a485.modeling_prismatic:Expected transformers==4.40.1 and tokenizers==0.19.1 but got transformers==4.44.1 and tokenizers==0.19.1; there might be inference-time regressions due to dependency changes. If in doubt, please use the above versions.
You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 4.42it/s]
generation_config.json: 100%|██████████████████████████████████████████████████████████████████| 136/136 [00:00<00:00, 716kB/s]
INFO:__main__:Model loaded on device: cuda:0
INFO:__main__:Starting benchmark
We detected that you are passing past_key_values as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate Cache class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache)
Traceback (most recent call last):
  File "/workspace/bit-VLA/inferenceV2.py", line 79, in <module>
    main()
  File "/workspace/bit-VLA/inferenceV2.py", line 63, in main
    generation_times = run_benchmark(processor, vla, image, prompt)
  File "/workspace/bit-VLA/inferenceV2.py", line 44, in run_benchmark
    action = vla.predict_action(**inputs, unnorm_key="bridge_orig", do_sample=False)
  File "/root/.cache/huggingface/modules/transformers_modules/openvla/openvla-7b/e5822cc24559b04e532f49b1c1ddb64376c1a485/modeling_prismatic.py", line 517, in predict_action
    generated_ids = self.generate(input_ids, max_new_tokens=self.get_action_dim(unnorm_key), **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 2024, in generate
    result = self._sample(
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 2982, in _sample
    outputs = self(**model_inputs, return_dict=True)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1552, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1561, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/openvla/openvla-7b/e5822cc24559b04e532f49b1c1ddb64376c1a485/modeling_prismatic.py", line 404, in forward
    language_model_output = self.language_model(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1552, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1561, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 1189, in forward
    outputs = self.model(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1552, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1561, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 1001, in forward
    layer_outputs = decoder_layer(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1552, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1561, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 734, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1552, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1561, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 556, in forward
    attn_output = _flash_attention_forward(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_flash_attention_utils.py", line 296, in _flash_attention_forward
    attn_output = flash_attn_func(
  File "/usr/local/lib/python3.10/dist-packages/flash_attn/flash_attn_interface.py", line 880, in flash_attn_func
    return FlashAttnFunc.apply(
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 573, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/usr/local/lib/python3.10/dist-packages/flash_attn/flash_attn_interface.py", line 546, in forward
    out, q, k, v, out_padded, softmax_lse, S_dmask, rng_state = _flash_attn_forward(
  File "/usr/local/lib/python3.10/dist-packages/flash_attn/flash_attn_interface.py", line 52, in _flash_attn_forward
    out, q, k, v, out_padded, softmax_lse, S_dmask, rng_state = flash_attn_cuda.fwd(
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

@narendasan
Collaborator

Does flash attention have any standalone tests you can run to verify your build?
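
For example, a minimal standalone smoke test (a sketch, not taken from the flash-attn test suite; it assumes flash_attn 2.x and a visible CUDA device) that exercises the same flash_attn_func path as the traceback above:

# Run one flash-attn forward pass and compare it against PyTorch's reference
# scaled_dot_product_attention. If the build is broken for this GPU, the call
# should fail with the same "no kernel image" error as above.
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 128, 8, 64
q, k, v = (torch.randn(batch, seqlen, nheads, headdim,
                       device="cuda", dtype=torch.float16) for _ in range(3))

out = flash_attn_func(q, k, v, causal=True)        # (batch, seqlen, nheads, headdim)

ref = torch.nn.functional.scaled_dot_product_attention(
    q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2), is_causal=True
).transpose(1, 2)

print("max abs diff vs. SDPA reference:", (out - ref).abs().max().item())

If this fails with the same RuntimeError, the flash-attn build itself does not target the Orin GPU and would need to be rebuilt with sm_87 included.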
