intel gpu : enable intel gpu #79
Conversation
```python
os.environ['CCL_PROCESS_LAUNCHER'] = 'none'
os.environ['CCL_LOCAL_SIZE'] = str(world_size)
os.environ['CCL_LOCAL_RANK'] = str(rank)

torch.xpu.set_device(rank)
dist.init_process_group(backend="ccl", rank=rank, world_size=world_size)
```
Please move these lines inside the "try" region.
Yeah, I'll do that.
generate.py (Outdated)
```python
try:
    import intel_extension_for_pytorch as ipex
except:
    pass
```
Suggest moving this into main and making it a conditional import when the user selects the "xpu" device. Raise an error when things go wrong.
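For reference, the conditional import could look something like the sketch below (assuming the chosen device string is available as `args.device` in main; the variable name and error wording are illustrative assumptions, not the final PR code):

```python
if args.device == "xpu":
    try:
        # Importing IPEX registers the XPU backend with PyTorch.
        import intel_extension_for_pytorch as ipex
    except ImportError as e:
        # Fail loudly instead of silently passing.
        raise ModuleNotFoundError(
            "intel_extension_for_pytorch is required for --device xpu"
        ) from e
```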
Yeah, I'll do that.
```python
torch._inductor.config.coordinate_descent_tuning = True
torch._inductor.config.triton.unique_kernel_names = True
torch._inductor.config.fx_graph_cache = True # Experimental feature to reduce compilation times, will be on by default in future
if hasattr(torch._inductor.config, "fx_graph_cache"):
```
Intel GPU currently uses a PyTorch fork based on 2.1 which doesn't have fx_graph_cache yet.
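For context, the guarded assignment discussed here would look roughly like this (a sketch reconstructed from the diff above, not the exact final code):

```python
# fx_graph_cache was added after PyTorch 2.1, so guard the assignment for
# the 2.1-based Intel GPU fork where the attribute doesn't exist yet.
if hasattr(torch._inductor.config, "fx_graph_cache"):
    torch._inductor.config.fx_graph_cache = True
```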
I've added a comment.
generate.py (Outdated)
```python
record_shapes=True,
profile_memory=False,
with_stack=True
```
Can we remove these extra configurations?
Yes, they were removed.
Please add to the PR description: 1) how to build/install the prerequisite software components; 2) how to run inference with and without tensor parallelism.
generate.py (Outdated)
```python
torch._inductor.config.coordinate_descent_tuning = True
torch._inductor.config.triton.unique_kernel_names = True
torch._inductor.config.fx_graph_cache = True # Experimental feature to reduce compilation times, will be on by default in future
#Intel GPU currently uses a PyTorch fork based on 2.1 which doesn't have fx_graph_cache yet.
```
Suggested change:
```diff
-#Intel GPU currently uses a PyTorch fork based on 2.1 which doesn't have fx_graph_cache yet.
+# To support devices (like Intel GPU) which still use PyTorch 2.1 that doesn't have fx_graph_cache yet.
```
generate.py (Outdated)
```python
try:
    import intel_extension_for_pytorch as ipex
except:
    raise ModuleNotFoundError(f"No module named 'intel_extension_for_pytorch'")
```
Suggested change:
```diff
-raise ModuleNotFoundError(f"No module named 'intel_extension_for_pytorch'")
+raise ModuleNotFoundError("Intel Extension for PyTorch (intel_extension_for_pytorch) is required to run Intel GPU on the XPU device. Please check https://github.com/intel/intel-extension-for-pytorch for details.")
```
tp.py (Outdated)
```python
try:
    import oneccl_bindings_for_pytorch
except:
    raise ModuleNotFoundError(f"No module named 'oneccl_bindings_for_pytorch'")
```
Suggested change:
```diff
-raise ModuleNotFoundError(f"No module named 'oneccl_bindings_for_pytorch'")
+raise ModuleNotFoundError(f"OneCCL bindings for PyTorch (oneccl_bindings_for_pytorch) is required to run tensor parallel on Intel GPU (XPU). Please check https://github.com/intel/torch-ccl for details.")
```
tp.py (Outdated)
```python
try:
    os.environ['CCL_PROCESS_LAUNCHER'] = 'none'
    os.environ['CCL_LOCAL_SIZE'] = str(world_size)
    os.environ['CCL_LOCAL_RANK'] = str(rank)

    torch.xpu.set_device(rank)
    dist.init_process_group(backend="ccl", rank=rank, world_size=world_size)
except:
    raise ValueError(f"Failed to initialize 'ccl'")
```
Do we need a try-catch here? CUDA doesn't need one.
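Without the wrapper, the initialization would mirror the CUDA path (a sketch of the reviewer's suggestion, letting `init_process_group` raise its own error instead of masking it with a generic ValueError; this is not the merged code):

```python
os.environ['CCL_PROCESS_LAUNCHER'] = 'none'
os.environ['CCL_LOCAL_SIZE'] = str(world_size)
os.environ['CCL_LOCAL_RANK'] = str(rank)

torch.xpu.set_device(rank)
# Any failure surfaces directly, just as with the NCCL backend on CUDA.
dist.init_process_group(backend="ccl", rank=rank, world_size=world_size)
```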
@Chillee This is the initial PR to support Intel GPU. Most of the needed code changes should be there; further performance optimizations will be applied inside IPEX. May I ask for your review? Thanks!
Force-pushed from 0eaf4b5 to 3af7b93
Force-pushed from 4ccebef to ff223c8
This PR adds initial Intel GPU support to gpt-fast via the device option "xpu" (i.e., --device xpu). Both single-device and multi-device (tensor parallel) runs are functional, while performance is still being improved. Refer to the following steps to run generation on Intel GPUs. We will update the tutorial later with improved performance.
Installation
- Intel Extension for PyTorch (XPU): https://intel.github.io/intel-extension-for-pytorch/xpu/latest/tutorials/introduction.html#
- oneCCL: https://github.com/oneapi-src/oneCCL
- torch.compile on Intel GPU: https://intel.github.io/intel-extension-for-pytorch/xpu/latest/tutorials/features/torch_compile_gpu.html
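At the time of this PR, the linked tutorials documented a pip-based install roughly along these lines (a sketch only; the package names and index URL are assumptions that may be outdated, so follow the links above for the current command):

```sh
# Index URL and package set taken from the IPEX XPU tutorial; verify before use.
python -m pip install torch intel_extension_for_pytorch oneccl_bind_pt --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
```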
How to run gpt-fast on Intel GPUs

Single device:
```sh
python generate.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth --speculate_k 5 --prompt "Hi my name is" --device xpu
```

Tensor parallel (2 GPUs):
```sh
ENABLE_INTRA_NODE_COMM=1 torchrun --standalone --nproc_per_node=2 generate.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth --device xpu
```
Note: