
Conversation

@xiaowangintel commented Jan 10, 2024

This PR adds initial Intel GPU support to gpt-fast via the device option "xpu" (i.e., --device xpu). Both single-device and multi-device execution via tensor parallelism are functionally supported, while performance is still being improved. Refer to the following steps to run generation on Intel GPUs. We will update the tutorial later as performance improves.

Installation

  1. Install PyTorch and Intel® Extension for PyTorch (IPEX):
    https://intel.github.io/intel-extension-for-pytorch/xpu/latest/tutorials/introduction.html#
  2. Install oneCCL for distributed execution:
    https://github.com/oneapi-src/oneCCL
  3. Install Intel® Extension for Triton (needed by torch.compile); a quick sanity check is sketched after this list:
    https://intel.github.io/intel-extension-for-pytorch/xpu/latest/tutorials/features/torch_compile_gpu.html
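
After installation, the sanity check below (a sketch, not part of the PR; it assumes an XPU build of PyTorch with IPEX installed) confirms the stack is functional:

import torch
import intel_extension_for_pytorch as ipex  # registers the 'xpu' device type

# Verify versions and that at least one Intel GPU is visible.
print("torch:", torch.__version__, "| ipex:", ipex.__version__)
print("XPU available:", torch.xpu.is_available())
print("XPU device count:", torch.xpu.device_count())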

How to run gpt-fast on Intel GPUs?

  1. Command for a single device:
    python generate.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth --speculate_k 5 --prompt "Hi my name is" --device xpu
  2. Command for multiple devices via tensor parallelism:
    ENABLE_INTRA_NODE_COMM=1 torchrun --standalone --nproc_per_node=2 generate.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth --device xpu

Note:

  1. Please export UR_L0_IN_ORDER_BARRIER_BY_SIGNAL=0 (a temporary configuration) to avoid unnecessary errors when running gpt-fast with torch.compile.
  2. Please export IPEX_ZE_TRACING=1 (a temporary configuration) to capture profiling events when running gpt-fast with the profiler; a sketch of setting both variables follows these notes.
  3. Currently, only bf16 is supported; int4/int8 will be supported later via IPEX without requiring code changes in gpt-fast.
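
For reference, a sketch of setting these temporary variables from Python rather than the shell. Whether the runtime picks them up from os.environ depends on when Level Zero reads them, so exporting them in the launching shell, as above, remains the safer route:

import os

# Temporary workarounds from the notes above; set before importing torch
# so the runtime can see them.
os.environ.setdefault("UR_L0_IN_ORDER_BARRIER_BY_SIGNAL", "0")  # for torch.compile runs
os.environ.setdefault("IPEX_ZE_TRACING", "1")                   # for profiling runs

import torch  # imported only after the variables are set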

Comment on lines +53 to +69
os.environ['CCL_PROCESS_LAUNCHER'] = 'none'
os.environ['CCL_LOCAL_SIZE'] = str(world_size)
os.environ['CCL_LOCAL_RANK'] = str(rank)

torch.xpu.set_device(rank)
dist.init_process_group(backend="ccl", rank=rank, world_size=world_size)

Please move these lines inside the "try" block.

Author

Yeah, I'll do that.

generate.py Outdated
Comment on lines 16 to 19
try:
    import intel_extension_for_pytorch as ipex
except:
    pass

Suggest moving this into main and making it a conditional import when the user selects the "xpu" device. Raise an error when things go wrong.

Author

Yeah, I'll do that.
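
A minimal sketch of the suggested pattern (the helper name is hypothetical, not from the PR):

def maybe_import_ipex(device: str) -> None:
    # Import IPEX only when the user selects the XPU device, and fail loudly
    # if it is missing instead of silently passing.
    if device == "xpu":
        try:
            import intel_extension_for_pytorch as ipex  # noqa: F401
        except ImportError as e:
            raise ModuleNotFoundError(
                "intel_extension_for_pytorch is required for --device xpu"
            ) from e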

torch._inductor.config.coordinate_descent_tuning = True
torch._inductor.config.triton.unique_kernel_names = True
torch._inductor.config.fx_graph_cache = True # Experimental feature to reduce compilation times, will be on by default in future
if hasattr(torch._inductor.config, "fx_graph_cache"):

Intel GPU currently uses a PyTorch fork based on 2.1 which doesn't have fx_graph_cache yet.

Author

I've added a comment.
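
The resulting guard looks roughly like this (a sketch of the pattern under discussion; nothing beyond the attribute check is assumed):

import torch

# Enable the experimental FX graph cache only on builds that expose the flag;
# the 2.1-based PyTorch fork used for Intel GPU does not have it yet.
if hasattr(torch._inductor.config, "fx_graph_cache"):
    torch._inductor.config.fx_graph_cache = True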

generate.py Outdated
Comment on lines 369 to 371
record_shapes=True,
profile_memory=False,
with_stack=True

Can we remove these extra configurations?

Author
@xiaowangintel commented Jan 11, 2024

Yes, they have been removed.
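
With those options dropped, the call reduces to the profiler defaults, roughly as below (a sketch; which profiler activities are available for XPU depends on the torch/IPEX build):

import torch

with torch.profiler.profile() as prof:
    ...  # the generation step being profiled
print(prof.key_averages().table(sort_by="self_cpu_time_total"))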


jgong5 commented Jan 10, 2024

Please add to the PR description: 1) how to build/install the prerequisite software components; 2) how to run inference with and without tensor parallelism.

generate.py Outdated
torch._inductor.config.coordinate_descent_tuning = True
torch._inductor.config.triton.unique_kernel_names = True
torch._inductor.config.fx_graph_cache = True # Experimental feature to reduce compilation times, will be on by default in future
#Intel GPU currently uses a PyTorch fork based on 2.1 which doesn't have fx_graph_cache yet.

Suggested change
#Intel GPU currently uses a PyTorch fork based on 2.1 which doesn't have fx_graph_cache yet.
# To support devices (like Intel GPU) which still use PyTorch 2.1 that doesn't have fx_graph_cache yet.

generate.py Outdated
try:
    import intel_extension_for_pytorch as ipex
except:
    raise ModuleNotFoundError(f"No module named 'intel_extension_for_pytorch'")

Suggested change
raise ModuleNotFoundError(f"No module named 'intel_extension_for_pytorch'")
raise ModuleNotFoundError("Intel Extension for PyTorch (intel_extension_for_pytorch) is required to run Intel GPU on the XPU device. Please check https://github.com/intel/intel-extension-for-pytorch for details.")

tp.py Outdated
try:
    import oneccl_bindings_for_pytorch
except:
    raise ModuleNotFoundError(f"No module named 'oneccl_bindings_for_pytorch'")

Suggested change
raise ModuleNotFoundError(f"No module named 'oneccl_bindings_for_pytorch'")
raise ModuleNotFoundError(f"OneCCL bindings for PyTorch (oneccl_bindings_for_pytorch) is required to run tensor parallel on Intel GPU (XPU). Please check https://github.com/intel/torch-ccl for details.")

tp.py Outdated
Comment on lines 54 to 62
try:
    os.environ['CCL_PROCESS_LAUNCHER'] = 'none'
    os.environ['CCL_LOCAL_SIZE'] = str(world_size)
    os.environ['CCL_LOCAL_RANK'] = str(rank)

    torch.xpu.set_device(rank)
    dist.init_process_group(backend="ccl", rank=rank, world_size=world_size)
except:
    raise ValueError(f"Failed to initialize 'ccl'")

Do we need a try-except here? CUDA doesn't need one.
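
What the comment suggests would look roughly like this: initialize directly and let any failure surface with its original traceback (a sketch; the function name is hypothetical):

import os

import torch
import torch.distributed as dist
import intel_extension_for_pytorch  # noqa: F401 -- provides torch.xpu
import oneccl_bindings_for_pytorch  # noqa: F401 -- registers the 'ccl' backend

def init_ccl(rank: int, world_size: int) -> None:
    # Same setup as the diff above, minus the try/except that swallows the
    # original error and re-raises a generic ValueError.
    os.environ['CCL_PROCESS_LAUNCHER'] = 'none'
    os.environ['CCL_LOCAL_SIZE'] = str(world_size)
    os.environ['CCL_LOCAL_RANK'] = str(rank)

    torch.xpu.set_device(rank)
    dist.init_process_group(backend="ccl", rank=rank, world_size=world_size)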


jgong5 commented Jan 12, 2024

@Chillee This is the initial PR to support Intel GPU. Most of the needed code changes should be there. Further performance optimizations will be applied inside IPEX. May I ask for your review? Thanks!

@xiaowangintel force-pushed the main branch 4 times, most recently from 0eaf4b5 to 3af7b93 on March 19, 2024
@xiaowangintel force-pushed the main branch 2 times, most recently from 4ccebef to ff223c8 on May 6, 2024
