-
To answer my own question: simply jit a smaller function with the desired device:

pinv = jax.jit(jnp.linalg.pinv, device=jax.devices('cpu')[0])

@jax.jit
@jax.grad
def f(x):
    return pinv(x).sum()

%time jax.block_until_ready(f(x))
%timeit jax.block_until_ready(f(x))

CPU times: user 467 ms, sys: 49 ms, total: 516 ms
Wall time: 471 ms
220 ms ± 3.65 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
-
I've been having similar issues. I ended up wrapping the CPU part in a pure_callback, and it seems to work, but I'm not totally sure.
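For reference, here is a minimal sketch of what that wrapping might look like (the helper names pinv_on_host, pinv_cpu, and f are illustrative, not from the original post). Note that jax.pure_callback has no differentiation rule by default, so taking gradients through it would additionally require a custom_vjp:

import jax
import jax.numpy as jnp
import numpy as np

def pinv_on_host(x):
    # Runs outside the traced computation, on the host CPU, via NumPy.
    return np.linalg.pinv(x)

def pinv_cpu(x):
    # pinv of a (..., m, n) array has shape (..., n, m).
    out = jax.ShapeDtypeStruct(x.shape[:-2] + (x.shape[-1], x.shape[-2]), x.dtype)
    return jax.pure_callback(pinv_on_host, out, x)

@jax.jit
def f(x):
    return pinv_cpu(x).sum()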
-
In general, I believe the most general way to achieve this is to use jax.lax.with_sharding_constraint. Unfortunately we don't have great docs on this just yet, but you can see an example below:

import jax
import jax.numpy as jnp
import numpy as np

cuda_devices = jax.devices('cuda')
cpu_devices = jax.devices('cpu')
gpu_sharding = jax.sharding.SingleDeviceSharding(cuda_devices[0])
cpu_sharding = jax.sharding.SingleDeviceSharding(cpu_devices[0])

x = jnp.array(np.random.normal(size=(4096, 7, 64)))

@jax.jit
@jax.grad
def f_gpu(x):
    x = jax.lax.with_sharding_constraint(x, gpu_sharding)
    return jnp.linalg.pinv(x).sum()

%time jax.block_until_ready(f_gpu(x))
%timeit jax.block_until_ready(f_gpu(x))

@jax.jit
@jax.grad
def f_cpu(x):
    x = jax.lax.with_sharding_constraint(x, cpu_sharding)
    return jnp.linalg.pinv(x).sum()

%time jax.block_until_ready(f_cpu(x))
%timeit jax.block_until_ready(f_cpu(x))
-
As the title suggests, I want to run part of my program on another device. Specifically, since GPUs are quite bad at SVD, I'd like to do the SVD on the CPU. In TensorFlow and PyTorch it is possible to simply annotate part of a function to run on a different device. However, in JAX I get the following behavior:

Using default_device: moves the whole program to the CPU.

Since I want most of my program to run on the GPU, running everything on the CPU is not an option. I am aware of host callbacks, which could be applied here. But is there a different solution?
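For illustration, a minimal sketch of the default_device attempt described above (f and x are placeholders for the actual function and input):

# Everything traced inside this context, not just the SVD, is placed on the CPU.
with jax.default_device(jax.devices('cpu')[0]):
    result = f(x)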