-
Just double-checking: you're not changing the batch size within the same training loop, right?
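For context, a jitted function is re-traced and re-compiled for every new input shape, so varying the batch size inside the loop means paying the compile cost repeatedly. A tiny illustration with a toy step function (not your code):

```python
import time

import jax
import jax.numpy as jnp


@jax.jit
def step(x):
    # The compiled program is specialized to x's shape.
    return jnp.sum(x ** 2)


for batch_size in (2, 4, 8):
    x = jnp.ones((batch_size, 32))
    t0 = time.perf_counter()
    step(x).block_until_ready()  # new shape -> new trace + compile
    print(f"batch_size={batch_size}: first call took {time.perf_counter() - t0:.3f}s")
```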
-
Hi!
When working on a current project I found that the compilation time of my train_step function increased drastically when I increased the batch size. At first it took about 20s with a batch size of 2, but it went up to around 4000s when I bumped the batch size to 64. The train_step function I'm using may be a little confusing because I'm working on an IQA task with a custom model that carries some state; a rough sketch of its structure follows the note below.

As a note, I have a different function to calculate the metrics during validation, and that function doesn't show the same behavior, so I thought the issue might be related to the gradient calculation, but I'm not sure whether that makes sense.
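Roughly, the structure is the following. In this sketch, apply_model, the MSE loss, and the optax optimizer are placeholders rather than my actual code:

```python
import jax
import jax.numpy as jnp
import optax  # placeholder: any optax optimizer works here


# Placeholder for the custom stateful model's forward pass -- not the real model.
def apply_model(params, model_state, images):
    preds = images @ params["w"] + params["b"]
    return preds, model_state  # a real model would also return its updated state


tx = optax.adam(1e-3)


@jax.jit
def train_step(params, model_state, opt_state, batch):
    def loss_fn(p, s):
        preds, new_s = apply_model(p, s, batch["images"])
        # Placeholder MSE loss; the real IQA loss is different.
        loss = jnp.mean((preds - batch["targets"]) ** 2)
        return loss, new_s

    # has_aux=True lets the updated model state ride along with the loss.
    (loss, new_model_state), grads = jax.value_and_grad(loss_fn, has_aux=True)(
        params, model_state
    )
    updates, new_opt_state = tx.update(grads, opt_state, params)
    new_params = optax.apply_updates(params, updates)
    return new_params, new_model_state, new_opt_state, loss
```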
I was under the assumption that changing the batch size shouldn't have this big an influence on compilation time, and since I couldn't narrow down the problem, I tried to replicate it in a very simple MNIST classifier example in Colab (here). What I found was basically the same: the compilation time goes up with the batch size, as you can see in this quick wandb dashboard I set up for the experiment: https://wandb.ai/jorgvt/JaX_Compile?workspace=user-jorgvt
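The numbers there come from timing the first jitted call for each batch size, roughly along the lines of this simplified sketch (not the exact notebook code):

```python
import time

import jax
import jax.numpy as jnp


def make_train_step():
    # Fresh jit per batch size so each measurement includes a full compile.
    @jax.jit
    def train_step(params, batch):
        def loss_fn(p):
            logits = batch["x"] @ p["w"] + p["b"]
            one_hot = jax.nn.one_hot(batch["y"], 10)
            return -jnp.mean(jnp.sum(one_hot * jax.nn.log_softmax(logits), axis=-1))

        grads = jax.grad(loss_fn)(params)
        return jax.tree_util.tree_map(lambda p, g: p - 0.1 * g, params, grads)

    return train_step


params = {"w": jnp.zeros((784, 10)), "b": jnp.zeros(10)}

for batch_size in (2, 8, 32, 64):
    batch = {
        "x": jnp.ones((batch_size, 784)),
        "y": jnp.zeros(batch_size, dtype=jnp.int32),
    }
    step = make_train_step()
    t0 = time.perf_counter()
    # First call = trace + compile; block on all outputs before stopping the timer.
    jax.tree_util.tree_map(lambda a: a.block_until_ready(), step(params, batch))
    print(f"batch_size={batch_size}: {time.perf_counter() - t0:.2f}s")
```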
I'd be more than willing to share more information with anyone that can shed some light!