state_dict_factory: llama checkpoint - support SWIGLU #5601

nelyahu · 2024-06-02T18:04:20Z

DeepSpeed supports loading a checkpoint for inference with a different DP/TP/PP configuration. This requires splitting or merging parameters based on their TP attributes. Currently, this is done using model-specific parameter names, which is not good practice and should be changed.

This commit makes the changes required to support the MDS LLaMA model. There are 2 changes:

- Support for lm_head.weight
- Support for mlp.h_to_4h.weight for SWIGLU

SWIGLU requires different handling; however, there is no metadata available that identifies mlp.h_to_4h.weight as SWIGLU. Therefore, for now we use a hack to detect it.
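To illustrate why SWIGLU needs different handling, here is a minimal sketch, not the code in this PR. It assumes each TP rank stores its fused h_to_4h weight as that rank's gate rows stacked on top of its up rows along dim 0; the PR text does not confirm this layout. The function names `merge_swiglu_shards` and `split_swiglu_weight` are hypothetical. Under that assumption, a plain `torch.cat` of the shards interleaves gate and up blocks, so the halves have to be regrouped before concatenating.

```python
import torch


def merge_swiglu_shards(shards):
    """Merge per-rank fused [gate; up] weights into one fused [gate; up] weight.

    Assumption (not confirmed by the PR): each shard holds its gate rows
    followed by its up rows along dim 0.
    """
    gates, ups = [], []
    for shard in shards:
        gate, up = torch.chunk(shard, 2, dim=0)
        gates.append(gate)
        ups.append(up)
    # All gate blocks first, then all up blocks.
    return torch.cat(gates + ups, dim=0)


def split_swiglu_weight(weight, num_ranks):
    """Inverse of merge_swiglu_shards: split a fused weight into per-rank shards."""
    gate, up = torch.chunk(weight, 2, dim=0)
    gate_parts = torch.chunk(gate, num_ranks, dim=0)
    up_parts = torch.chunk(up, num_ranks, dim=0)
    return [torch.cat([g, u], dim=0) for g, u in zip(gate_parts, up_parts)]


# Toy example: 2 TP ranks, hidden size 3, 4 gate rows and 4 up rows in total.
full = torch.arange(24, dtype=torch.float32).reshape(8, 3)
shards = split_swiglu_weight(full, num_ranks=2)

merged = merge_swiglu_shards(shards)  # regroups gate and up halves
naive = torch.cat(shards, dim=0)      # interleaves gate/up blocks per rank
```

In this toy case `merged` reproduces `full`, while `naive` puts the rows in a different order, which is the mismatch a name-based special case for mlp.h_to_4h.weight would have to guard against.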


tjruwase · 2024-06-21T22:29:52Z

deepspeed/runtime/state_dict_factory.py

                new_client_sd[key] = torch.cat(value_list, axis=0)
+            elif "mlp.dense_h_to_4h.weight" in key:


@nelyahu, this logic is very old and hacky. Could it not be replaced with Universal Checkpointing?

tohtana

As Tunji commented, we now have a more flexible solution introduced in #5390. Can you try it? The examples in #5390 might help.

tohtana · 2024-09-06T22:04:28Z

@nelyahu Do we have any update?

nelyahu requested review from mrwyattii and tjruwase as code owners June 2, 2024 18:04

Merge branch 'master' into fix_llama_3d_checkpoint_load

6703d8f

tjruwase requested review from samadejacobs and tohtana and removed request for mrwyattii June 21, 2024 22:27

tjruwase reviewed Jun 21, 2024


tohtana reviewed Jun 21, 2024


Merge branch 'master' into fix_llama_3d_checkpoint_load

b981d43

tohtana self-assigned this Sep 4, 2024
