Conversation

shanjiaz
Collaborator

@shanjiaz shanjiaz commented Oct 6, 2025

Previously we loaded the full verifier model just to get a few specific layers (e.g. the embeddings). This util saves memory by loading only the shards that contain the requested layers, and returns a dict mapping layer name -> tensor.
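The shard-selective loading described above could be sketched roughly like this. This is a minimal sketch, not the PR's actual implementation: `group_by_shard` is a hypothetical helper, and a standard `model.safetensors.index.json` layout in a local checkpoint directory is assumed.

```python
import json
import os


def group_by_shard(layer_names, weight_map):
    """Group requested tensor names by the shard file that holds them,
    so each shard is opened at most once."""
    by_shard = {}
    for name in layer_names:
        shard = weight_map.get(name)
        if shard is None:
            raise KeyError(f"Tensor '{name}' not found in index weight_map.")
        by_shard.setdefault(shard, []).append(name)
    return by_shard


def load_model_layers(layer_names, model_dir):
    """Load only the requested tensors from a sharded safetensors
    checkpoint, without instantiating the full model."""
    index_path = os.path.join(model_dir, "model.safetensors.index.json")
    with open(index_path) as fh:
        weight_map = json.load(fh)["weight_map"]

    # Imported lazily so the grouping helper above stays stdlib-only.
    from safetensors import safe_open

    tensors = {}
    for shard, names in group_by_shard(layer_names, weight_map).items():
        with safe_open(os.path.join(model_dir, shard), framework="pt") as f:
            for name in names:
                tensors[name] = f.get_tensor(name)
    return tensors
```

Grouping by shard first means a shard shared by several requested tensors (e.g. tied embeddings) is memory-mapped once rather than reopened per tensor.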

Tested locally:

model_path = "shanjiaz/Meta-Llama-3-8B-Instruct-FP8-BLOCK"
layer_names = ["lm_head.weight", "model.embed_tokens.weight"]
layer = load_model_layers(layer_names, model_path)
for k, v in layer.items():
    print(k, v.shape)
    print(f"sample data: {v.flatten()[:5]}")
model_path = "/home/hzhao/.cache/huggingface/hub/models--shanjiaz--Meta-Llama-3-8B-Instruct-FP8-BLOCK/snapshots/ea6d7c1a6a0874d9db6511ce93da2b777f24376f"
layer_names = ["lm_head.weight", "model.embed_tokens.weight"]
layer = load_model_layers(layer_names, model_path)
for k, v in layer.items():
    print(k, v.shape)
    print(f"sample data: {v.flatten()[:5]}")
2025-10-06 14:02:21.042 | INFO     | __main__:_resolve_file:64 - Loading from local directory:
2025-10-06 14:02:21.043 | INFO     | __main__:_resolve_file:64 - Loading from local directory:
2025-10-06 14:02:21.044 | INFO     | __main__:_resolve_file:64 - Loading from local directory:
lm_head.weight torch.Size([128256, 4096])
sample data: tensor([ 0.0098,  0.0175,  0.0037,  0.0222, -0.0194], dtype=torch.bfloat16)
model.embed_tokens.weight torch.Size([128256, 4096])
sample data: tensor([ 0.0013,  0.0054, -0.0022,  0.0003, -0.0024], dtype=torch.bfloat16)
2025-10-06 14:02:52.385 | INFO     | __main__:_resolve_file:70 - Loading from huggingface directory:
2025-10-06 14:02:52.468 | INFO     | __main__:_resolve_file:70 - Loading from huggingface directory:
2025-10-06 14:02:52.508 | INFO     | __main__:_resolve_file:70 - Loading from huggingface directory:
lm_head.weight torch.Size([128256, 4096])
sample data: tensor([ 0.0098,  0.0175,  0.0037,  0.0222, -0.0194], dtype=torch.bfloat16)
model.embed_tokens.weight torch.Size([128256, 4096])
sample data: tensor([ 0.0013,  0.0054, -0.0022,  0.0003, -0.0024], dtype=torch.bfloat16)


github-actions bot commented Oct 6, 2025

📦 Build Artifacts Available
The build artifacts (`.whl` and `.tar.gz`) have been successfully generated and are available for download: https://github.com/vllm-project/speculators/actions/runs/18291068632/artifacts/4195955811.
They will be retained for up to 30 days.
Commit: e251fc9

Signed-off-by: shanjiaz <[email protected]>
@shanjiaz shanjiaz marked this pull request as ready for review October 6, 2025 18:53
@shanjiaz shanjiaz requested review from fynnsu and rahul-tuli October 9, 2025 16:28
Collaborator

@fynnsu fynnsu left a comment


Looks good!

Added a few suggestions

for name in layer_names:
    shard = weight_map.get(name)
    if shard is None:
        logger.warning(f"Tensor '{name}' not found in index weight_map.")
Collaborator


It might be better to make this an error.
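Applied to the quoted snippet, the suggestion could look roughly like this. A sketch only, with a hypothetical helper name; one refinement over a per-name `raise` is to collect every missing tensor first, so the error reports them all at once:

```python
def require_tensors(layer_names, weight_map):
    """Raise a single error listing every missing tensor, instead of
    logging a warning per name and continuing."""
    missing = [name for name in layer_names if name not in weight_map]
    if missing:
        raise KeyError(f"Tensors not found in index weight_map: {missing}")
    return {name: weight_map[name] for name in layer_names}
```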

available = set(f.keys())
for name in names:
    if name not in available:
        logger.warning(
Collaborator


Same here

        raise FileNotFoundError(f"Expected local file missing: {p}")
    return p
# Treat as repo_id on the Hub
logger.info("Loading from huggingface directory: {}", model_path)
Collaborator


This (or the other logger.info call) will print each time the function is called. If we load multiple values (e.g. embed_tokens.weight and lm_head.weight), it would be nice to also include the file_name in these log calls.
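One way to act on this suggestion (a hypothetical helper, not the PR's code) is to build the message from both the source and the shard file name, so repeated log lines are distinguishable:

```python
def load_message(source, model_path, file_name):
    """Format a log line that names the shard being fetched as well as
    where it comes from."""
    return f"Loading {file_name} from {source}: {model_path}"
```

Usage would be e.g. `logger.info(load_message("huggingface repo", model_path, file_name))`, replacing the bare `logger.info("Loading from huggingface directory: {}", model_path)` call.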

@fynnsu fynnsu mentioned this pull request Oct 17, 2025
