Conversation

shanjiaz
Collaborator

@shanjiaz shanjiaz commented Oct 6, 2025

Previously we loaded the full verifier model just to get a few specific layers (e.g. the embeddings). This util saves memory by loading only the shards that contain the requested layers, and returns a dict mapping layer name -> tensor.
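The shard-selective loading described above could be sketched roughly like this. This is a minimal sketch, not the PR's actual implementation: `group_by_shard` is a hypothetical helper, and a standard `model.safetensors.index.json` layout in a local checkpoint directory is assumed.

```python
import json
import os


def group_by_shard(layer_names, weight_map):
    """Group requested tensor names by the shard file that holds them,
    so each shard is opened at most once."""
    by_shard = {}
    for name in layer_names:
        shard = weight_map.get(name)
        if shard is None:
            raise KeyError(f"Tensor '{name}' not found in index weight_map.")
        by_shard.setdefault(shard, []).append(name)
    return by_shard


def load_model_layers(layer_names, model_dir):
    """Load only the requested tensors from a sharded safetensors
    checkpoint, without instantiating the full model."""
    index_path = os.path.join(model_dir, "model.safetensors.index.json")
    with open(index_path) as fh:
        weight_map = json.load(fh)["weight_map"]

    # Imported lazily so the grouping helper above stays stdlib-only.
    from safetensors import safe_open

    tensors = {}
    for shard, names in group_by_shard(layer_names, weight_map).items():
        with safe_open(os.path.join(model_dir, shard), framework="pt") as f:
            for name in names:
                tensors[name] = f.get_tensor(name)
    return tensors
```

Grouping by shard first means a shard shared by several requested tensors (e.g. tied embeddings) is memory-mapped once rather than reopened per tensor.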

Tested locally:

model_path = "shanjiaz/Meta-Llama-3-8B-Instruct-FP8-BLOCK"
layer_names = ["lm_head.weight", "model.embed_tokens.weight"]
layer = load_model_layers(layer_names, model_path)
for k, v in layer.items():
    print(k, v.shape)
    print(f"sample data: {v.flatten()[:5]}")
model_path = "/home/hzhao/.cache/huggingface/hub/models--shanjiaz--Meta-Llama-3-8B-Instruct-FP8-BLOCK/snapshots/ea6d7c1a6a0874d9db6511ce93da2b777f24376f"
layer_names = ["lm_head.weight", "model.embed_tokens.weight"]
layer = load_model_layers(layer_names, model_path)
for k, v in layer.items():
    print(k, v.shape)
    print(f"sample data: {v.flatten()[:5]}")
2025-10-06 14:02:21.042 | INFO     | __main__:_resolve_file:64 - Loading from local directory:
2025-10-06 14:02:21.043 | INFO     | __main__:_resolve_file:64 - Loading from local directory:
2025-10-06 14:02:21.044 | INFO     | __main__:_resolve_file:64 - Loading from local directory:
lm_head.weight torch.Size([128256, 4096])
sample data: tensor([ 0.0098,  0.0175,  0.0037,  0.0222, -0.0194], dtype=torch.bfloat16)
model.embed_tokens.weight torch.Size([128256, 4096])
sample data: tensor([ 0.0013,  0.0054, -0.0022,  0.0003, -0.0024], dtype=torch.bfloat16)
2025-10-06 14:02:52.385 | INFO     | __main__:_resolve_file:70 - Loading from huggingface directory:
2025-10-06 14:02:52.468 | INFO     | __main__:_resolve_file:70 - Loading from huggingface directory:
2025-10-06 14:02:52.508 | INFO     | __main__:_resolve_file:70 - Loading from huggingface directory:
lm_head.weight torch.Size([128256, 4096])
sample data: tensor([ 0.0098,  0.0175,  0.0037,  0.0222, -0.0194], dtype=torch.bfloat16)
model.embed_tokens.weight torch.Size([128256, 4096])
sample data: tensor([ 0.0013,  0.0054, -0.0022,  0.0003, -0.0024], dtype=torch.bfloat16)


github-actions bot commented Oct 6, 2025

📦 Build Artifacts Available
The build artifacts (`.whl` and `.tar.gz`) have been successfully generated and are available for download: https://github.com/vllm-project/speculators/actions/runs/18291068632/artifacts/4195955811.
They will be retained for up to 30 days.
Commit: e251fc9

Signed-off-by: shanjiaz <[email protected]>
@shanjiaz shanjiaz marked this pull request as ready for review October 6, 2025 18:53
@shanjiaz shanjiaz requested review from fynnsu and rahul-tuli October 9, 2025 16:28
Collaborator

@fynnsu fynnsu left a comment


Looks good!

Added a few suggestions

for name in layer_names:
    shard = weight_map.get(name)
    if shard is None:
        logger.warning(f"Tensor '{name}' not found in index weight_map.")
Collaborator


It might be better to make this an error.
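Applied to the quoted snippet, the suggestion could look roughly like this. A sketch only, with a hypothetical helper name; one refinement over a per-name `raise` is to collect every missing tensor first, so the error reports them all at once:

```python
def require_tensors(layer_names, weight_map):
    """Raise a single error listing every missing tensor, instead of
    logging a warning per name and continuing."""
    missing = [name for name in layer_names if name not in weight_map]
    if missing:
        raise KeyError(f"Tensors not found in index weight_map: {missing}")
    return {name: weight_map[name] for name in layer_names}
```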

available = set(f.keys())
for name in names:
    if name not in available:
        logger.warning(
Collaborator


Same here

        raise FileNotFoundError(f"Expected local file missing: {p}")
    return p
# Treat as repo_id on the Hub
logger.info("Loading from huggingface directory: {}", model_path)
Collaborator


This (or the other logger.info call) will print each time the function is called. If we load multiple values (e.g. embed_tokens.weight and lm_head.weight), it would be nice to also include the file_name in these log calls.
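One way to act on this suggestion (a hypothetical helper, not the PR's code) is to build the message from both the source and the shard file name, so repeated log lines are distinguishable:

```python
def load_message(source, model_path, file_name):
    """Format a log line that names the shard being fetched as well as
    where it comes from."""
    return f"Loading {file_name} from {source}: {model_path}"
```

Usage would be e.g. `logger.info(load_message("huggingface repo", model_path, file_name))`, replacing the bare `logger.info("Loading from huggingface directory: {}", model_path)` call.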

@fynnsu fynnsu mentioned this pull request Oct 17, 2025
