The pruned model does not match the target structured config #74

Open
dat-browny opened this issue Sep 21, 2024 · 0 comments

I used your sample scripts to prune princeton-nlp/Sheared-LLaMA-2.7B down to the 1.3B size. This is my pruning training config:

data_local: data_tmp/
data_remote: # If blank, files must be present in data_local
tokenizer_name: princeton-nlp/Sheared-LLaMA-2.7B
max_seq_len: 4096
global_seed: 17

# Run Name
run_name: Sheared-LLaMA-2.7B-Pruning

model:
  name: mosaic_llama2_1.3b
  path: models/Sheared-LLaMA-2.7B-composer/state_dict.pt
  init_device: "cpu" 
  tokenizer_name: ${tokenizer_name}
  d_model: 2560
  n_heads: 20
  n_layers: 32
  intermediate_size: 6912
  max_seq_len: ${max_seq_len}
  vocab_size: 32000
  init_std: 0.02
  attn_pdrop: 0.0
  resid_pdrop: 0.0
  emb_pdrop: 0.0
  attn_impl: flash
  rms_norm_eps: 1e-5
  l0_module: 
    start_sparsity: 0.0
    target_sparsity: 0.5
    pruning_modules: ["head", "head_layer", "mlp", "intermediate"]
    lagrangian_warmup_steps: 5ba 
    target_model:
      d_model: 2048
      n_layers: 24
      n_heads: 16 
      intermediate_size: 5504 
      vocab_size: 32000

# Tokenizer
tokenizer:
  type: hftokenizer
  args:
    tokenizer_name: ${tokenizer_name}
    max_seq_len: ${max_seq_len}

# Dataloaders
train_loader:
  name: text
  dataset:
    local: ${data_local}
    remote: ${data_remote}
    split: github
    shuffle: true
    tokenizer_name: ${tokenizer_name}
    max_seq_len: ${max_seq_len}
    shuffle_seed: ${global_seed}
    is_uint16: true
  drop_last: true
  num_workers: 8

eval_loader:
  name: text
  dataset:
    local: ${data_local}
    remote: ${data_remote}
    split: eval_merge 
    shuffle: false 
    tokenizer_name: ${tokenizer_name}
    max_seq_len: ${max_seq_len}
    shuffle_seed: ${global_seed}
    is_uint16: true
  drop_last: false
  num_workers: 8

# Optimization
scheduler:
  name: cosine_with_warmup
  t_warmup: 100ba
  alpha_f: 0.1

optimizer:
  name: decoupled_adamw
  lr: 1e-4
  betas:
  - 0.9
  - 0.95
  eps: 1.0e-08
  weight_decay: 0.0
  lag_lr: 1.0

algorithms:
  gradient_clipping:
    clipping_type: norm
    clipping_threshold: 1.0

max_duration: 800ba  
eval_interval: 200ba
eval_subset_num_batches: 100
global_train_batch_size: 8

# System
seed: ${global_seed}
device_eval_batch_size: 8
device_train_microbatch_size: 4
precision: amp_bf16

# FSDP
fsdp_config:
  sharding_strategy: FULL_SHARD
  mixed_precision: DEFAULT
  activation_checkpointing: true
  activation_cpu_offload: false
  verbose: false

# Logging
progress_bar: false
log_to_console: true
console_log_interval: 1ba

callbacks:
  speed_monitor:
    window_size: 10
  memory_monitor: {}
  lr_monitor: {}
  data_loading:
    dynamic: false
    update_type: doremi
    proportion: [0.67,0.045,0.045,0.02,0.045,0.025,0.15]
    set_names: [cc,github,book,stackexchange,wiki,arxiv,c4-rp]
    target_loss: [1.8712,0.6883,2.0325,1.5353,1.6297,1.3560,2.0328]


loggers:
  wandb: 
    project: LLM-Prune
    entity: 
    name: ${run_name}
    init_kwargs:
      mode: online
      dir: wandb_dir

# Checkpoint to local filesystem or remote object store
save_interval: 100ba 
save_folder: save_dir 
autoresume: false
python_log_level: DEBUG
save_overwrite: true
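For reference, this is the arithmetic I am assuming for the target shapes (my own sketch, not from the repo): the target_model block above should translate into the following HF LLaMA tensor shapes after conversion.

```python
# My own arithmetic (not from the repo): HF LLaMA tensor shapes implied by the
# target_model block above, i.e. what I expect the converted checkpoint to hold.
d_model, n_heads, n_layers = 2048, 16, 24
intermediate_size, vocab_size = 5504, 32000
head_dim = d_model // n_heads  # 128

expected_shapes = {
    "model.embed_tokens.weight": (vocab_size, d_model),                       # (32000, 2048)
    "model.layers.0.self_attn.q_proj.weight": (n_heads * head_dim, d_model),  # (2048, 2048)
    "model.layers.0.self_attn.k_proj.weight": (n_heads * head_dim, d_model),  # (2048, 2048)
    "model.layers.0.mlp.gate_proj.weight": (intermediate_size, d_model),      # (5504, 2048)
    "lm_head.weight": (vocab_size, d_model),                                  # (32000, 2048)
}
for name, shape in expected_shapes.items():
    print(name, shape)
```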

Note that I only use the github proportion here, to check whether the pipeline completes. After pruning, I convert the model with these scripts:

MODEL_PATH=save_dir/latest-rank0.pt
python3 -m llmshearing.utils.post_pruning_processing prune_and_save_model $MODEL_PATH
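Before converting, I dump the tensor shapes in the pruned composer checkpoint to check whether the hidden dimension was actually reduced. A rough sketch of what I run (it assumes the file is a plain torch pickle with the weights nested under state -> model; the exact layout may differ):

```python
# Rough diagnostic sketch (assumes the pruned checkpoint is a plain torch pickle;
# the exact key layout may differ between versions).
import torch

ckpt = torch.load("save_dir/pruned-latest-rank0.pt", map_location="cpu")
# Composer checkpoints usually nest the weights under state -> model.
state = ckpt.get("state", {}).get("model", ckpt) if isinstance(ckpt, dict) else ckpt

for name, value in state.items():
    if hasattr(value, "shape") and ("embed_tokens" in name or "layers.0." in name):
        print(name, tuple(value.shape))
```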

MODEL_PATH=save_dir/pruned-latest-rank0.pt
OUTPUT_PATH=save_dir/hf-latest_rank0
MODEL_CLASS=LlamaForCausalLM
HIDDEN_SIZE=2048
NUM_ATTENTION_HEADS=16
NUM_HIDDEN_LAYERS=24
INTERMEDIATE_SIZE=5504
MODEL_NAME=Sheared-Llama-1.3B

python3 -m llmshearing.utils.composer_to_hf save_composer_to_hf $MODEL_PATH $OUTPUT_PATH \
        model_class=${MODEL_CLASS} \
        hidden_size=${HIDDEN_SIZE} \
        num_attention_heads=${NUM_ATTENTION_HEADS} \
        num_hidden_layers=${NUM_HIDDEN_LAYERS} \
        intermediate_size=${INTERMEDIATE_SIZE} \
        num_key_value_heads=${NUM_ATTENTION_HEADS} \
        _name_or_path=${MODEL_NAME}

But somehow the output shapes after pruning do not match the target sizes:

### The attention Q, K, V shape mismatch
size mismatch for model.embed_tokens.weight: copying a param with shape torch.Size([32000, 2560]) from checkpoint, the shape in current model is torch.Size([32000, 2048]).
size mismatch for model.layers.0.self_attn.q_proj.weight: copying a param with shape torch.Size([2048, 2560]) from checkpoint, the shape in current model is torch.Size([2048, 2048]).
size mismatch for model.layers.0.self_attn.k_proj.weight: copying a param with shape torch.Size([2048, 2560]) from checkpoint, the shape in current model is torch.Size([2048, 2048]).
...

Is there a problem in the prune_params() method when the l0_module is applied?
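To narrow it down, I also tried listing whatever l0/mask-related state is left in the trainer checkpoint, to see whether the hidden-dimension mask was learned and saved at all. This is only a guess-based substring filter, not the repo's exact key names:

```python
# Hedged follow-up check: list any l0/mask-related entries left in the trainer
# checkpoint (the substrings below are guesses, not the repo's exact names).
import torch

ckpt = torch.load("save_dir/latest-rank0.pt", map_location="cpu")
state = ckpt.get("state", {}).get("model", ckpt) if isinstance(ckpt, dict) else ckpt

for name, value in state.items():
    if "l0" in name.lower() or "z_" in name or "mask" in name.lower():
        shape = tuple(value.shape) if hasattr(value, "shape") else type(value).__name__
        print(name, shape)
```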
