
Error when loading sliced llama3.1-70b-Instruct #185

@Daisy-a-p

When I try to load the sliced Llama-3.1-70B-Instruct model, I get the following error:

lib/python3.10/site-packages/torch/nn/modules/module.py", line 2581, in load_state_dict
    raise RuntimeError(
RuntimeError: Error(s) in loading state_dict for UninitializedLlamaForCausalLM:
        Unexpected key(s) in state_dict: "model.layers.32.mlp_shortcut_Q", "model.layers.32.attn_shortcut_Q", "model.layers.32.self_attn.q_proj.weight", "model.layers.32.self_attn.k_proj.weight", "model.layers.32.self_attn.v_proj.weight", "model.layers.32.self_attn.o_proj.weight", "model.layers.32.mlp.gate_proj.weight", "model.layers.32.mlp.up_proj.weight", "model.layers.32.mlp.down_proj.weight", "model.layers.33.mlp_shortcut_Q", "model.layers.33.attn_shortcut_Q", "model.layers.33.self_attn.q_proj.weight", "model.layers.33.self_attn.k_proj.weight", "model.layers.33.self_attn.v_proj.weight", "model.layers.33.self_attn.o_proj.weight", "model.layers.33.mlp.gate_proj.weight", "model.layers.33.mlp.up_proj.weight", "model.layers.33.mlp.down_proj.weight", "model.layers.34.mlp_shortcut_Q", "model.layers.34.attn_shortcut_Q", "model.layers.34.self_attn.q_proj.weight", "model.layers.34.self_attn.k_proj.weight", "model.layers.34.self_attn.v_proj.weight",

and so on for the remaining layers. Later in the same error message:

 size mismatch for model.embed_tokens.weight: copying a param with shape torch.Size([128256, 6144]) from checkpoint, the shape in current model is torch.Size([32000, 6144]).
        size mismatch for model.layers.0.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 6144]) from checkpoint, the shape in current model is torch.Size([8192, 6144]).
        size mismatch for model.layers.0.self_attn.v_proj.weight: copying a param with shape torch.Size([1024, 6144]) from checkpoint, the shape in current model is torch.Size([8192, 6144]).
        size mismatch for model.layers.0.mlp.gate_proj.weight: copying a param with shape torch.Size([28672, 6144]) from checkpoint, the shape in current model is torch.Size([11008, 6144]).
        size mismatch for model.layers.0.mlp.up_proj.weight: copying a param with shape torch.Size([28672, 6144]) from checkpoint, the shape in current model is torch.Size([11008, 6144]).
        size mismatch for model.layers.0.mlp.down_proj.weight: copying a param with shape torch.Size([6144, 28672]) from checkpoint, the shape in current model is torch.Size([6144, 11008]).
...
        size mismatch for model.layers.31.mlp.gate_proj.weight: copying a param with shape torch.Size([28672, 6144]) from checkpoint, the shape in current model is torch.Size([11008, 6144]).
        size mismatch for model.layers.31.mlp.up_proj.weight: copying a param with shape torch.Size([28672, 6144]) from checkpoint, the shape in current model is torch.Size([11008, 6144]).
        size mismatch for model.layers.31.mlp.down_proj.weight: copying a param with shape torch.Size([6144, 28672]) from checkpoint, the shape in current model is torch.Size([6144, 11008]).
        size mismatch for lm_head.weight: copying a param with shape torch.Size([128256, 8192]) from checkpoint, the shape in current model is torch.Size([32000, 8192]).

I sliced the model like this:

python run_slicegpt.py \
    --model meta-llama/Llama-3.1-70B-Instruct \
    --save-dir results \
    --sparsity 0.25 \
    --device cuda \
    --eval-baseline \
    --distribute-model \
    --no-wandb

Now I am trying to load the result with load_sliced_model, and that is where the error above is raised.

How can I fix this?
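To make the error easier to read, here is a minimal sketch that condenses the pasted message into one line per parameter kind. It is not part of SliceGPT or PyTorch: `summarize_mismatches` and `unexpected_layer_indices` are hypothetical helper names, and the regular expression assumes the message format shown above. On the excerpt below, the checkpoint side reports a vocabulary of 128256 and keys for layers 32 and up, while the model being constructed reports a vocabulary of 32000 and only 32 layers, which may indicate that the model skeleton is being built from a different configuration than the one that was sliced.

```python
import re

# Matches one "size mismatch" entry in the format shown in the error above.
MISMATCH = re.compile(
    r"size mismatch for (?P<key>[\w.]+): copying a param with shape "
    r"torch\.Size\(\[(?P<ckpt>[\d, ]*)\]\) from checkpoint, "
    r"the shape in current model is torch\.Size\(\[(?P<model>[\d, ]*)\]\)"
)


def parse_shape(text):
    """Turn '128256, 6144' into (128256, 6144)."""
    return tuple(int(part) for part in text.split(",") if part.strip())


def summarize_mismatches(error_text):
    """Map each parameter kind (layer index replaced by '*') to
    (checkpoint_shape, model_shape), keeping the first occurrence."""
    summary = {}
    for match in MISMATCH.finditer(error_text):
        kind = re.sub(r"\.layers\.\d+\.", ".layers.*.", match["key"])
        shapes = (parse_shape(match["ckpt"]), parse_shape(match["model"]))
        summary.setdefault(kind, shapes)
    return summary


def unexpected_layer_indices(error_text):
    """Return the sorted layer indices named on the 'Unexpected key(s)' line."""
    indices = set()
    for line in error_text.splitlines():
        if "Unexpected key(s) in state_dict" in line:
            found = re.findall(r"model\.layers\.(\d+)\.", line)
            indices.update(int(i) for i in found)
    return sorted(indices)


# A short excerpt of the error message from this issue.
sample = '''
Unexpected key(s) in state_dict: "model.layers.32.mlp_shortcut_Q", "model.layers.33.mlp_shortcut_Q",
size mismatch for model.embed_tokens.weight: copying a param with shape torch.Size([128256, 6144]) from checkpoint, the shape in current model is torch.Size([32000, 6144]).
size mismatch for model.layers.0.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 6144]) from checkpoint, the shape in current model is torch.Size([8192, 6144]).
size mismatch for model.layers.0.mlp.gate_proj.weight: copying a param with shape torch.Size([28672, 6144]) from checkpoint, the shape in current model is torch.Size([11008, 6144]).
'''

for kind, (ckpt, model) in summarize_mismatches(sample).items():
    print(f"{kind}: checkpoint {ckpt} vs current model {model}")
print("unexpected layers:", unexpected_layer_indices(sample))
```

Running this on the full error text, rather than the excerpt, would list every parameter kind whose shape differs and every layer index that the current model does not have.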
