Error when loading sliced llama3.1-70b-Instruct

When I try to load the sliced model Llama-3.1-70B-Instruct, I get the following error:
```
lib/python3.10/site-packages/torch/nn/modules/module.py", line 2581, in load_state_dict
    raise RuntimeError(
RuntimeError: Error(s) in loading state_dict for UninitializedLlamaForCausalLM:
        Unexpected key(s) in state_dict: "model.layers.32.mlp_shortcut_Q", "model.layers.32.attn_shortcut_Q", "model.layers.32.self_attn.q_proj.weight", "model.layers.32.self_attn.k_proj.weight", "model.layers.32.self_attn.v_proj.weight", "model.layers.32.self_attn.o_proj.weight", "model.layers.32.mlp.gate_proj.weight", "model.layers.32.mlp.up_proj.weight", "model.layers.32.mlp.down_proj.weight", "model.layers.33.mlp_shortcut_Q", "model.layers.33.attn_shortcut_Q", "model.layers.33.self_attn.q_proj.weight", "model.layers.33.self_attn.k_proj.weight", "model.layers.33.self_attn.v_proj.weight", "model.layers.33.self_attn.o_proj.weight", "model.layers.33.mlp.gate_proj.weight", "model.layers.33.mlp.up_proj.weight", "model.layers.33.mlp.down_proj.weight", "model.layers.34.mlp_shortcut_Q", "model.layers.34.attn_shortcut_Q", "model.layers.34.self_attn.q_proj.weight", "model.layers.34.self_attn.k_proj.weight", "model.layers.34.self_attn.v_proj.weight",
```
and so on for the remaining layers. Later in the same error message:
```
 size mismatch for model.embed_tokens.weight: copying a param with shape torch.Size([128256, 6144]) from checkpoint, the shape in current model is torch.Size([32000, 6144]).
        size mismatch for model.layers.0.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 6144]) from checkpoint, the shape in current model is torch.Size([8192, 6144]).
        size mismatch for model.layers.0.self_attn.v_proj.weight: copying a param with shape torch.Size([1024, 6144]) from checkpoint, the shape in current model is torch.Size([8192, 6144]).
        size mismatch for model.layers.0.mlp.gate_proj.weight: copying a param with shape torch.Size([28672, 6144]) from checkpoint, the shape in current model is torch.Size([11008, 6144]).
        size mismatch for model.layers.0.mlp.up_proj.weight: copying a param with shape torch.Size([28672, 6144]) from checkpoint, the shape in current model is torch.Size([11008, 6144]).
        size mismatch for model.layers.0.mlp.down_proj.weight: copying a param with shape torch.Size([6144, 28672]) from checkpoint, the shape in current model is torch.Size([6144, 11008]).
...
        size mismatch for model.layers.31.mlp.gate_proj.weight: copying a param with shape torch.Size([28672, 6144]) from checkpoint, the shape in current model is torch.Size([11008, 6144]).
        size mismatch for model.layers.31.mlp.up_proj.weight: copying a param with shape torch.Size([28672, 6144]) from checkpoint, the shape in current model is torch.Size([11008, 6144]).
        size mismatch for model.layers.31.mlp.down_proj.weight: copying a param with shape torch.Size([6144, 28672]) from checkpoint, the shape in current model is torch.Size([6144, 11008]).
        size mismatch for lm_head.weight: copying a param with shape torch.Size([128256, 8192]) from checkpoint, the shape in current model is torch.Size([32000, 8192]).
```
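To make the mismatches easier to compare, here is a small sketch that parses the `size mismatch` lines into checkpoint and current-model shapes. `parse_size_mismatches` is my own hypothetical helper, not part of SliceGPT or PyTorch, and the sample log is just two lines copied from the error above.

```python
import re

# Matches one "size mismatch" line from a load_state_dict error message.
_PATTERN = re.compile(
    r"size mismatch for (?P<name>[\w.]+): copying a param with shape "
    r"torch\.Size\(\[(?P<ckpt>[\d, ]*)\]\) from checkpoint, "
    r"the shape in current model is torch\.Size\(\[(?P<model>[\d, ]*)\]\)"
)


def _to_shape(text):
    """Turn '128256, 6144' into (128256, 6144)."""
    return tuple(int(part) for part in text.split(",") if part.strip())


def parse_size_mismatches(log_text):
    """Return {param_name: (checkpoint_shape, model_shape)} (hypothetical helper)."""
    return {
        m["name"]: (_to_shape(m["ckpt"]), _to_shape(m["model"]))
        for m in _PATTERN.finditer(log_text)
    }


# Two lines copied from the error message above.
sample_log = """
 size mismatch for model.embed_tokens.weight: copying a param with shape torch.Size([128256, 6144]) from checkpoint, the shape in current model is torch.Size([32000, 6144]).
        size mismatch for model.layers.0.mlp.gate_proj.weight: copying a param with shape torch.Size([28672, 6144]) from checkpoint, the shape in current model is torch.Size([11008, 6144]).
"""

mismatches = parse_size_mismatches(sample_log)
for name, (ckpt_shape, model_shape) in mismatches.items():
    print(f"{name}: checkpoint={ckpt_shape} model={model_shape}")
```

Read this way, the current model has a vocabulary of 32000, an MLP width of 11008, and only layers 0 to 31, which match the default values of `LlamaConfig` in transformers, while the checkpoint has 128256, 28672, and layers beyond 31. That looks as if the model skeleton was built from a default config rather than the 70B one, but I have not confirmed this.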
I sliced the model like this:
```
python run_slicegpt.py \
    --model meta-llama/Llama-3.1-70B-Instruct \
    --save-dir results \
    --sparsity 0.25 \
    --device cuda \
    --eval-baseline \
    --distribute-model \
    --no-wandb
```
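As a sanity check on the shapes in the error: with `--sparsity 0.25` and a hidden size of 8192 (the second dimension of `lm_head.weight` in the log), I would expect a sliced dimension of about 8192 * 0.75 = 6144, which is what the checkpoint shapes show. A minimal sketch of that arithmetic follows; `sliced_dim` is a hypothetical helper, and rounding down to a multiple of 8 is my assumption, not something I have confirmed in the SliceGPT source.

```python
def sliced_dim(hidden_size, sparsity, round_to=8):
    """Estimate the embedding dimension left after slicing (hypothetical helper)."""
    # Keep a (1 - sparsity) fraction of the hidden dimension.
    kept = int(hidden_size * (1.0 - sparsity))
    # Round down to a multiple of round_to (assumed, not taken from SliceGPT).
    return kept - kept % round_to


print(sliced_dim(8192, 0.25))
```

So the 6144 in the checkpoint shapes seems consistent with the slicing command, and the unexpected numbers are on the side of the model being loaded into.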
Now I am trying to load the result with load_sliced_model.

How can I fix this?

Error when loading sliced llama3.1-70b-Instruct #185
