Support DeepSeek V4 Flash 4Expert (top-4) by yuhai-china · Pull Request #474 · antirez/ds4

yuhai-china · 2026-06-28T22:45:26Z

This PR enables ds4 to run the 4Expert variant of DeepSeek V4 Flash, which routes to top-4 experts instead of top-6.

Motivation

DeepSeek V4 Flash comes in variants that activate different numbers of routed experts per token. The original ds4 hardcodes 6 active experts. With this PR, the 4Expert variant (256 total experts, 4 active per token) is supported out of the box, while 6-expert models remain fully backward compatible.

4Expert safetensors: https://huggingface.co/cloudyu/DeepSeek-V4-Flash-4Expert
4Expert Q4_K GGUF: https://huggingface.co/cloudyu/DeepSeek-V4-Flash-4Expert-GGUF

Changes

ds4.c — 4Expert support with backward compatibility

DS4_SHAPE_FLASH.n_expert_used changed from 6 to 4
g_ds4_shape.n_expert_used changed from 6 to 4
ds4_select_shape_from_metadata(): when matching the Flash variant, accepts an n_expert_used of either 4 or 6, and keeps 6 at runtime for older 6-expert GGUFs
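
The shape-selection rule above can be sketched roughly as follows. This is a minimal illustration, not the code in ds4.c: the struct `shape_sketch`, the constant `FLASH_DEFAULT`, and the function `select_flash_shape` are hypothetical names, and the sketch assumes the 6-expert Flash model shares the 256-expert total that the PR states for the 4Expert variant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced shape record; the real ds4 shape has more fields. */
typedef struct {
    uint32_t n_expert;       /* total routed experts */
    uint32_t n_expert_used;  /* experts activated per token */
} shape_sketch;

/* Default Flash shape after this PR: top-4 routing.
 * Assumption: 256 total experts for both the 4- and 6-expert files. */
static const shape_sketch FLASH_DEFAULT = { 256, 4 };

/* Returns true and fills *out when the metadata matches the Flash variant.
 * Both 4 and 6 are accepted, and the value read from the file is the one
 * used at runtime, so older 6-expert GGUFs keep top-6 routing. */
static bool select_flash_shape(uint32_t meta_n_expert,
                               uint32_t meta_n_expert_used,
                               shape_sketch *out)
{
    assert(out != NULL);
    if (meta_n_expert != FLASH_DEFAULT.n_expert) return false;
    if (meta_n_expert_used != 4 && meta_n_expert_used != 6) return false;
    *out = FLASH_DEFAULT;
    out->n_expert_used = meta_n_expert_used;  /* preserve the file's value */
    return true;
}
```

In this sketch any other expert count is rejected rather than silently clamped, so a mismatched file fails at load time instead of routing incorrectly.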

gguf-tools/gen_gguf_template.py — GGUF template generator

Generates GGUF metadata templates from the safetensors index, mapping HF tensor names to GGUF names via the same layer_map as deepseek4-quantize.c. Handles I64→I32 conversion for the tid2eid routing table.
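
The generator itself is a Python script; the C sketch below only illustrates the I64→I32 narrowing step for a table of expert IDs. The function name `narrow_i64_to_i32` is hypothetical, and the range check is an assumption about what a safe conversion would do, not a description of what gen_gguf_template.py does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Narrow n 64-bit IDs to 32-bit.
 * Returns 0 on success, or -1 if any value does not fit in int32_t
 * (the contents of dst are then unspecified). */
static int narrow_i64_to_i32(const int64_t *src, int32_t *dst, size_t n)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++) {
        if (src[i] < INT32_MIN || src[i] > INT32_MAX) return -1;
        dst[i] = (int32_t)src[i];
    }
    return 0;
}
```

Expert IDs in a 256-expert model fit comfortably in 32 bits, so the check should never fire on valid data; it is there to catch a corrupted or misread tensor.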

test-4expert.sh — One-click end-to-end test

Single script that clones, builds, downloads weights, generates template, quantizes, and runs inference. Anyone can verify the PR on a fresh machine with one command.

docs/

gguf-conversion.md — step-by-step GGUF conversion guide
test-pr-on-linux.md — Linux testing quickstart

Testing

Q4_K GGUF (~153 GiB) converted from 4Expert safetensors, generates at ~26.70 t/s
Existing 6-expert GGUF files continue to work (shape auto-detection)
All changes compile cleanly with make cpu (Linux) and make (macOS)

Quick Test

git clone https://github.com/yuhai-china/ds4 && cd ds4 && git checkout 4expert
bash test-4expert.sh

- Change default n_expert_used from 6 to 4 in DS4_SHAPE_FLASH
- Add backward compatibility: auto-detect 6-expert Flash models and set n_expert_used accordingly
- Add gen_gguf_template.py: generate GGUF template from safetensors metadata for the quantizer pipeline
- Add docs/gguf-conversion.md: step-by-step GGUF conversion guide

Model: https://huggingface.co/cloudyu/DeepSeek-V4-Flash-4Expert

- Add n_expert_used parameter to router_select_kernel, router_select_parallel_kernel, and router_select_warp_topk_kernel
- Replace all hardcoded 6/6u expert counts with n_expert_used
- Update guard checks to accept both n_expert_used=4 and n_expert_used=6
- Fix buffer size and hash byte calculations to use n_expert_used
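
As a rough illustration of what parameterizing the router on n_expert_used means, here is a CPU-side sketch in plain C. It is not the kernel code from this PR: `router_select_sketch` is a hypothetical name, the selection is a simple repeated scan rather than a parallel or warp-level top-k, and breaking ties toward the lower index is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Select the k highest-scoring experts out of n_expert.
 * out_ids must have room for k entries (k * sizeof(uint32_t) bytes, i.e.
 * sized from the parameter rather than a hardcoded 6). It receives the
 * expert indices in descending score order; ties go to the lower index.
 * Returns 0 on success, -1 if k or n_expert is rejected. */
static int router_select_sketch(const float *scores, uint32_t n_expert,
                                uint32_t k, uint32_t *out_ids)
{
    assert(scores != NULL && out_ids != NULL);
    if (k != 4 && k != 6) return -1;  /* guard: only top-4 or top-6 */
    if (n_expert < k) return -1;

    for (uint32_t slot = 0; slot < k; slot++) {
        uint32_t best = UINT32_MAX;   /* sentinel: nothing chosen yet */
        for (uint32_t e = 0; e < n_expert; e++) {
            int taken = 0;
            for (uint32_t j = 0; j < slot; j++) {
                if (out_ids[j] == e) { taken = 1; break; }
            }
            if (taken) continue;      /* already selected in an earlier slot */
            if (best == UINT32_MAX || scores[e] > scores[best]) best = e;
        }
        out_ids[slot] = best;
    }
    return 0;
}
```

The point of the sketch is that every loop bound and buffer size depends on k, so the same routine serves both the 4Expert and the original 6-expert models.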

yuhai-china force-pushed the 4expert branch 3 times, most recently from 129a9f8 to 4a0380c Compare June 28, 2026 23:25

yuhai-china force-pushed the 4expert branch from 4a0380c to 5519c01 Compare June 28, 2026 23:28

yuhai-china wants to merge 2 commits into antirez:main from yuhai-china:4expert
