feat(mimo_v25): support MiMo-V2.5-Pro by Simar-malhotra09 · Pull Request #2514 · NVIDIA-NeMo/Automodel

Simar-malhotra09 · 2026-06-10T20:43:54Z

What does this PR do?

Adds NeMo AutoModel support for XiaomiMiMo/MiMo-V2.5-Pro, closing #2462.

Changelog

nemo_automodel/components/models/mimo_v25/config.py — MiMoV2Config adapted from the HF source; handles fused QKV (attention_projection_layout: fused_qkv), hybrid attention pattern, partial RoPE, sigmoid/noaux_tc MoE routing, and torch_dtype assignment after super().__init__() to avoid PretrainedConfig overwriting it
nemo_automodel/components/models/mimo_v25/model.py — full model implementation (MiMoV2Attention, MiMoV2RotaryEmbedding, MiMoV2DecoderLayer, MiMoV2Model, MiMoV2ForCausalLM) using NeMo infrastructure (initialize_linear_module, initialize_rms_norm_module, MoE, HFCheckpointingMixin, MoEFSDPSyncMixin); supports both full and sliding-window attention layers and EP via ModelCapabilities
nemo_automodel/components/models/mimo_v25/state_dict_adapter.py — MiMoV2StateDictAdapter handling FP8 dequantisation (weight_scale_inv pairs) and per-expert weight merging/splitting via MoeSplitExpertsStateDictMixin; fused QKV keys pass through unchanged
nemo_automodel/_transformers/registry.py — registers MiMoV2ForCausalLM in MODEL_ARCH_MAPPING and mimo_v2 in _CUSTOM_CONFIG_REGISTRATIONS
pyproject.toml — adds per-file D101/D103 ruff ignores matching the project baseline pattern
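
The torch_dtype ordering noted for config.py can be illustrated with a minimal sketch. BaseConfig below is a hypothetical stand-in, not the real PretrainedConfig; it only assumes that the base constructor sets torch_dtype itself, which is the behaviour the changelog describes working around.

```python
class BaseConfig:
    """Hypothetical stand-in for a base config class (not the real PretrainedConfig)."""

    def __init__(self, **kwargs):
        # Assumption: the base constructor always (re)sets torch_dtype,
        # falling back to None when the caller did not pass one.
        self.torch_dtype = kwargs.pop("torch_dtype", None)


class AssignBefore(BaseConfig):
    def __init__(self, hidden_size=8, **kwargs):
        self.hidden_size = hidden_size
        self.torch_dtype = "bfloat16"  # set too early ...
        super().__init__(**kwargs)     # ... and overwritten by the base class


class AssignAfter(BaseConfig):
    def __init__(self, hidden_size=8, torch_dtype="bfloat16", **kwargs):
        self.hidden_size = hidden_size
        super().__init__(**kwargs)
        self.torch_dtype = torch_dtype  # assigned last, so it survives


before = AssignBefore()
after = AssignAfter()
print(before.torch_dtype)  # None
print(after.torch_dtype)   # bfloat16
```

Assigning after the base constructor keeps the subclass in control of the final value regardless of what the parent does with its keyword arguments.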

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you add or update any necessary documentation?

This is my first contribution to the repo so I may have missed some patterns or conventions. Happy to make changes as needed.

Additional Information

Related to Support MiMo-V2.5-Pro #2462
Smoke-tested on CPU with a tiny config (4 layers, fused QKV, mixed full/SWA attention, MoE layers): imports, registry lookup, model instantiation, adapter attachment, and forward pass all pass
The example YAML recipe and documentation updates are not yet included; they can be added once the implementation is approved
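
For readers unfamiliar with the mixed full/SWA setup exercised by the smoke test, here is a rough pure-Python sketch of the two mask shapes. The per-layer pattern list and the window convention (a query sees itself plus the previous window - 1 tokens) are assumptions for illustration, not values taken from the MiMo-V2.5-Pro config.

```python
def causal_mask(seq_len):
    """Full attention: query q may attend to every key k <= q."""
    return [[k <= q for k in range(seq_len)] for q in range(seq_len)]


def sliding_window_mask(seq_len, window):
    """Sliding-window attention: query q sees only the last `window` keys up to q."""
    return [
        [(k <= q) and (q - k < window) for k in range(seq_len)]
        for q in range(seq_len)
    ]


def masks_for_layers(pattern, seq_len, window):
    """Pick a mask per layer; `pattern` is a hypothetical list (1 = SWA, 0 = full)."""
    return [
        sliding_window_mask(seq_len, window) if is_swa else causal_mask(seq_len)
        for is_swa in pattern
    ]


# Tiny 4-layer example: full, SWA, SWA, full (pattern chosen only for illustration).
layer_masks = masks_for_layers([0, 1, 1, 0], seq_len=5, window=2)
print(layer_masks[0][4])  # [True, True, True, True, True]
print(layer_masks[1][4])  # [False, False, False, True, True]
```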

copy-pr-bot · 2026-06-10T20:44:43Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

HuiyingLi · 2026-06-11T00:49:27Z

@Simar-malhotra09 Thank you! Could you please attach the wandb/training loss of the model on the hellaswag dataset?

Simar-malhotra09 · 2026-06-11T01:15:21Z

@HuiyingLi I actually don't have access to GPUs to run the training. Is there something you can do on your end? I added the YAML file in the latest commit; it mostly follows the one for MiMo-V2-Flash, so it should be good.

akoumpa · 2026-06-15T16:53:42Z

/ok to test 6414e2f

Simar-malhotra09 · 2026-06-16T12:50:47Z

@akoumpa I added the coverage docs for the model since that was the main failing test I saw, so it should be good to run again. The tests specific to this model still need to be written in tests/unit_tests/models/, but I don't know if that is in scope at the moment.

…V2.5-Pro

Implements MiMoV2StateDictAdapter (MoeSplitExpertsStateDictMixin + StateDictAdapter) for XiaomiMiMo/MiMo-V2.5-Pro:
- from_hf: FP8 dequantisation (weight + _scale_inv pairs) followed by per-expert weight merging via _from_hf_w_merged_experts; fused QKV keys (self_attn.qkv_proj.weight) pass through unchanged since HF and NeMo use the same name
- to_hf: splits merged expert tensors back to per-expert layout and re-quantises eligible weights to float8_e4m3fn with scale_inv companions; NON_QUANTIZED_KEY_PATTERNS matches the V2-Flash precedent (norms, embeddings, lm_head, router gate, o_proj, attention_sink_bias)
- Registers MiMoV2ForCausalLM in MODEL_ARCH_MAPPING and mimo_v2 in _CUSTOM_CONFIG_REGISTRATIONS so NeMoAutoModelForCausalLM can resolve the model from an HF config

Smoke-tested end-to-end on CPU with a tiny MiMo-V2.5-Pro config (4 layers, fused QKV, mixed full/SWA attention, MoE layers): imports, registry lookup, model instantiation, adapter attachment, and a forward pass all pass cleanly.

Signed-off-by: Simar <malhotrasimar009@gmail.com>
Signed-off-by: Simar Malhotra <malhotrasimar009@gmail.com>
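
The from_hf direction described in this commit (dequantise weight/scale_inv pairs, then merge per-expert weights, with fused QKV keys passing through) can be sketched in plain Python. This is not the adapter's code: the key names, the merged key layout, the block-wise scaling scheme, and the toy block size of 2 are all assumptions, and nested lists stand in for tensors.

```python
import re

BLOCK = 2  # toy block size (assumption); real FP8 checkpoints use much larger blocks


def dequantize(weight, scale_inv, block=BLOCK):
    """Multiply each element by the scale of the block it falls in."""
    return [
        [w * scale_inv[r // block][c // block] for c, w in enumerate(row)]
        for r, row in enumerate(weight)
    ]


def dequantize_state_dict(sd):
    """Fold every `<key>_scale_inv` companion into its `<key>` weight."""
    out = {}
    for key, value in sd.items():
        if key.endswith("_scale_inv"):
            continue  # consumed together with its weight
        scale = sd.get(key + "_scale_inv")
        out[key] = dequantize(value, scale) if scale is not None else value
    return out


# Hypothetical per-expert key shape: "<prefix>.experts.<index>.<suffix>"
EXPERT_KEY = re.compile(r"^(.*\.experts)\.(\d+)\.(.+)$")


def merge_experts(sd):
    """Stack per-expert entries into one list per projection, ordered by expert index."""
    merged, grouped = {}, {}
    for key, value in sd.items():
        match = EXPERT_KEY.match(key)
        if match is None:
            merged[key] = value  # e.g. fused QKV keys pass through unchanged
            continue
        prefix, idx, suffix = match.group(1), int(match.group(2)), match.group(3)
        grouped.setdefault(f"{prefix}.{suffix}", {})[idx] = value
    for key, by_idx in grouped.items():
        merged[key] = [by_idx[i] for i in sorted(by_idx)]
    return merged


hf_state = {
    "layers.0.mlp.experts.0.up_proj.weight": [[1.0, 2.0], [3.0, 4.0]],
    "layers.0.mlp.experts.0.up_proj.weight_scale_inv": [[0.5]],
    "layers.0.mlp.experts.1.up_proj.weight": [[2.0, 2.0], [2.0, 2.0]],
    "layers.0.mlp.experts.1.up_proj.weight_scale_inv": [[2.0]],
    "layers.0.self_attn.qkv_proj.weight": [[1.0]],
}
native = merge_experts(dequantize_state_dict(hf_state))
print(sorted(native))
```

The to_hf direction would run the same two steps in reverse order (split, then re-quantise), which is omitted here to keep the sketch short.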

Adds examples/llm_finetune/mimo_v25/mimo_v25_pro_hellaswag.yaml:
- 16-node (128 H100) recipe using pp_size=4, ep_size=32 matching the declared ModelCapabilities (supports_pp=True, supports_ep=True)
- dequantize_base_checkpoint=true to handle the FP8 base checkpoint via MiMoV2StateDictAdapter before training
- Same hyperparameters (lr=1e-5, AdamW, max_steps=100) and dataset splits as the MiMo-V2-Flash hellaswag recipe

Signed-off-by: Simar <malhotrasimar009@gmail.com>
Signed-off-by: Simar Malhotra <malhotrasimar009@gmail.com>
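
As a quick sanity check on the recipe's numbers, the arithmetic can be written out as below. The helper and its divisibility rules are hypothetical: the sketch assumes expert-parallel groups are carved out of the ranks left within one pipeline stage, which may not match how NeMo AutoModel actually composes its device mesh.

```python
def check_layout(nodes, gpus_per_node, pp_size, ep_size):
    """Hypothetical helper: check that pp_size and ep_size tile the world size."""
    world_size = nodes * gpus_per_node
    if world_size % pp_size != 0:
        raise ValueError("pp_size must divide the world size")
    ranks_per_stage = world_size // pp_size
    # Assumption: expert parallelism is applied within a single pipeline stage.
    if ranks_per_stage % ep_size != 0:
        raise ValueError("ep_size must divide the ranks in one pipeline stage")
    return {
        "world_size": world_size,
        "ranks_per_stage": ranks_per_stage,
        "ep_groups_per_stage": ranks_per_stage // ep_size,
    }


# 16 nodes with 128 H100s in total implies 8 GPUs per node.
layout = check_layout(nodes=16, gpus_per_node=8, pp_size=4, ep_size=32)
print(layout)
# {'world_size': 128, 'ranks_per_stage': 32, 'ep_groups_per_stage': 1}
```

Under that assumption, pp_size=4 leaves exactly 32 ranks per stage, so ep_size=32 uses each stage as a single expert-parallel group.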

Adds docs/model-coverage/llm/xiaomimimo/mimo-v2-5-pro.mdx so that test_every_registered_arch_has_model_coverage_doc passes for the newly registered MiMoV2ForCausalLM architecture.

Signed-off-by: Simar <malhotrasimar009@gmail.com>
Signed-off-by: Simar Malhotra <malhotrasimar009@gmail.com>

- Adds MiMo-V2.5-Pro page entry to docs/fern/versions/nightly.yml so the page appears in the sidebar alongside MiMo-V2-Flash
- Adds missing Parameters row to the Info table to match the MiMo-V2-Flash page format

Signed-off-by: Simar <malhotrasimar009@gmail.com>
Signed-off-by: Simar Malhotra <malhotrasimar009@gmail.com>

jgerh

Completed tech pubs review of docs/model-coverage/llm/xiaomimimo/mimo-v2-5-pro.mdx. Looks great; added two minor copyedits.

akoumpa · 2026-06-16T22:46:50Z

/ok to test 6beb16e

github-actions · 2026-06-16T22:48:17Z

🌿 Preview your docs: https://nvidia-preview-main.docs.buildwithfern.com/nemo/automodel

github-actions Bot added the community-request label Jun 10, 2026

Simar-malhotra09 marked this pull request as ready for review June 10, 2026 20:46

Simar-malhotra09 requested review from a team as code owners June 10, 2026 20:46

svcnvidia-nemo-ci added the waiting-on-maintainers Waiting on maintainers to respond label Jun 13, 2026

svcnvidia-nemo-ci removed the waiting-on-maintainers Waiting on maintainers to respond label Jun 15, 2026

Simar-malhotra09 requested review from HuiyingLi, akoumpa, athitten, jgerh and snowmanwwg as code owners June 16, 2026 12:38

feat(support mimo v2.5)- download the config file from hf and modify …

bc8ad5f

…to match expected arch Signed-off-by: Simar Malhotra <malhotrasimar009@gmail.com>

Simar-malhotra09 added 6 commits June 16, 2026 08:54

feat(mimo v2.5)- remove native imports and use NeMo instead

37825fd

Signed-off-by: Simar Malhotra <malhotrasimar009@gmail.com>

feat(mimo_v25): add NeMo model and config for MiMo-V2.5-Pro

7e77c0d

Signed-off-by: Simar Malhotra <malhotrasimar009@gmail.com>

Simar-malhotra09 force-pushed the main branch from 8dd0883 to f74ad26 Compare June 16, 2026 12:55

jgerh reviewed Jun 16, 2026

Comment thread docs/model-coverage/llm/xiaomimimo/mimo-v2-5-pro.mdx Outdated

Comment thread docs/model-coverage/llm/xiaomimimo/mimo-v2-5-pro.mdx Outdated

svcnvidia-nemo-ci added the waiting-on-customer Waiting on the original author to respond label Jun 16, 2026

akoumpa and others added 2 commits June 16, 2026 15:46

Apply suggestions from code review

abe8433

Co-authored-by: jgerh <163925524+jgerh@users.noreply.github.com>

Merge branch 'main' into main

6beb16e

feat(mimo_v25): support MiMo-V2.5-Pro #2514
Simar-malhotra09 wants to merge 9 commits into NVIDIA-NeMo:main from Simar-malhotra09:main

5 participants
