
feat: add MmShapeBench — 50 shapes x 2 dtypes = 100 matmul tests#51

Merged
factnn merged 13 commits into flagos-ai:main from factnn:feature/mm-shape-bench-clean on Jun 30, 2026

factnn (Collaborator) commented Jun 26, 2026

Summary

  • 100 test files: 50 shapes taken from real Hugging Face (HF) model training traces x 2 dtypes (f16/f32)
  • Each test pins a specific (M,K,N) combination for aten::mm
  • Registration via kernel_list.py and dataset/__init__.py
  • Clean subset from feature/mm-shape-bench, no unrelated changes
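To illustrate what "pinning" one (M,K,N) combination means for an accuracy test, here is a minimal pure-Python sketch. It is not the PR's test code: the real tests exercise aten::mm on a device, while `check_pinned_shape`, `reference_mm`, `round_to`, the random inputs in [-1, 1], and the per-dtype tolerances below are all hypothetical. Inputs are rounded to f16 or f32 via the `struct` module's `"e"`/`"f"` format codes, and the candidate kernel is compared against a double-precision reference.

```python
import random
import struct

# Hypothetical (rtol, atol) per dtype; the real MmShapeBench tolerances may differ.
TOLERANCES = {"f16": (1e-2, 1e-2), "f32": (1e-4, 1e-5)}
# struct format codes: "e" = IEEE half precision, "f" = IEEE single precision.
_STRUCT_CODE = {"f16": "e", "f32": "f"}


def round_to(x: float, dtype: str) -> float:
    """Round a Python float to the nearest value representable in `dtype`."""
    code = _STRUCT_CODE[dtype]
    return struct.unpack(code, struct.pack(code, x))[0]


def reference_mm(a, b):
    """Naive (M, K) @ (K, N) product, accumulated in Python float (double) precision."""
    m, k, n = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]


def check_pinned_shape(candidate_mm, m, k, n, dtype, seed=0):
    """Run `candidate_mm` on one fixed (M, K, N, dtype) case and compare to the reference."""
    rng = random.Random(seed)
    a = [[round_to(rng.uniform(-1, 1), dtype) for _ in range(k)] for _ in range(m)]
    b = [[round_to(rng.uniform(-1, 1), dtype) for _ in range(n)] for _ in range(k)]
    out = candidate_mm(a, b)
    if len(out) != m or any(len(row) != n for row in out):
        return False  # a wrong output shape fails immediately
    rtol, atol = TOLERANCES[dtype]
    ref = reference_mm(a, b)
    return all(abs(o - r) <= atol + rtol * abs(r)
               for out_row, ref_row in zip(out, ref)
               for o, r in zip(out_row, ref_row))
```

For example, `check_pinned_shape(reference_mm, 4, 8, 3, "f16")` returns `True`, while a kernel that doubles every output element fails. Because the shape and dtype are fixed per test, a failure points at exactly one (M,K,N, dtype) case.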

Files changed

  • src/flagbench/accuracy/mm_shape_bench/ — 100 tests + __init__.py
  • src/flagbench/dataset/__init__.py — add MM_SHAPE exports
  • src/flagbench/dataset/kernel_list.py — add MM_SHAPE_OPERATORS dict
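The PR does not show the layout of MM_SHAPE_OPERATORS, so the following is only a sketch of how a 50 x 2 registry could be generated. The three shapes, the helper names `op_name` and `build_mm_shape_operators`, and the value schema are hypothetical. The naming follows the `mm_128x768_768x768_f32` example from the commit log, assuming the first group is A's shape (M x K) and the second is B's shape (K x N), as for aten::mm.

```python
from itertools import product

# Hypothetical shape subset; the PR pins 50 (M, K, N) shapes from HF training traces.
SHAPES = [(128, 768, 768), (512, 1024, 4096), (2048, 4096, 1024)]
DTYPES = ("f16", "f32")


def op_name(m: int, k: int, n: int, dtype: str) -> str:
    """Name in the style of mm_128x768_768x768_f32: A is (M, K), B is (K, N)."""
    return f"mm_{m}x{k}_{k}x{n}_{dtype}"


def build_mm_shape_operators(shapes, dtypes):
    """Map every (shape, dtype) pair to a benchmark op name and its pinned spec."""
    return {
        op_name(m, k, n, dtype): {"M": m, "K": k, "N": n, "dtype": dtype}
        for (m, k, n), dtype in product(shapes, dtypes)
    }


mm_shape_operators = build_mm_shape_operators(SHAPES, DTYPES)
```

With the PR's 50 shapes and 2 dtypes, the product yields 50 x 2 = 100 entries, one per test file. Keying by the full name keeps each (shape, dtype) case individually addressable.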

factnn force-pushed the feature/mm-shape-bench-clean branch 3 times, most recently from 934936e to 65facc8 on June 26, 2026 06:38
automerge-bot and others added 6 commits June 26, 2026 14:51
- 100 test files (50 HF model shapes x f16/f32) under kernelgenbench/accuracy/
- Registration in kernel_list.py and dataset/__init__.py
- generate_prompts.py: MmShapeBench dataset support with shape context
- generate_kernel_and_verify.py: MmShapePromptBuilder for op name mapping

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
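The commit mentions that generate_prompts.py adds "shape context" for MmShapeBench, but the prompt text itself is not shown. A minimal sketch, assuming the context is derived by parsing the op name: the regex, the `shape_context` helper, and the wording of the rendered sentence are all hypothetical.

```python
import re

# Hypothetical pattern for names like mm_128x768_768x768_f32.
_MM_NAME = re.compile(r"^mm_(\d+)x(\d+)_(\d+)x(\d+)_(f16|f32)$")
_DTYPE_TEXT = {"f16": "float16", "f32": "float32"}


def shape_context(op_name: str) -> str:
    """Render a one-sentence shape/dtype hint for an MmShapeBench op name."""
    match = _MM_NAME.match(op_name)
    if match is None:
        raise ValueError(f"not an MmShapeBench op name: {op_name}")
    rows_a, cols_a, rows_b, cols_b, dtype = match.groups()
    if cols_a != rows_b:
        # mm requires A's column count to equal B's row count
        raise ValueError(f"inner dimensions differ in {op_name}")
    return (f"Implement `mm` for A of shape ({rows_a}, {cols_a}) and "
            f"B of shape ({rows_b}, {cols_b}) with dtype {_DTYPE_TEXT[dtype]}; "
            f"the output has shape ({rows_a}, {cols_b}).")
```

Giving the model the concrete M, K, N and dtype lets it tune tiling for that one case instead of writing a generic kernel.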
…enchmarks

Add resolve_op_func_name() to handle specialized operators (e.g. MmShapeBench)
where the benchmark op name (mm_128x768_768x768_f32) differs from the actual
function name in generated code (mm). The verifier now resolves names before
checking function definitions, removing the need for case-by-case prompt hacks.
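The name resolution described above can be sketched as follows. The PR names `resolve_op_func_name()` but does not show its body, so the suffix regex here is an assumption, and `defines_function` is a hypothetical stand-in for the verifier's definition check (it inspects only top-level `def`s using the standard `ast` module).

```python
import ast
import re

# Hypothetical suffix pattern: <base>_<M>x<K>_<K>x<N>_<dtype>, e.g. mm_128x768_768x768_f32.
_SHAPE_SUFFIX = re.compile(
    r"^(?P<base>[A-Za-z_][A-Za-z0-9_]*?)_\d+x\d+_\d+x\d+_(?:f16|f32)$")


def resolve_op_func_name(op_name: str) -> str:
    """Map a specialized benchmark op name to the function name expected in generated code."""
    match = _SHAPE_SUFFIX.match(op_name)
    return match.group("base") if match else op_name  # plain names pass through unchanged


def defines_function(source: str, op_name: str) -> bool:
    """Return True if `source` defines a top-level function named after the resolved op."""
    wanted = resolve_op_func_name(op_name)
    tree = ast.parse(source)
    return any(isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
               and node.name == wanted
               for node in tree.body)
```

Resolving once, before the check, is what removes the need for per-benchmark prompt tweaks: the generated code can always define `mm`, and the verifier maps every shape-specific op name back to it.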
factnn force-pushed the feature/mm-shape-bench-clean branch from 65facc8 to d4259f7 on June 26, 2026 06:52
factnn merged commit eafaa22 into flagos-ai:main on Jun 30, 2026
1 check passed