
fep(sig-framework): add PyTorch-Plugin-FL v0.1.0 CUDA backend dispatch proposal #25

Open
Hchnr wants to merge 6 commits into flagos-ai:main from Hchnr:pytorch_plugin_fl_v0.1.0

Hchnr commented May 28, 2026


No description provided.

@buzhengjing


Thanks for the proposal.

From a reviewer's perspective, the current FEP provides a good architectural overview, but it does not yet contain sufficient implementation and validation details for reproducible verification.

In particular, the proposal does not currently specify:

  • Validation environment (image, dependencies, versions)
  • Installation and configuration procedures
  • Supported operators in the initial scope
  • Detailed test cases
  • Expected execution results
  • Backend selection verification methodology
  • Native CUDA vs. FlagGems comparison criteria
  • CI/regression validation strategy

Without these details, it is difficult for reviewers to reproduce the proposed workflow or assess the completeness of the implementation plan.

Could you consider adding a dedicated "Implementation and Validation Plan" section covering environment setup, test procedures, sample commands, expected outputs, and acceptance criteria?
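The "Native CUDA vs. FlagGems comparison criteria" item could be made concrete with an element-wise tolerance check. The sketch below is an illustration only: `compare_outputs` and `CompareResult` are hypothetical names, it works on host-side `std::vector<float>` copies rather than device tensors, and any `rtol`/`atol` values passed to it are placeholders, not figures from the FEP. The pass rule mirrors the documented `torch.allclose` formula, |actual - expected| <= atol + rtol * |expected|, with NaN treated as a mismatch.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Outcome of comparing one operator's output against a reference output.
struct CompareResult {
  bool passed;             // true only if sizes match and every element is within tolerance
  std::size_t mismatches;  // elements outside tolerance (NaN differences count as mismatches)
  double max_abs_diff;     // largest non-NaN |actual - expected| seen
};

// Hypothetical helper: element-wise check in the style of torch.allclose
// (equal_nan=False). An element passes when
// |actual - expected| <= atol + rtol * |expected|.
CompareResult compare_outputs(const std::vector<float>& actual,
                              const std::vector<float>& expected,
                              double rtol, double atol) {
  CompareResult r{false, 0, 0.0};
  if (actual.size() != expected.size()) return r;  // shape mismatch fails outright
  for (std::size_t i = 0; i < actual.size(); ++i) {
    const double e = expected[i];
    const double diff = std::fabs(static_cast<double>(actual[i]) - e);
    if (diff > r.max_abs_diff) r.max_abs_diff = diff;  // a NaN diff never updates this
    // Written as !(diff <= tol) so that NaN differences are counted as failures.
    if (!(diff <= atol + rtol * std::fabs(e))) ++r.mismatches;
  }
  r.passed = (r.mismatches == 0);
  return r;
}
```

In a per-operator test, `expected` would be the native-kernel output and `actual` the FlagGems output for the same input. Reporting `mismatches` and `max_abs_diff` alongside the pass/fail bit gives reviewers more to go on than a single boolean.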

Hchnr (Author) commented Jun 9, 2026

Update: expand CUDA dispatch to multi-backend architecture with full verification plan

Major updates to the PyTorch-Plugin-FL v0.1.0 FEP:

  • Rename from "CUDA Backend" to "Multi-Backend Operator Dispatch"
  • Add Ascend (Huawei) native kernel support alongside CUDA
  • Introduce `Dispatcher<FnPtr>` template-based routing mechanism
  • Support three dispatch paths: native CUDA, native Ascend, FlagGems Triton (C++ and Python)
  • Define 32 first-phase operators with cross-platform implementations
  • Add detailed architecture diagrams and registration flow
  • Provide complete testing strategy with per-operator and end-to-end tests (Qwen3-0.6B)
  • Document full verification environments for both CUDA (A800) and Ascend (910B) platforms
  • Include step-by-step installation, test procedures, and expected outputs
  • Add CI/CD integration and regression testing guidelines

Scope: expands from a CUDA-only prototype to a production-ready multi-backend framework with a comprehensive validation plan.
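To make the `Dispatcher<FnPtr>` routing idea easier to review, here is a minimal C++ sketch under stated assumptions. The `Backend` enum, the `register_kernel`/`resolve`/`select` names, the fall-back-to-FlagGems policy, and the two stand-in kernels are all hypothetical, not taken from the FEP. It keeps one function-pointer slot per backend for a single operator signature and exposes `resolve()` so a test can check which backend actually serves a call, which is one possible way to verify backend selection.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>
#include <utility>

// Hypothetical backend tags; the FEP's real enumeration may differ.
enum class Backend : std::size_t { NativeCUDA = 0, NativeAscend = 1, FlagGems = 2, Count = 3 };

// Sketch of a Dispatcher<FnPtr>: one kernel slot per backend for a single
// operator signature (FnPtr is a plain function-pointer type).
template <typename FnPtr>
class Dispatcher {
 public:
  // Store (or overwrite) the kernel for one backend; .at() throws
  // std::out_of_range for an invalid tag such as Backend::Count.
  void register_kernel(Backend b, FnPtr fn) { table_.at(index(b)) = fn; }

  // Report which backend will serve a call: the preferred one if it has a
  // kernel, otherwise FlagGems (assumed fallback policy); throws if neither.
  Backend resolve(Backend preferred) const {
    if (table_.at(index(preferred)) != nullptr) return preferred;
    if (table_.at(index(Backend::FlagGems)) != nullptr) return Backend::FlagGems;
    throw std::runtime_error("no kernel registered for this operator");
  }

  // Return the kernel chosen by resolve().
  FnPtr select(Backend preferred) const { return table_.at(index(resolve(preferred))); }

  // Forward the call's arguments to the selected kernel.
  template <typename... Args>
  auto operator()(Backend preferred, Args&&... args) const {
    return select(preferred)(std::forward<Args>(args)...);
  }

 private:
  static constexpr std::size_t index(Backend b) { return static_cast<std::size_t>(b); }
  // Value-initialized, so every slot starts as nullptr.
  std::array<FnPtr, static_cast<std::size_t>(Backend::Count)> table_{};
};

// Hypothetical stand-in kernels (plain CPU functions) used to exercise the
// dispatcher; real entries would launch CUDA, Ascend, or FlagGems kernels.
float add_native_cuda(float a, float b) { return a + b; }
float add_flaggems(float a, float b) { return a + b; }
```

For example, registering only `add_native_cuda` and `add_flaggems` for an `add` operator makes a request for `Backend::NativeAscend` resolve to `Backend::FlagGems`. A fixed-size array indexed by the enum keeps lookup to a single indexed load, at the cost of instantiating one table per operator signature.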

| Component | Value |
| --- | --- |
| GPU | NVIDIA A800-SXM4-80GB |
| Driver | 535.154.05 |
| CUDA Toolkit | 12.8 |
| Conda Env | `pytorch` (Python 3.12.13) |


Could you share the full Docker image `pytorch2.11.0_cuda12.8_triton3.6.0_flaggems5.0.2`?

Author


docker pull harbor.baai.ac.cn/flagscale/cuda12.8.1-cudnn9.15.1-python3.12-torch2.7.1-train:2512031616
