perf(vehicle): SCM HIP device body reduction and async host bridge #757

Closed
amd-pratmish wants to merge 4 commits into projectchrono:main from amd-pratmish:perf/scm-vehicle-hip-body-reduce
amd-pratmish commented Jun 29, 2026

Contributor

Summary

Follow-up perf PR for SCM HIP integration:

  • SCMTerrainGpu.cpp: grid-only host scatter; body forces from device reduction
  • Requires rebuilt external scm_gpu with scm_reduce_body_forces_kernel and scm_gpu_compute_forces_staged(..., n_bodies)
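
For reviewers who want a host-side reference to check the device reduction against, per-body reduction amounts to summing each hit's force, and its moment about a body reference point, into one slot per body. The following is a minimal sketch under assumptions: `HitForce`, `BodyWrench`, and `ReduceBodyForcesRef` are illustrative names, not part of Chrono or `scm_gpu`, and whether the actual kernel also reduces torques is not stated in this PR.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-hit record: contact point, force, and dense body index.
struct HitForce {
    std::array<double, 3> point;  // contact point, world frame
    std::array<double, 3> force;  // contact force, world frame
    int body;                     // dense index in [0, n_bodies)
};

// Hypothetical per-body accumulator, zero-initialized.
struct BodyWrench {
    std::array<double, 3> force{0.0, 0.0, 0.0};
    std::array<double, 3> torque{0.0, 0.0, 0.0};  // about the body's reference point
};

// Sum per-hit forces and moments into one wrench per body.
// origins[b] is the reference point of body b (an assumption of this sketch).
std::vector<BodyWrench> ReduceBodyForcesRef(const std::vector<HitForce>& hits,
                                            const std::vector<std::array<double, 3>>& origins) {
    std::vector<BodyWrench> out(origins.size());
    for (const HitForce& h : hits) {
        assert(h.body >= 0 && static_cast<std::size_t>(h.body) < origins.size());
        const std::size_t b = static_cast<std::size_t>(h.body);
        const std::array<double, 3>& o = origins[b];
        const std::array<double, 3> r{h.point[0] - o[0], h.point[1] - o[1], h.point[2] - o[2]};
        BodyWrench& w = out[b];
        for (int k = 0; k < 3; ++k) w.force[k] += h.force[k];
        // torque += r x f
        w.torque[0] += r[1] * h.force[2] - r[2] * h.force[1];
        w.torque[1] += r[2] * h.force[0] - r[0] * h.force[2];
        w.torque[2] += r[0] * h.force[1] - r[1] * h.force[0];
    }
    return out;
}
```

A serial loop like this fixes the summation order, so a device reduction that sums in a different order should be compared against it with a tolerance rather than bit-for-bit.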

Depends on

  • Merge first: #755, then #756
  • Incremental diff on fork: amd-pratmish:perf/scm-vehicle-hip-body-reduce based on perf/scm-vehicle-hip-e2e

Files in this PR

  • src/chrono_vehicle/terrain/SCMTerrainGpu.cpp

Test plan

  • scm_parity_test PASS (per-hit + body-force parity)
  • scm_scaling_benchmark --n-hits 65536 kernel speedup ≥ 2×
  • E2E run with CHRONO_SCM_GPU_PROFILE=1: scatter_ms no higher than in v2

amd-chronos ship-kit and others added 4 commits June 29, 2026 19:07
### Summary

Adds an **opt-in HIP path** for the SCM Bekker / Mohr-Coulomb / Janosi contact-force loop in `SCMLoader::ComputeInternalForces()`. Ray casting and contact-patch BFS stay on CPU.

- CMake: `CH_ENABLE_VEHICLE_SCM_GPU=ON` + `CHRONO_SCM_GPU_INCLUDE_DIR` / `CHRONO_SCM_GPU_LIB_DIR`
- Runtime: `CHRONO_SCM_GPU=1` (uniform soil, rigid `ChBody` contactables in v1)
- Auto-fallback to CPU when `hits.size() < CHRONO_SCM_GPU_MIN_HITS` (default **8192**)
- Host compiler stays **g++**; HIP device code compiles via CMake HIP language (`CMAKE_HIP_COMPILER` auto-detected). Set `-DCMAKE_HIP_ARCHITECTURES=gfx942` (MI300X) or `gfx90a` (MI210).
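
The runtime gate described in the bullets above (opt-in flag plus a minimum hit count) could look roughly like the sketch below. The helper names `EnvSizeOr`, `ShouldUseGpu`, and `UseGpuThisStep` are hypothetical, and reading `CHRONO_SCM_GPU_MIN_HITS` from the environment is an assumption; the commit message only gives its name and default.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Parse a digits-only environment variable; return `fallback` when the
// variable is unset, empty, or contains any non-digit character.
std::size_t EnvSizeOr(const char* name, std::size_t fallback) {
    const char* raw = std::getenv(name);
    if (!raw || *raw == '\0') return fallback;
    std::size_t value = 0;
    for (const char* p = raw; *p; ++p) {
        if (*p < '0' || *p > '9') return fallback;
        value = value * 10 + static_cast<std::size_t>(*p - '0');
    }
    return value;
}

// Pure decision: the GPU path runs only when enabled and the batch is large enough.
bool ShouldUseGpu(bool enabled, std::size_t n_hits, std::size_t min_hits) {
    return enabled && n_hits >= min_hits;
}

// Hypothetical per-step wrapper that reads the two settings named in this PR.
bool UseGpuThisStep(std::size_t n_hits) {
    const char* flag = std::getenv("CHRONO_SCM_GPU");
    const bool enabled = flag && std::string(flag) == "1";
    return ShouldUseGpu(enabled, n_hits, EnvSizeOr("CHRONO_SCM_GPU_MIN_HITS", 8192));
}
```

Keeping the decision in a pure function makes the threshold behavior easy to unit-test without touching the process environment.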

### OpenMP → HIP split (v1)

Porting pattern for CPU/OpenMP Chrono modules — only the dense inner loop moves to HIP:

```text
[CPU] Update active domains
[CPU] OpenMP ray cast → hits + patch_oob
[CPU] BFS contact patches
[CPU] Pack batch (OpenMP parallel for)
[GPU] scm_compute_forces_kernel  (Bekker + Mohr-Coulomb + Janosi per hit)
[CPU] Reduce per-body forces → ChLoadBodyForce
```
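
To make the `[GPU]` stage in the listing above concrete, here is a host-side sketch of the textbook per-hit evaluation: Bekker pressure-sinkage, the Mohr-Coulomb shear limit, and Janosi-Hanamoto shear mobilization. It is not the code of `scm_compute_forces_kernel`; `SoilParams`, `HitStress`, and `EvalHit` are hypothetical names, and elastic rebound and damping terms are left out.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical uniform-soil parameters (SI units).
struct SoilParams {
    double Kphi;      // Bekker frictional modulus
    double Kc;        // Bekker cohesive modulus
    double n;         // Bekker sinkage exponent
    double cohesion;  // Mohr-Coulomb cohesion [Pa]
    double friction;  // Mohr-Coulomb friction angle [rad]
    double janosi_K;  // Janosi shear deformation modulus [m]
};

struct HitStress {
    double normal;  // normal pressure [Pa]
    double shear;   // tangential stress magnitude [Pa]
};

// Stresses at one hit from sinkage z [m], accumulated shear displacement j [m],
// and characteristic contact-patch width b [m].
HitStress EvalHit(const SoilParams& s, double z, double j, double b) {
    if (z <= 0.0) return {0.0, 0.0};  // no penetration, no load
    // Bekker: p = (Kc / b + Kphi) * z^n
    const double p = (s.Kc / b + s.Kphi) * std::pow(z, s.n);
    // Mohr-Coulomb: tau_max = c + p * tan(phi)
    const double tau_max = s.cohesion + p * std::tan(s.friction);
    // Janosi-Hanamoto: tau = tau_max * (1 - exp(-j / K))
    const double tau = tau_max * (1.0 - std::exp(-std::max(j, 0.0) / s.janosi_K));
    return {p, tau};
}
```

Each hit is evaluated independently of the others, which is why this loop is the natural part of the pipeline to move to the device.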

Ship kit: amd-chronos contrib/upstream_ready/phase2c/v1
Upstream-facing playbook (no private integration references).

Co-authored-by: Cursor <cursoragent@cursor.com>
…ody reduce

### Summary

Follow-up to the optional SCM HIP backend. Improves end-to-end Chrono performance:

- Pass ray-cast `hits` map directly to `ComputeContactForcesGpu` (no intermediate vector copy)
- Call `scm_gpu::PrimeBuffers()` from all `SCMTerrain::Initialize` overloads
- Dense per-body force accumulation in `SCMTerrainGpu.cpp`
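
One plausible reading of "dense per-body force accumulation" is to map each contacted body to a small integer once, so that per-hit accumulation indexes a flat array instead of hashing a pointer for every hit. The sketch below illustrates that idea only; `Body` and `DenseBodyIndex` are hypothetical stand-ins, not types from `SCMTerrainGpu.cpp`.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Empty stand-in for a contactable body (hypothetical).
struct Body {};

// Assigns dense indices 0..n-1 to body pointers in first-seen order.
class DenseBodyIndex {
  public:
    // Return the index of `body`, registering it on first use.
    int Lookup(const Body* body) {
        auto it = m_index.find(body);
        if (it != m_index.end()) return it->second;
        const int id = static_cast<int>(m_bodies.size());
        m_index.emplace(body, id);
        m_bodies.push_back(body);
        return id;
    }

    // Number of distinct bodies seen so far.
    std::size_t Count() const { return m_bodies.size(); }

    // Body registered under a given dense index.
    const Body* At(int id) const { return m_bodies[static_cast<std::size_t>(id)]; }

  private:
    std::unordered_map<const Body*, int> m_index;
    std::vector<const Body*> m_bodies;
};
```

With indices like these, the number of distinct bodies is known before launch, and reduced results can be mapped back to their bodies with `At()`.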

### Depends on

- **Merged:** `feat(vehicle): optional HIP SCM contact-force backend` (v1)

### Files in this PR

- `src/chrono_vehicle/terrain/SCMTerrain.h`
- `src/chrono_vehicle/terrain/SCMTerrain.cpp`
- `src/chrono_vehicle/terrain/SCMTerrainGpu.cpp`

Rebuild external `scm_gpu` from matching `amd-chronos` ref (see `SCM_GPU_EXTERNAL.md`).

Ship kit: amd-chronos contrib/upstream_ready/phase2c/v2
### Summary

Follow-up perf PR for SCM HIP integration:

- `SCMTerrainGpu.cpp`: grid-only host scatter; body forces from device reduction
- Requires rebuilt external `scm_gpu` with `scm_reduce_body_forces_kernel` and `scm_gpu_compute_forces_staged(..., n_bodies)`

### Depends on

- **Merged:** v1 foundation + v2 E2E PRs

### Files in this PR

- `src/chrono_vehicle/terrain/SCMTerrainGpu.cpp`

### Test plan

- [ ] `scm_parity_test` PASS (per-hit + body-force parity)
- [ ] `scm_scaling_benchmark --n-hits 65536` kernel speedup ≥ 2×
- [ ] E2E run with `CHRONO_SCM_GPU_PROFILE=1`: `scatter_ms` no higher than in v2
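
For the parity items above, a CPU-versus-GPU comparison usually needs a tolerance, because a device reduction may sum in a different order than the host loop. A minimal sketch of such a check follows; `ForcesMatch` and its default tolerances are assumptions for illustration, not what `scm_parity_test` actually uses.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// True when two force arrays have the same length and agree element-wise
// within a mixed absolute/relative tolerance. NaN entries never match.
bool ForcesMatch(const std::vector<double>& cpu, const std::vector<double>& gpu,
                 double rel_tol = 1e-9, double abs_tol = 1e-12) {
    if (cpu.size() != gpu.size()) return false;
    for (std::size_t i = 0; i < cpu.size(); ++i) {
        const double scale = std::max(std::fabs(cpu[i]), std::fabs(gpu[i]));
        const double diff = std::fabs(cpu[i] - gpu[i]);
        // Written as !(diff <= bound) so that a NaN difference is rejected.
        if (!(diff <= abs_tol + rel_tol * scale)) return false;
    }
    return true;
}
```

If the device path computes in single precision, the relative tolerance would have to be loosened accordingly.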

Ship kit: amd-chronos contrib/upstream_ready/phase2c/v3
amd-pratmish force-pushed the perf/scm-vehicle-hip-body-reduce branch from 53748be to 8ca5cf1 June 29, 2026 19:16
@amd-pratmish

Contributor Author

@DanNegrut

rserban self-assigned this Jun 30, 2026

rserban commented Jun 30, 2026

Member

@amd-pratmish - could you combine these 3 PRs into a single one? That way, we can test and review everything at once.
Thanks.

amd-pratmish commented Jun 30, 2026

Contributor Author

@rserban: Superseded by the combined SCM HIP PR #755, per your request for a single PR.
