perf(vehicle): SCM HIP device body reduction and async host bridge #757
Closed
amd-pratmish wants to merge 4 commits into
### Summary

Adds an **opt-in HIP path** for the SCM Bekker / Mohr-Coulomb / Janosi contact-force loop in `SCMLoader::ComputeInternalForces()`. Ray casting and contact-patch BFS stay on CPU.

- CMake: `CH_ENABLE_VEHICLE_SCM_GPU=ON` + `CHRONO_SCM_GPU_INCLUDE_DIR` / `CHRONO_SCM_GPU_LIB_DIR`
- Runtime: `CHRONO_SCM_GPU=1` (uniform soil, rigid `ChBody` contactables in v1)
- Auto-fallback to CPU when `hits.size() < CHRONO_SCM_GPU_MIN_HITS` (default **8192**)
- Host compiler stays **g++**; HIP device code compiles via CMake HIP language (`CMAKE_HIP_COMPILER` auto-detected). Set `-DCMAKE_HIP_ARCHITECTURES=gfx942` (MI300X) or `gfx90a` (MI210).

### OpenMP → HIP split (v1)

Porting pattern for CPU/OpenMP Chrono modules: only the dense inner loop moves to HIP.

```text
[CPU] Update active domains
[CPU] OpenMP ray cast → hits + patch_oob
[CPU] BFS contact patches
[CPU] Pack batch (OpenMP parallel for)
[GPU] scm_compute_forces_kernel (Bekker + Mohr-Coulomb + Janosi per hit)
[CPU] Reduce per-body forces → ChLoadBodyForce
```

Ship kit: amd-chronos contrib/upstream_ready/phase2c/v1
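The runtime gating described in the summary (opt-in via `CHRONO_SCM_GPU=1`, CPU fallback below `CHRONO_SCM_GPU_MIN_HITS`) can be sketched as follows. This is a minimal illustration, not the PR's code: the environment variable names and the 8192 default come from the description above, while `EnvSizeOr`, `ShouldUseGpu`, and `UseGpuForBatch` are hypothetical helper names introduced here.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper: read a non-negative integer from an environment
// variable; return `fallback` when it is unset, empty, or malformed.
inline std::size_t EnvSizeOr(const char* name, std::size_t fallback) {
    assert(name != nullptr);
    const char* raw = std::getenv(name);
    if (raw == nullptr || !std::isdigit(static_cast<unsigned char>(*raw))) return fallback;
    char* end = nullptr;
    const unsigned long long value = std::strtoull(raw, &end, 10);
    return (*end == '\0') ? static_cast<std::size_t>(value) : fallback;
}

// Pure decision function, kept separate from the environment lookup so it
// can be tested directly. The GPU path is taken only when it was requested
// and the batch has at least `min_hits` hits; otherwise the CPU path is used.
inline bool ShouldUseGpu(bool gpu_requested, std::size_t n_hits, std::size_t min_hits) {
    return gpu_requested && n_hits >= min_hits;
}

// Hypothetical wrapper that applies the documented variables and default.
inline bool UseGpuForBatch(std::size_t n_hits) {
    const char* flag = std::getenv("CHRONO_SCM_GPU");
    const bool requested = (flag != nullptr) && (std::string(flag) == "1");
    const std::size_t min_hits = EnvSizeOr("CHRONO_SCM_GPU_MIN_HITS", 8192);
    return ShouldUseGpu(requested, n_hits, min_hits);
}
```

A caller would evaluate `UseGpuForBatch(hits.size())` once per step, after ray casting. Keeping the threshold comparison in a pure function makes the boundary case (`n_hits == min_hits` takes the GPU path) easy to check in isolation.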
Upstream-facing playbook (no private integration references). Co-authored-by: Cursor <cursoragent@cursor.com>
…ody reduce

### Summary

Follow-up to the optional SCM HIP backend. Improves end-to-end Chrono performance:

- Pass ray-cast `hits` map directly to `ComputeContactForcesGpu` (no intermediate vector copy)
- Call `scm_gpu::PrimeBuffers()` from all `SCMTerrain::Initialize` overloads
- Dense per-body force accumulation in `SCMTerrainGpu.cpp`

### Depends on

- **Merged:** `feat(vehicle): optional HIP SCM contact-force backend` (v1)

### Files in this PR

- `src/chrono_vehicle/terrain/SCMTerrain.h`
- `src/chrono_vehicle/terrain/SCMTerrain.cpp`
- `src/chrono_vehicle/terrain/SCMTerrainGpu.cpp`

Rebuild external `scm_gpu` from matching `amd-chronos` ref (see `SCM_GPU_EXTERNAL.md`).

Ship kit: amd-chronos contrib/upstream_ready/phase2c/v2
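To illustrate what "dense per-body force accumulation" might look like, here is a minimal host-side sketch. It assumes each hit already carries a body index in a dense `[0, n_bodies)` numbering; the `HitForce` struct and `AccumulateBodyForces` function are hypothetical names, not identifiers from `SCMTerrainGpu.cpp`, and torques are omitted for brevity.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-hit record: the dense index of the contacted body and
// the force contribution computed for that hit.
struct HitForce {
    int body_index;
    std::array<double, 3> force;
};

// Sum per-hit forces into one slot per body. Indexing a contiguous vector by
// body_index avoids a hash or tree lookup for every hit in the inner loop.
inline std::vector<std::array<double, 3>> AccumulateBodyForces(
    const std::vector<HitForce>& hits, std::size_t n_bodies) {
    // Value-initialization zeroes every component of every slot.
    std::vector<std::array<double, 3>> totals(n_bodies);
    for (const HitForce& h : hits) {
        assert(h.body_index >= 0 &&
               static_cast<std::size_t>(h.body_index) < n_bodies);
        for (int k = 0; k < 3; ++k) {
            totals[static_cast<std::size_t>(h.body_index)][k] += h.force[k];
        }
    }
    return totals;
}
```

Bodies that receive no hits keep a zero force, so the caller can skip them when creating loads.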
### Summary

Follow-up perf PR for SCM HIP integration:

- `SCMTerrainGpu.cpp`: grid-only host scatter; body forces from device reduction
- Requires rebuilt external `scm_gpu` with `scm_reduce_body_forces_kernel` and `scm_gpu_compute_forces_staged(..., n_bodies)`

### Depends on

- **Merged:** v1 foundation + v2 E2E PRs

### Files in this PR

- `src/chrono_vehicle/terrain/SCMTerrainGpu.cpp`

### Test plan

- [ ] `scm_parity_test` PASS (per-hit + body-force parity)
- [ ] `scm_scaling_benchmark --n-hits 65536` kernel speedup ≥ 2×
- [ ] E2E `CHRONO_SCM_GPU_PROFILE=1`: scatter_ms ≤ v2

Ship kit: amd-chronos contrib/upstream_ready/phase2c/v3
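A device-side reduction generally sums contributions in a different order than a sequential CPU loop, so body forces from the two paths can differ by floating-point rounding. A parity check therefore usually compares within a tolerance rather than bit-for-bit. The sketch below shows one way such a comparison could be written; `ForcesMatch` and its tolerance parameters are hypothetical and are not taken from `scm_parity_test`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns true when every component of `gpu` matches `cpu` within a mixed
// tolerance: |g - c| <= abs_tol + rel_tol * max(|g|, |c|).
// Size mismatches and NaN values are reported as failures.
inline bool ForcesMatch(const std::vector<double>& cpu,
                        const std::vector<double>& gpu,
                        double rel_tol, double abs_tol) {
    assert(rel_tol >= 0.0 && abs_tol >= 0.0);
    if (cpu.size() != gpu.size()) return false;
    for (std::size_t i = 0; i < cpu.size(); ++i) {
        const double scale = std::max(std::fabs(cpu[i]), std::fabs(gpu[i]));
        const double diff = std::fabs(cpu[i] - gpu[i]);
        // Written as !(diff <= bound) so that a NaN difference fails the check.
        if (!(diff <= abs_tol + rel_tol * scale)) return false;
    }
    return true;
}
```

The absolute term keeps the check meaningful for components that are close to zero, where a purely relative bound would be too strict.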
53748be to
8ca5cf1
Member
@amd-pratmish - could you combine these 3 PRs into a single one? That way, we can test and review everything at once.
amd-pratmish:perf/scm-vehicle-hip-body-reduce based on perf/scm-vehicle-hip-e2e