feat(vehicle): optional HIP SCM contact-force backend (MI300X / gfx942) #755
amd-pratmish wants to merge 2 commits into
dafbbbe to c4bca3a
@rserban: Based on your feedback, I merged PRs #756 and #757 into this PR (#755): https://github.com/projectchrono/chrono/pull/757#issuecomment-4841147303
@amd-pratmish - thanks for the PR. This looks like a very interesting contribution! Having said that, a few issues and comments:
Please:
Thanks!
c4bca3a to 4597f63
Hi @rserban, thank you for your review and for the fine-grained comments. Based on your feedback, the following changes were made and pushed:
538b991 to 90f8a9c
### Summary

Single PR combining the SCM terrain HIP port (foundation + E2E perf + device body reduction), per maintainer request.

Adds an **opt-in HIP path** for the SCM Bekker / Mohr-Coulomb / Janosi contact-force loop in `SCMLoader::ComputeInternalForces()`. Ray casting and contact-patch BFS stay on CPU.

**Foundation**

- CMake: `CH_ENABLE_VEHICLE_SCM_GPU=ON` in `src/chrono_vehicle/CMakeLists.txt` (OpenCRG pattern) + `CHRONO_SCM_GPU_LIB_DIR`
- In-tree API headers `SCMGpu.h` / `SCMGpuTypes.h`; external HIP impl in `scm_gpu_core` (`FindScmGpu.cmake`)
- Runtime: `CHRONO_SCM_GPU=1` (uniform soil, rigid `ChBody` contactables)
- Auto-fallback to CPU when `hits.size() < CHRONO_SCM_GPU_MIN_HITS` (default **8192**)
- Host compiler stays **g++**; HIP device code via CMake HIP language. Set `-DCMAKE_HIP_ARCHITECTURES=gfx942` (MI300X) or `gfx90a` (MI210).

**E2E perf**

- Pass ray-cast `hits` map directly to `ComputeContactForcesGpu` (no intermediate vector copy)
- Call `scm_gpu::PrimeBuffers()` from all `SCMTerrain::Initialize` overloads
- Dense per-body force accumulation in `SCMTerrainGpu.cpp`

**Device body reduction + async bridge**

- Grid-only host scatter; body forces from device reduction (`scm_reduce_body_forces_kernel`)

Ship kit: amd-chronos contrib/upstream_ready/phase2c/combined
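To make the "device body reduction" item concrete, here is a minimal CPU reference for the kind of sum such a reduction computes: per-hit contact forces accumulated into one total per body. It is a sketch under assumptions, not the code behind `scm_reduce_body_forces_kernel`: the `HitForce` struct and `ReduceBodyForces` function are hypothetical names, body indices are assumed to be dense, and torques are left out for brevity.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical per-hit result: the body a hit touches and the force it contributes.
struct HitForce {
    int body;                     // dense body index, expected in [0, num_bodies)
    std::array<double, 3> force;  // contact force applied to that body
};

// Sum per-hit forces into one total per body.
// This is the serial equivalent of what a device-side reduction would produce.
std::vector<std::array<double, 3>> ReduceBodyForces(const std::vector<HitForce>& hits,
                                                    std::size_t num_bodies) {
    // vector(n) value-initializes its elements, so every total starts at zero.
    std::vector<std::array<double, 3>> totals(num_bodies);
    for (const HitForce& h : hits) {
        // Skip hits whose body index is out of range instead of writing out of bounds.
        if (h.body < 0 || static_cast<std::size_t>(h.body) >= num_bodies) continue;
        for (int k = 0; k < 3; ++k) totals[h.body][k] += h.force[k];
    }
    return totals;
}
```

A serial reference like this is also a convenient baseline for a body-force parity check against the GPU result.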
Insert HIP backend reference as its own paragraph after the SCM intro (per review feedback); repair broken mid-sentence insertion.

Co-authored-by: Cursor <cursoragent@cursor.com>
90f8a9c to d8714d2
### Summary

Single PR combining the SCM terrain HIP port (foundation + E2E perf + device body reduction), per maintainer request.

Adds an **opt-in HIP path** for the SCM Bekker / Mohr-Coulomb / Janosi contact-force loop in `SCMLoader::ComputeInternalForces()`. Ray casting and contact-patch BFS stay on CPU.

**Foundation**

- CMake: `CH_ENABLE_VEHICLE_SCM_GPU=ON` in `src/chrono_vehicle/CMakeLists.txt` (OpenCRG pattern) + `CHRONO_SCM_GPU_LIB_DIR`
- In-tree API headers `SCMGpu.h` / `SCMGpuTypes.h`; external HIP impl in `scm_gpu_core` (`FindScmGpu.cmake`)
- Runtime: `CHRONO_SCM_GPU=1` (uniform soil, rigid `ChBody` contactables)
- Auto-fallback to CPU when `hits.size() < CHRONO_SCM_GPU_MIN_HITS` (default **8192**)
- Host compiler stays **g++**; HIP device code via CMake HIP language. Set `-DCMAKE_HIP_ARCHITECTURES=gfx942` (MI300X) or `gfx90a` (MI210).

**E2E perf**

- Pass ray-cast `hits` map directly to `ComputeContactForcesGpu` (no intermediate vector copy)
- Call `scm_gpu::PrimeBuffers()` from all `SCMTerrain::Initialize` overloads
- Dense per-body force accumulation in `SCMTerrainGpu.cpp`

**Device body reduction + async bridge**

- Grid-only host scatter; body forces from device reduction (`scm_reduce_body_forces_kernel`)
- […] (`CHRONO_SCM_GPU_ASYNC`)

**OpenMP → HIP split**

See `docs/OPENMP_TO_HIP.md`, the upstream OpenMP→HIP porting guide (SCM reference implementation).

### Documentation

- `docs/OPENMP_TO_HIP.md`: porting playbook
- `docs/SCM_GPU_EXTERNAL.md`: build/configure external `scm_gpu` library

### Motivation
Large SCM batches (multi-vehicle / fine soil grids) spend significant time in the per-hit force loop. Validated on MI300X (gfx942, ROCm 7.x): ~2.5× kernel speedup at 65k hits/step; E2E contact timer improves with device body reduce + async streams.
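The opt-in and fallback rules from the summary (`CHRONO_SCM_GPU=1` plus a minimum hit count, default 8192) could be sketched roughly as follows. This is an illustrative sketch only, not the code in `SCMTerrainGpu.cpp`: the helper names `EnvSizeOr` and `UseGpuPath` are hypothetical, and reading `CHRONO_SCM_GPU_MIN_HITS` from the environment is an assumption, since the PR only states the name and its default.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper: parse a non-negative integer from the environment,
// returning `fallback` when the variable is unset, empty, or not a plain number.
static std::size_t EnvSizeOr(const char* name, std::size_t fallback) {
    const char* value = std::getenv(name);
    if (value == nullptr || *value == '\0') return fallback;
    char* end = nullptr;
    const unsigned long long parsed = std::strtoull(value, &end, 10);
    return (end != value && *end == '\0') ? static_cast<std::size_t>(parsed) : fallback;
}

// Hypothetical helper: decide whether this step's contact forces go to the HIP path.
static bool UseGpuPath(std::size_t num_hits) {
    // The GPU path is opt-in: anything other than CHRONO_SCM_GPU=1 keeps the CPU loop.
    const char* enabled = std::getenv("CHRONO_SCM_GPU");
    if (enabled == nullptr || std::string(enabled) != "1") return false;
    // Small batches fall back to the CPU (assumed env override, default 8192 hits).
    return num_hits >= EnvSizeOr("CHRONO_SCM_GPU_MIN_HITS", 8192);
}
```

With these rules a step with 65k hits would take the GPU path when `CHRONO_SCM_GPU=1` is set, while a step with a few hundred hits would stay on the CPU loop, where launch and transfer overhead would likely outweigh any kernel gain.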
### Scope / non-goals

- `ChLoadableUV`, no spatial `RegisterSoilParametersCallback`
- `scm_gpu` built separately (same pattern as optional FSI HIP toolchain)

### Files changed

- `src/chrono_vehicle/CMakeLists.txt`
- `src/chrono_vehicle/ChConfigVehicle.h.in`
- `cmake/FindScmGpu.cmake`
- `src/chrono_vehicle/terrain/SCMGpu.h`
- `src/chrono_vehicle/terrain/SCMGpuTypes.h`
- `src/chrono_vehicle/terrain/SCMTerrain.h`
- `src/chrono_vehicle/terrain/SCMTerrain.cpp`
- `src/chrono_vehicle/terrain/SCMTerrainGpu.h`
- `src/chrono_vehicle/terrain/SCMTerrainGpu.cpp`
- `doxygen/documentation/manuals/vehicle/vehicle_terrain.md`
- `doxygen/documentation/manuals/vehicle/vehicle_terrain_scm_gpu.md`
- `docs/OPENMP_TO_HIP.md`
- `docs/SCM_GPU_EXTERNAL.md`

### Test plan

- […] (`CH_ENABLE_VEHICLE_SCM_GPU=OFF`) builds with no HIP dependency
- `scm_parity_test --n-hits 65536` PASS (rtol 1e-5; per-hit + body-force parity)
- `scm_scaling_benchmark --n-hits 65536 --steps 200` kernel speedup ≥ 2×
- `CHRONO_SCM_GPU=1` wheel `--load`: contact timer improved vs CPU-only path
- `CHRONO_SCM_GPU_PROFILE=1`: steady-state pack/gpu/scatter logged; scatter_ms ≤ CPU-only E2E baseline

### Supersedes
Previously opened as stacked PRs #756 (E2E) and #757 (body reduce) — closed in favor of this single PR per review feedback.
### Review updates (@rserban)

- `CH_ENABLE_VEHICLE_SCM_GPU` lives in `src/chrono_vehicle/CMakeLists.txt` (OpenCRG pattern) with HIP + `scm_gpu_core` prerequisite checks
- `CHRONO_HAS_SCM_GPU` in `ChConfigVehicle.h`
- `SCMGpu.h` / `SCMGpuTypes.h`; link external `scm_gpu_core` via `FindScmGpu.cmake`
- `HitRecord` definition; Doxygen `vehicle_terrain_scm_gpu.md` + manual cross-link (paragraph after SCM intro, not mid-sentence)
- `SCMGpu.h` API wrapped in `extern "C"` (linkage match with `scm_gpu_core`)

### AMD validation (Radha MI300X / gfx942)

Pre-push gate on `rad-mi300x-2` (ROCm 7.2, job 42847):

- `integrate_scm_gpu_upstream.py`: vehicle CMake, `CHRONO_HAS_SCM_GPU`, in-tree headers
- `scm_parity_test --n-hits 65536` PASS (max abs err ~3.5e-10)
- `ninja Chrono_vehicle` with `CH_ENABLE_VEHICLE_SCM_GPU=ON`
- `chrono_scm_gpu_wheel_test` CPU + GPU env smoke (100 steps)

Not re-run on this reship: `scm_scaling_benchmark`, `btest_SCM_VEH` scaling, `--load` fine-grid E2E (≥8k hits/step).
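For context on the parity numbers quoted in this PR (rtol 1e-5, max abs err ~3.5e-10), a CPU-vs-GPU comparison of this kind might look like the sketch below. It is not the source of `scm_parity_test`: `ParityResult` and `CheckParity` are hypothetical names, and the small absolute tolerance `atol` is an added assumption so that entries near zero do not fail on rounding noise alone.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical summary of a CPU-vs-GPU comparison.
struct ParityResult {
    bool pass;           // true when every entry is within tolerance
    double max_abs_err;  // largest |cpu - gpu| seen across all entries
};

// Compare two result arrays entry by entry, using the CPU values as the reference.
// An entry passes when |cpu - gpu| <= atol + rtol * |cpu|.
ParityResult CheckParity(const std::vector<double>& cpu, const std::vector<double>& gpu,
                         double rtol = 1e-5, double atol = 1e-12) {
    ParityResult result{cpu.size() == gpu.size(), 0.0};
    if (!result.pass) return result;  // a size mismatch is an immediate failure
    for (std::size_t i = 0; i < cpu.size(); ++i) {
        const double err = std::fabs(cpu[i] - gpu[i]);
        result.max_abs_err = std::max(result.max_abs_err, err);
        // Written as a negated <= so that a NaN error also counts as a failure.
        if (!(err <= atol + rtol * std::fabs(cpu[i]))) result.pass = false;
    }
    return result;
}
```

Reporting the maximum absolute error alongside the pass flag shows how much headroom a run has relative to the tolerance.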