feat(vehicle): optional HIP SCM contact-force backend (MI300X / gfx942) #755

Open
amd-pratmish wants to merge 2 commits into projectchrono:main from amd-pratmish:feat/scm-vehicle-hip-gfx942

Conversation

@amd-pratmish amd-pratmish commented Jun 29, 2026

Contributor

Summary

Single PR combining the SCM terrain HIP port (foundation + E2E perf + device body reduction), per maintainer request.

Adds an opt-in HIP path for the SCM Bekker / Mohr-Coulomb / Janosi contact-force loop in SCMLoader::ComputeInternalForces(). Ray casting and contact-patch BFS stay on CPU.

Foundation

  • CMake: CH_ENABLE_VEHICLE_SCM_GPU=ON in src/chrono_vehicle/CMakeLists.txt (OpenCRG pattern) + CHRONO_SCM_GPU_LIB_DIR
  • In-tree API headers SCMGpu.h / SCMGpuTypes.h; external HIP impl in scm_gpu_core (FindScmGpu.cmake)
  • Runtime: CHRONO_SCM_GPU=1 (uniform soil, rigid ChBody contactables)
  • Auto-fallback to CPU when hits.size() < CHRONO_SCM_GPU_MIN_HITS (default 8192)
  • Host compiler stays g++; HIP device code via CMake HIP language. Set -DCMAKE_HIP_ARCHITECTURES=gfx942 (MI300X) or gfx90a (MI210).
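
The runtime gating listed above (opt-in via `CHRONO_SCM_GPU=1`, CPU fallback below the hit threshold) might look roughly like the sketch below. It assumes `CHRONO_SCM_GPU_MIN_HITS` is read from the environment in the same way as `CHRONO_SCM_GPU`; the helper name `ShouldUseGpuPath` and the parsing details are hypothetical and not taken from the PR.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: decide whether the current step should take the HIP path.
//   enable_env   - value of CHRONO_SCM_GPU (nullptr if unset)
//   min_hits_env - value of CHRONO_SCM_GPU_MIN_HITS (nullptr if unset);
//                  treating this as an environment variable is an assumption
//   n_hits       - number of ray-cast hits in this step
bool ShouldUseGpuPath(const char* enable_env, const char* min_hits_env, std::size_t n_hits) {
    // The GPU path is opt-in: in this sketch only the exact string "1" enables it.
    if (enable_env == nullptr || enable_env[0] != '1' || enable_env[1] != '\0')
        return false;

    std::size_t min_hits = 8192;  // default threshold stated in the PR description
    if (min_hits_env != nullptr) {
        char* end = nullptr;
        unsigned long long parsed = std::strtoull(min_hits_env, &end, 10);
        // Accept the override only if the whole string parsed as a number.
        if (end != min_hits_env && *end == '\0')
            min_hits = static_cast<std::size_t>(parsed);
    }

    // Batches below the threshold fall back to the CPU loop.
    return n_hits >= min_hits;
}
```

A caller would pass `std::getenv("CHRONO_SCM_GPU")`, `std::getenv("CHRONO_SCM_GPU_MIN_HITS")`, and `hits.size()`. Keeping the decision in a pure function makes the threshold behaviour easy to unit-test without a GPU.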

E2E perf

  • Pass ray-cast hits map directly to ComputeContactForcesGpu (no intermediate vector copy)
  • Call scm_gpu::PrimeBuffers() from all SCMTerrain::Initialize overloads
  • Dense per-body force accumulation in SCMTerrainGpu.cpp

Device body reduction + async bridge

  • Grid-only host scatter; body forces from device reduction (scm_reduce_body_forces_kernel)
  • Async HIP streams for pack / compute / scatter (CHRONO_SCM_GPU_ASYNC)
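
For readers unfamiliar with the reduction step, a host-side reference of what `scm_reduce_body_forces_kernel` is described as producing (one summed force per body) could be sketched as follows. The `HitForce` record and the `ReduceBodyForces` name are hypothetical; the actual record layout lives in `SCMGpuTypes.h` and is not shown in this thread, and torques about the body frame are omitted for brevity.

```cpp
#include <cstddef>
#include <vector>

struct Vec3 {
    double x, y, z;
};

// Hypothetical per-hit output record (illustrative only).
struct HitForce {
    int body_index;  // dense index of the contactable body, in [0, n_bodies)
    Vec3 force;      // contact force contributed by this hit
};

// Host reference for the reduction: sum per-hit forces into one dense slot per body.
std::vector<Vec3> ReduceBodyForces(const std::vector<HitForce>& hits, std::size_t n_bodies) {
    std::vector<Vec3> totals(n_bodies, Vec3{0.0, 0.0, 0.0});
    for (const HitForce& h : hits) {
        // Skip hits that do not map to a valid body slot.
        if (h.body_index < 0 || static_cast<std::size_t>(h.body_index) >= n_bodies)
            continue;
        Vec3& t = totals[static_cast<std::size_t>(h.body_index)];
        t.x += h.force.x;
        t.y += h.force.y;
        t.z += h.force.z;
    }
    return totals;
}
```

A reference like this is also a natural oracle for a body-force parity check, since the device reduction should agree with it up to floating-point summation order.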

OpenMP → HIP split

[CPU] Update active domains
[CPU] OpenMP ray cast → hits + patch_oob
[CPU] BFS contact patches
[CPU] Pack batch (OpenMP parallel for)
[GPU] scm_compute_forces_kernel  (Bekker + Mohr-Coulomb + Janosi per hit)
[GPU] scm_reduce_body_forces_kernel (v3)
[CPU] Scatter grid / apply ChLoadBodyForce
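
As a rough illustration of the per-hit work in the `scm_compute_forces_kernel` step, the textbook forms of the three named models can be written as a plain C++ function. This is a sketch under stated assumptions: the struct and function names are hypothetical, and the elastic/plastic sinkage bookkeeping that a full SCM implementation carries is left out.

```cpp
#include <cmath>

// Hypothetical parameter bundle for uniform soil (names are illustrative).
struct SoilParams {
    double Kphi;            // Bekker frictional modulus
    double Kc;              // Bekker cohesive modulus
    double n;               // Bekker sinkage exponent
    double cohesion;        // Mohr-Coulomb cohesion [Pa]
    double friction_angle;  // Mohr-Coulomb friction angle [rad]
    double K_janosi;        // Janosi shear deformation modulus [m]
};

struct HitStress {
    double sigma;  // normal pressure [Pa]
    double tau;    // shear stress [Pa]
};

// sinkage: z [m]; patch_width: b [m], from the contact patch; shear_disp: j [m]
HitStress ComputeHitStress(const SoilParams& s, double sinkage, double patch_width, double shear_disp) {
    HitStress out{0.0, 0.0};
    if (sinkage <= 0.0)
        return out;  // no penetration, no load

    // Bekker pressure-sinkage: sigma = (Kc / b + Kphi) * z^n
    out.sigma = (s.Kc / patch_width + s.Kphi) * std::pow(sinkage, s.n);

    // Mohr-Coulomb shear limit: tau_max = c + sigma * tan(phi)
    const double tau_max = s.cohesion + out.sigma * std::tan(s.friction_angle);

    // Janosi-Hanamoto shear build-up: tau = tau_max * (1 - exp(-j / K))
    out.tau = tau_max * (1.0 - std::exp(-shear_disp / s.K_janosi));
    return out;
}
```

Each hit depends only on its own sinkage, patch width, and shear displacement, which is why this loop maps naturally onto one GPU thread per hit.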

See docs/OPENMP_TO_HIP.md — upstream OpenMP→HIP porting guide (SCM reference implementation).

Documentation

Motivation

Large SCM batches (multi-vehicle / fine soil grids) spend significant time in the per-hit force loop. Validated on MI300X (gfx942, ROCm 7.x): ~2.5× kernel speedup at 65k hits/step; E2E contact timer improves with device body reduce + async streams.

Scope / non-goals

  • No GPU ray cast, no FEA/ChLoadableUV, no spatial RegisterSoilParametersCallback
  • No bulldozing / erosion on GPU
  • scm_gpu built separately (same pattern as optional FSI HIP toolchain)

Files changed

  • src/chrono_vehicle/CMakeLists.txt
  • src/chrono_vehicle/ChConfigVehicle.h.in
  • cmake/FindScmGpu.cmake
  • src/chrono_vehicle/terrain/SCMGpu.h
  • src/chrono_vehicle/terrain/SCMGpuTypes.h
  • src/chrono_vehicle/terrain/SCMTerrain.h
  • src/chrono_vehicle/terrain/SCMTerrain.cpp
  • src/chrono_vehicle/terrain/SCMTerrainGpu.h
  • src/chrono_vehicle/terrain/SCMTerrainGpu.cpp
  • doxygen/documentation/manuals/vehicle/vehicle_terrain.md
  • doxygen/documentation/manuals/vehicle/vehicle_terrain_scm_gpu.md
  • docs/OPENMP_TO_HIP.md
  • docs/SCM_GPU_EXTERNAL.md

Test plan

  • Default CMake (CH_ENABLE_VEHICLE_SCM_GPU=OFF) builds with no HIP dependency
  • scm_parity_test --n-hits 65536 PASS (rtol 1e-5; per-hit + body-force parity)
  • scm_scaling_benchmark --n-hits 65536 --steps 200 kernel speedup ≥ 2×
  • CHRONO_SCM_GPU=1 wheel --load — contact timer improved vs CPU-only path
  • CHRONO_SCM_GPU_PROFILE=1 — steady-state pack/gpu/scatter logged; scatter_ms ≤ CPU-only E2E baseline
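
The parity criterion (rtol 1e-5) is not spelled out in this thread; one common way to express such a check is sketched below. The `AllClose` helper and the small absolute-tolerance floor are assumptions, not the actual logic of `scm_parity_test`.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Returns true if every GPU value matches its CPU reference within
// atol + rtol * |cpu|. The atol floor (an assumption here) keeps
// near-zero reference values from failing on round-off alone.
bool AllClose(const std::vector<double>& cpu,
              const std::vector<double>& gpu,
              double rtol = 1e-5,
              double atol = 1e-12) {
    if (cpu.size() != gpu.size())
        return false;
    for (std::size_t i = 0; i < cpu.size(); ++i) {
        const double diff = std::fabs(gpu[i] - cpu[i]);
        // Written as !(a <= b) so that NaN results are rejected too.
        if (!(diff <= atol + rtol * std::fabs(cpu[i])))
            return false;
    }
    return true;
}
```

The same helper can be applied to the flattened per-hit forces and to the per-body totals, matching the two parity levels named in the test plan.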

Supersedes

Previously opened as stacked PRs #756 (E2E) and #757 (body reduce) — closed in favor of this single PR per review feedback.

Review updates (@rserban)

  • CH_ENABLE_VEHICLE_SCM_GPU lives in src/chrono_vehicle/CMakeLists.txt (OpenCRG pattern) with HIP + scm_gpu_core prerequisite checks
  • CHRONO_HAS_SCM_GPU in ChConfigVehicle.h
  • In-tree SCMGpu.h / SCMGpuTypes.h; link external scm_gpu_core via FindScmGpu.cmake
  • Fixed duplicate HitRecord definition; Doxygen vehicle_terrain_scm_gpu.md + manual cross-link (paragraph after SCM intro, not mid-sentence)
  • SCMGpu.h API wrapped in extern "C" (linkage match with scm_gpu_core)
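
To illustrate the last two points, here is a minimal sketch of a C-linkage declaration combined with a compile-time guard on `CHRONO_HAS_SCM_GPU`. The function `scm_gpu_example_scale` is a hypothetical stand-in, not part of `SCMGpu.h`; it is defined inline only so the sketch links without `scm_gpu_core`.

```cpp
#include <cstddef>

// C linkage keeps the symbol name unmangled, so a library built with a
// different compiler driver (e.g. hipcc) and host code built with g++
// agree on what the symbol is called.
extern "C" {
// Hypothetical API entry point (illustrative only).
int scm_gpu_example_scale(const double* in, double* out, std::size_t n, double k);
}

// Stand-in definition; in the PR the implementation would come from scm_gpu_core.
extern "C" int scm_gpu_example_scale(const double* in, double* out, std::size_t n, double k) {
    if (in == nullptr || out == nullptr)
        return -1;  // report bad arguments via a C-style status code
    for (std::size_t i = 0; i < n; ++i)
        out[i] = k * in[i];
    return 0;
}

// Call sites can branch on the macro generated into ChConfigVehicle.h.
bool GpuBackendCompiledIn() {
#ifdef CHRONO_HAS_SCM_GPU
    return true;
#else
    return false;
#endif
}
```

With the macro undefined, `GpuBackendCompiledIn()` returns `false` and no HIP symbol is referenced, which is the property the default `CH_ENABLE_VEHICLE_SCM_GPU=OFF` build relies on.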

AMD validation (Radha MI300X / gfx942)

Pre-push gate on rad-mi300x-2 (ROCm 7.2, job 42847):

  • integrate_scm_gpu_upstream.py — vehicle CMake, CHRONO_HAS_SCM_GPU, in-tree headers
  • scm_parity_test --n-hits 65536 PASS (max abs err ~3.5e-10)
  • ninja Chrono_vehicle with CH_ENABLE_VEHICLE_SCM_GPU=ON
  • chrono_scm_gpu_wheel_test CPU + GPU env smoke (100 steps)

Not re-run on this reship: scm_scaling_benchmark, btest_SCM_VEHscaling, --load fine-grid E2E (≥8k hits/step).

@amd-pratmish

Contributor Author

@DanNegrut

@rserban rserban self-assigned this Jun 30, 2026
@amd-pratmish amd-pratmish force-pushed the feat/scm-vehicle-hip-gfx942 branch from dafbbbe to c4bca3a Compare June 30, 2026 21:14
@amd-pratmish

Contributor Author

@rserban: Based on your feedback, I merged PRs #756 and #757 into this PR (#755).

https://github.com/projectchrono/chrono/pull/757#issuecomment-4841147303

@rserban

rserban commented Jul 1, 2026

Member

@amd-pratmish - thanks for the PR. This looks like a very interesting contribution!

Having said that, a few issues and comments:

  1. The CMake option CH_ENABLE_VEHICLE_SCM_GPU should be moved into src/chrono_vehicle/CMakeLists.txt (and treated similarly to CH_ENABLE_OPENCRG).

  2. It is not enough to let the user enable "SCM_GPU" via the above CMake option. There should be additional checks to ensure that all prerequisites are available. If some are missing, a message should be displayed and this option disabled. Chrono should configure without additional user involvement in all cases (e.g., even if a GPU is not present at all).

  3. Please also include a macro CHRONO_HAS_SCM_GPU in the automatically generated ChConfigVehicle.h (modify accordingly the template src/chrono_vehicle/ChConfigVehicle.h.in and create the necessary replacement variables in the Chrono::Vehicle CMakeLists.txt) -- see how this is done for CHRONO_HAS_OPENCRG.

  4. SCMTerrain.cpp includes (line 42) "scm_gpu.h". This file is missing. Also, for consistency, please consider renaming it to something like SCMGpu.h or SCMGpuUtils.h (as appropriate -- I don't really know what that file provides).

  5. SCMTerrain.h already includes <unordered_map>. No need to do that again conditional on CHRONO_VEHICLE_SCM_GPU.

  6. When CH_ENABLE_VEHICLE_SCM_GPU=OFF, struct HitRecord is redefined! Lines 1167-1171 in SCMTerrain.cpp should be deleted.

  7. At line 916 of src/chrono_vehicle/CMakeLists.txt you add -fopenmp for clang and hipcc. What about other compilers (e.g., GCC)?

  8. This new feature must be documented somewhere in the Chrono::Vehicle "manual". Please provide a file vehicle_terrain_scm_gpu.md in doxygen/documentation/manuals/vehicle/ that describes prerequisites, configuration, etc. for this feature and reference it (with a bit of context) from the "Deformable SCM (Soil Contact Model)" section of vehicle_terrain.md. The new file vehicle_terrain_scm_gpu.md should probably reference the two documents you added under docs.

  9. Please apply clang-format to files you add or modify before committing them (a config file is available in the Chrono top-level directory).


  • For now, all I can say is that the PR code does build correctly with CH_ENABLE_VEHICLE_SCM_GPU=OFF (provided the fix for [6] above is made).
  • Because of the missing scm_gpu.h ([4] above), I was not able to check the PR with CH_ENABLE_VEHICLE_SCM_GPU=ON.

Please:

  • include the above fixes.
  • check that everything configures, builds, and runs fine in all cases and combinations (with SCM_GPU enabled and disabled, without a GPU, with an NVIDIA or an AMD GPU, with or without hipcc, etc.).

Thanks!

@amd-pratmish amd-pratmish force-pushed the feat/scm-vehicle-hip-gfx942 branch from c4bca3a to 4597f63 Compare July 1, 2026 18:32
@amd-pratmish

Contributor Author


Hi @rserban,

Thank you for your review and for the detailed comments.

Based on your feedback, the following changes were made and pushed:

  1. The option and its prerequisite checks now live in the vehicle CMake; the root CMakeLists.txt is no longer touched.
  2. Added find_package(HIP) + find_package(ScmGpu), with warnings.
  3. CHRONO_HAS_SCM_GPU substitution wired in CMake.
  4. Headers added; SCMTerrainGpu.cpp includes chrono_vehicle/terrain/SCMGpu.h.
  5. The conditional duplicate <unordered_map> include is stripped by the integrate script.
  6. Deleted the duplicate HitRecord struct (dedupe_hitrecord() + validation).
  7. Links OpenMP::OpenMP_CXX when the target exists. BTW, this PR was originally meant as an OpenMP-to-HIP translation.
  8. New page, linked from vehicle_terrain.md.
  9. clang-format applied; the integrate script also auto-formats on export.

@amd-pratmish amd-pratmish force-pushed the feat/scm-vehicle-hip-gfx942 branch from 538b991 to 90f8a9c Compare July 1, 2026 20:51
amd-chronos ship-kit and others added 2 commits July 1, 2026 21:04
Ship kit: amd-chronos contrib/upstream_ready/phase2c/combined
Insert HIP backend reference as its own paragraph after the SCM intro
(per review feedback); repair broken mid-sentence insertion.

Co-authored-by: Cursor <cursoragent@cursor.com>
@amd-pratmish amd-pratmish force-pushed the feat/scm-vehicle-hip-gfx942 branch from 90f8a9c to d8714d2 Compare July 1, 2026 21:04