From ecd625c94bf788e65999b9e0b8a2044a3eaa2bd5 Mon Sep 17 00:00:00 2001 From: Mateusz Jakub Fila Date: Mon, 22 Jun 2026 14:45:38 +0200 Subject: [PATCH] P4251: clarify CERN project description --- source/2026-07-july/d4251-ioawaitable-cuda.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/source/2026-07-july/d4251-ioawaitable-cuda.md b/source/2026-07-july/d4251-ioawaitable-cuda.md index 18aac76..d6eb5d5 100644 --- a/source/2026-07-july/d4251-ioawaitable-cuda.md +++ b/source/2026-07-july/d4251-ioawaitable-cuda.md @@ -606,7 +606,7 @@ Several independent projects have arrived at the same design: coroutine-based as **cuda-oxide (NVIDIA Labs, Rust).**[35] NVIDIA's own research lab implemented the same mechanism in Rust. Their `DeviceFuture` submits GPU work, enqueues a `cuLaunchHostFunc` callback that sets an `AtomicBool` and wakes a Tokio `Waker`, and the async runtime resumes the task on the next poll. Zero busy-wait. The three-state machine (Idle, Executing, Complete) is structurally identical to a network socket future. When NVIDIA's own research lab arrives at the same `cudaLaunchHostFunc`-to-async-runtime pattern independently, in a different language, the convergence is a data point about where the pattern fits naturally. -**CERN wp1.7-coroutine-tests.**[34] The ATLAS and LHCb experiments at CERN are evaluating C++20 coroutine patterns for task scheduling, including a Gaudi-framework-inspired coroutine hierarchy and CUDA examples. The project's [`StreamIoAwaitable`](https://github.com/cern-nextgen/wp1.7-coroutine-tests/blob/5049a37d7e74b6e2241b39dca5c81ff3aaece0e3/examples/capy_stream_await.hpp) is built directly on Capy's IoAwaitable protocol: `await_suspend(std::coroutine_handle<>, boost::capy::io_env const*)` enqueues a `cudaLaunchHostFunc` callback that, on CUDA-stream completion, posts the coroutine handle back to `env->executor` - the same `cudaLaunchHostFunc`-to-coroutine resumption described here, implemented independently against Capy's `io_env`. +**CERN wp1.7-coroutine-tests.**[34] The CERN Next Generation Triggers project is evaluating C++20 coroutine patterns for task scheduling in CPU-GPU computing systems for experimental high-energy physics. The repository contains demonstrations of selected notification mechanisms and libraries. The example[`StreamIoAwaitable`](https://github.com/cern-nextgen/wp1.7-coroutine-tests/blob/5049a37d7e74b6e2241b39dca5c81ff3aaece0e3/examples/capy_stream_await.hpp) is built directly on Capy's IoAwaitable protocol: `await_suspend(std::coroutine_handle<>, boost::capy::io_env const*)` enqueues a `cudaLaunchHostFunc` callback that, on CUDA-stream completion, posts the coroutine handle back to `env->executor` - the same `cudaLaunchHostFunc`-to-coroutine resumption described here, implemented independently against Capy's `io_env`. **Taro (University of Wisconsin-Madison).**[36] A C++20 coroutine task-graph system for CPU-GPU workloads. GPU tasks suspend the CPU thread via coroutines when waiting for GPU completion, allowing other tasks to run. Uses `cudaLaunchHostFunc` for the callback. Published at Euro-Par 2024 and presented at CppCon 2023. Reported 40-80% speedup over blocking approaches. @@ -817,7 +817,7 @@ Eric Niebler, Michał Dominiak, Lewis Baker, Lucian Radu Teodorescu, Lee H Richard Smith and Gor Nishanov for P0981R0 (HALO analysis). Chuanqi Xu for the `[[clang::coro_await_elidable]]` attribute and P2477R3 (coroutine allocation elision). Dietmar Kühl and Maikel Nadolski for P3552R3 (`std::execution::task`). Lewis Baker for cppcoro, the operator `co_await` and symmetric transfer blog posts, and P3425R1 (operation-state sizes). Michael Wong for P4029R0 (SG14 priority list). -Michael Garland and the NVIDIA stdexec team for the nvexec GPU schedulers and the Maxwell FDTD benchmark. The CERN wp1.7 team for their C++20 coroutine task-scheduling experiments and the Capy IoAwaitable integration. Dian-Lun Lin (University of Wisconsin-Madison) for Taro and its CppCon 2023 presentation. The NVIDIA Labs team for cuda-oxide. Jiqun Tu (NVIDIA) and Ellery Russell (Schrödinger) for the Desmond coroutine integration presented at GTC 2024. The TTG/PaRSEC team for demonstrating coroutine-based heterogeneous GPU dispatch at DOE Exascale scale. +Michael Garland and the NVIDIA stdexec team for the nvexec GPU schedulers and the Maxwell FDTD benchmark. The CERN Next Generation Triggers project for their C++20 coroutine task-scheduling experiments and the Capy IoAwaitable integration. Dian-Lun Lin (University of Wisconsin-Madison) for Taro and its CppCon 2023 presentation. The NVIDIA Labs team for cuda-oxide. Jiqun Tu (NVIDIA) and Ellery Russell (Schrödinger) for the Desmond coroutine integration presented at GTC 2024. The TTG/PaRSEC team for demonstrating coroutine-based heterogeneous GPU dispatch at DOE Exascale scale. This paper was generated with AI assistance (Claude, via Cursor).