Examples refactor #72
Merged
jiri-filipovic merged 160 commits on Jun 1, 2026
Add fluid simulation example
…measurements
- UpdateArgument, DownloadArgument, CopyArgument now treat dataSize==0 as "use full buffer size", matching the OpenCL backend convention
- Store standard deviation from ExecuteWithStableTiming in the ComputationResult, which was previously silently dropped
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…nize
- ClearData and ClearKernelData now call .wait() on futures before erasing them, preventing undefined behavior from destroying running std::async tasks
- SynchronizeQueue/SynchronizeQueues/SynchronizeDevice now wait for all pending compute and transfer actions to complete, matching the OpenCL backend's synchronization contract
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Local and Symbol memory types have no hardware equivalent on CPU. Instead of failing with a buffer lookup error, skip them during argument binding and log a warning. This is similar to CUDA, which also skips these during argument binding (handling them via separate backend-specific mechanisms).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…parately and with rest of TPs
Enable macro expansion for KTT_API/KTT_VIRTUAL_API visibility macros so Doxygen correctly parses class and enum declarations. Fix @fn tag mismatches in KernelResult.h (missing timestamp param, wrong return type) and @param name mismatch in Tuner.h (powerParams -> preciseParams). Add ParameterValueType.h to Doxygen INPUT.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Before: profiling overhead related to infrastructure was accumulated in KernelResult.m_Overhead
Now: it is accumulated in KernelResult.m_ProfilingOverhead
Also, minor fix in PythonTuner that appeared during rebase
Before: The kernel duration of the first run (i.e., the run without profiling) was put into the result of the final pass, as this one gets saved as the kernel run's result. However, this impacted the accounting of profiling runs overhead (as the kernel duration of extra passes is added to profiling runs overhead).
Now: The kernel duration of the first pass, along with the kernel overhead and compilation overhead of the first pass, is copied into the final pass results AFTER all the overheads are accounted for. This ensures that the kernel duration of the first pass is preserved in the final output, while the profiling runs overhead is correctly calculated.
Before: KernelResult::m_ExtraDuration included profiling infrastructure overhead and was accumulated over all passes
Now:
- m_ExtraDuration stores only the duration of user-specified launcher tasks such as asynchronous data movements and synchronization, excluding compilation, data movements and profiling infrastructure overhead.
- extra duration of the first pass is reported in the output and is used for calculation in GetTotalDuration()
- extra duration of the profiling passes is accumulated in m_ProfilingRunsOverhead and is thus part of total overhead
Why:
- removes double counting, as the profiling infrastructure overhead is already accounted for in m_ProfilingOverhead
- stops the extra duration accumulated over all passes from showing up in total duration, which was not correct
- stops the extra duration accumulated over all passes from showing up in total overhead as part of profiling overhead, which was not correct
Before: Precise measurements for power and time lacked tracking for the overhead they cost.
After: New category of overhead is added and accounted for.
Why: To track all overheads.
…ngine, there could be another stuff arising there...)
jiri-filipovic (Member) requested changes on May 29, 2026
Rename ReferenceVersions to LegacyExamples and hide the correctness scripts there. Otherwise it seems good. We, of course, have to double-check that everything is fine before removing the LegacyExamples, but it seems ready to merge into development.
Author
ReferenceVersions have been renamed and the scripts have been moved. I also discovered an uncaught bug in Sort, which is fixed now.
Refactors the Examples around a hierarchy of common base classes and a common CLI handling system, tied to https://is.muni.cz/auth/th/u3441/.
ExampleBase provides basic functionality with no reference.
ExampleReferenceKernel is for Examples with a reference kernel.
ExampleReferenceComputation is for Examples with a reference computation.
Each Example customizes an appropriate base class through overriding methods.
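As a rough illustration of this layering, the sketch below shows how a base class could own the shared driver while a reference-aware subclass adds validation and a concrete Example overrides only the hooks it needs. Only the class names `ExampleBase` and `ExampleReferenceComputation` come from this PR; every method name (`Run`, `SetUp`, `Compute`, `Validate`, `ComputeReference`), the `VectorAddExample` class and its `injectBug` flag are hypothetical and chosen for illustration. `ExampleReferenceKernel` would presumably follow the same pattern with a reference kernel instead of a host computation.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Shared driver with no reference check (method names are assumptions).
class ExampleBase
{
public:
    virtual ~ExampleBase() = default;

    // Returns 0 on success, 1 when validation fails.
    int Run()
    {
        SetUp();
        const std::vector<float> output = Compute();
        return Validate(output) ? 0 : 1;
    }

protected:
    virtual void SetUp() {}
    virtual std::vector<float> Compute() = 0;

    // No reference available at this level, so any output is accepted.
    virtual bool Validate(const std::vector<float>&) { return true; }
};

// Adds validation against a reference computed on the host.
class ExampleReferenceComputation : public ExampleBase
{
protected:
    virtual std::vector<float> ComputeReference() = 0;

    bool Validate(const std::vector<float>& output) override
    {
        const std::vector<float> expected = ComputeReference();

        if (output.size() != expected.size())
        {
            return false;
        }

        for (std::size_t i = 0; i < output.size(); ++i)
        {
            if (std::fabs(output[i] - expected[i]) > 1e-5f)
            {
                return false;
            }
        }

        return true;
    }
};

// Hypothetical concrete Example that overrides only the hooks it needs.
class VectorAddExample : public ExampleReferenceComputation
{
public:
    explicit VectorAddExample(bool injectBug = false) : m_InjectBug(injectBug) {}

protected:
    void SetUp() override
    {
        m_A = {1.0f, 2.0f, 3.0f};
        m_B = {10.0f, 20.0f, 30.0f};
    }

    // Stands in for the tuned kernel; optionally corrupts one element.
    std::vector<float> Compute() override
    {
        std::vector<float> result = Sum();

        if (m_InjectBug)
        {
            result[0] += 1.0f;
        }

        return result;
    }

    std::vector<float> ComputeReference() override { return Sum(); }

private:
    std::vector<float> Sum() const
    {
        std::vector<float> result(m_A.size());

        for (std::size_t i = 0; i < m_A.size(); ++i)
        {
            result[i] = m_A[i] + m_B[i];
        }

        return result;
    }

    std::vector<float> m_A;
    std::vector<float> m_B;
    bool m_InjectBug;
};
```

With this shape, `VectorAddExample().Run()` passes validation, while `VectorAddExample(true).Run()` reports a failure because the injected error no longer matches the reference.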
Customizing the CLI for individual Examples is currently complicated because the design focuses on generality; it is intended to be rewritten in the future. Support for separate compiler tuning is currently limited, as it is a new feature; proper support is intended in the future.
premake5.lua has been refactored to reduce duplication in setting up Example projects.
AtfSamples have been split into separate projects. Legacy Examples have been updated and pulled up into the Examples folder.
FluidSimulation is not refactored because it is a complex third-party project. Microbenchmarks is not refactored because it is highly unusual and very likely to change in the near future.
Certain commits from upstream have been rebased into this branch in a misguided attempt at linear history. Removing them would risk further chaos. Apologies for the mess.