perf(cd2pd): thread the structured cell->point interpolation loop (byte-exact) #149
Queued
akaszynski wants to merge 1 commit into
Summary
Threads the per-output-point interpolation loop in vtkCellDataToPointData::InterpolatePointData(input, output). A plain vtkImageData (and other structured datasets with no blanking) routes here from RequestData, and the ptId loop was still serial even though the rest of fvtk threads via the default STDThread SMP backend.
The loop is now a vtkSMPTools::For wrapped in fvtk::RunSafeFilterParallel (the established bit-exact-safe opt-in, with the usual GetSingleThread()-guarded UpdateProgress/CheckAbort and re-entrancy guard). cellIds and the weights[] buffer are thread-local (vtkSMPThreadLocalObject<vtkIdList> plus a per-thread stack buffer).
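The shape of that loop can be sketched in plain Python as a rough analogy, not the actual C++ change. Everything here is hypothetical: `cell_to_point`, `cells_of_point`, and the 1-D grid are illustration only, `ThreadPoolExecutor` stands in for `vtkSMPTools::For`, and `threading.local` stands in for `vtkSMPThreadLocalObject`.

```python
import threading
from array import array
from concurrent.futures import ThreadPoolExecutor


def cells_of_point(pt, n_cells, out):
    """Fill `out` with the ids of the 1-D cells touching point `pt` (pure, no shared state)."""
    out.clear()
    if pt > 0:
        out.append(pt - 1)
    if pt < n_cells:
        out.append(pt)


def cell_to_point(cell_values, n_threads=4):
    """Average cell values onto points of a 1-D structured grid, one chunk per task."""
    n_cells = len(cell_values)
    n_points = n_cells + 1
    result = array("d", [0.0]) * n_points  # output is sized before any worker starts
    scratch = threading.local()            # per-thread scratch, reused across points

    def run(begin, end):
        ids = getattr(scratch, "ids", None)
        if ids is None:
            ids = scratch.ids = []
        for pt in range(begin, end):
            cells_of_point(pt, n_cells, ids)
            if not ids:                    # no incident cell: store a null value
                result[pt] = 0.0
                continue
            weight = 1.0 / len(ids)
            acc = 0.0
            for cell in ids:               # fixed order within one point
                acc += weight * cell_values[cell]
            result[pt] = acc               # indexed store into an existing slot

    chunk = -(-n_points // n_threads)      # ceiling division
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        futures = [
            pool.submit(run, begin, min(begin + chunk, n_points))
            for begin in range(0, n_points, chunk)
        ]
        for future in futures:
            future.result()                # re-raise any worker exception
    return result
```

Each task only touches its own `[begin, end)` slice of `result`, and the scratch list is never shared between threads, which mirrors the two properties the description relies on.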
The crux of thread-safety: every output point-data array is pre-sized to numberOfPoints tuples up front. InterpolateAllocate() only reserves capacity (MaxId == -1); after the presize, each InterpolateTuple(ptId, …) / InsertTuple(ptId, …) / NullData(ptId) is a pure store into an already-existing tuple, with no realloc and no MaxId bump on any thread. NullData() inserts into every array in the output (not just the interpolated ones), so the pass-through arrays copied from the input point data are resized too; they already hold exactly numberOfPoints tuples, so that is a no-op.
Parity bucket: byte-exact, default-on
This is bucket 1: byte-for-byte identical to stock VTK 9.6.2 (maxULP = 0, same values AND same order), so it ships on by default.
Byte-exactness argument:
- Each iteration writes only output tuple ptId. Threads get disjoint ptId sub-ranges, so they write to disjoint, pre-sized output tuples: zero write conflict, and emission order is preserved exactly.
- The arithmetic for each point is the same regardless of how the range is partitioned across threads, so there is no floating-point reassociation across iterations.
- InterpolatePoint → InterpolateTuple iterates the same cellIds list (produced identically by the existing pure StructuredGetPointCells) in the same order.
- The input is read-only (processedCellData), a distinct object from the output; the per-thread scratch (cellIds, weights) is the only mutable state and it is thread-local.
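The reassociation point is easy to check in isolation. The sketch below is an illustration under assumptions, not code from this PR: all function names are hypothetical, and sequential chunks stand in for threads. It contrasts a reduction whose partial sums are combined across chunks (partition-dependent) with a per-point computation that lives entirely inside one iteration (partition-independent).

```python
def left_to_right_sum(values):
    """Plain left-to-right accumulation (the built-in sum() may compensate, so avoid it)."""
    total = 0.0
    for value in values:
        total += value
    return total


def chunked_sum(values, n_chunks):
    """Sum each chunk separately, then combine: the grouping depends on n_chunks."""
    size = -(-len(values) // n_chunks)  # ceiling division
    partials = [
        left_to_right_sum(values[i:i + size]) for i in range(0, len(values), size)
    ]
    return left_to_right_sum(partials)


def interpolate_point(cell_values, pt):
    """Value at interior point `pt`: fixed weights, fixed order, no cross-point state."""
    return 0.5 * cell_values[pt] + 0.5 * cell_values[pt + 1]


def partitioned_point_values(cell_values, n_chunks):
    """Evaluate every point chunk by chunk, storing by index into a pre-sized list."""
    n_points = len(cell_values) - 1
    out = [0.0] * max(n_points, 0)
    if n_points <= 0:
        return out
    size = -(-n_points // n_chunks)
    for begin in range(0, n_points, size):
        end = min(begin + size, n_points)
        for pt in range(begin, end):
            out[pt] = interpolate_point(cell_values, pt)
    return out
```

With `[1e17, 1.0, -1e17, 1.0]`, `left_to_right_sum` gives `1.0` while `chunked_sum(..., 2)` gives `0.0`, because the grouping of additions changed. `partitioned_point_values` returns the same list for any chunk count, since no addition ever spans two points.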
The structured inputs that reach this path take the pure StructuredGetPointCells traversal (no shared state). For the rare non-structured fallback, any lazy incident-cell structure is primed once on the main thread before the parallel region so the first GetPointCells() cannot race.
Expected win
2–6× on large vtkImageData cell-data → point-data conversions (capped at the fvtk default of 4 threads), scaling with point count.
Validation gate
- tests/bitexact/ops.py::op_cell2point drives vtkCellDataToPointData on a vtkImageData with cell-data scalars (the exact modified image path) and is in the modified gate group (float32/float64, sizes 20/32), covered at maxULP = 0 against stock VTK 9.6.2.
- tests/bitexact/test_smp_determinism.py: added cell2point to THREADED_OPS, asserting byte-identical output at 1 / 4 / 8 threads (which holds by construction: disjoint index writes).
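For readers unfamiliar with the metric, here is one common way a max-ULP distance and a byte-identity check can be computed for float64 data. This is a sketch under assumptions: the helpers `ordered_bits`, `max_ulp`, and `byte_identical` are hypothetical, and the project's actual `tests/bitexact` implementation may define the metric differently.

```python
import struct
from array import array

_MAGNITUDE_MASK = 0x7FFFFFFFFFFFFFFF


def ordered_bits(x):
    """Map a float64 to an integer so that adjacent floats differ by exactly 1.

    NaNs are not handled specially in this sketch.
    """
    (bits,) = struct.unpack("<q", struct.pack("<d", x))
    # Negative floats have the sign bit set; mirror them below zero.
    return bits if bits >= 0 else -(bits & _MAGNITUDE_MASK)


def max_ulp(a, b):
    """Largest ULP distance between corresponding elements of two sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return max(
        (abs(ordered_bits(x) - ordered_bits(y)) for x, y in zip(a, b)),
        default=0,
    )


def byte_identical(a, b):
    """Strict check: same float64 values in the same order, bit for bit."""
    return array("d", a).tobytes() == array("d", b).tobytes()
```

Under this particular mapping `+0.0` and `-0.0` are zero ULPs apart, so the byte comparison is the stricter of the two checks; it also fails on any reordering of otherwise equal values.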
No local build was run (disk/time constrained); relying on CI, which installs the built wheel and runs tests/bitexact at maxULP = 0.