feat[gpu]: arrow device array stream support by 0ax1 · Pull Request #8483 · vortex-data/vortex

0ax1 · 2026-06-17T21:15:32Z

No description provided.

Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>

This reverts commit 52952d2. Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>

Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>

0ax1 · 2026-06-17T21:24:57Z

@claude review in depth

github-actions · 2026-06-17T21:25:32Z

Claude finished @0ax1's task in 3m 22s —— View job

Claude Code Review — Arrow Device Array Stream support

Gather PR diff and context
Review vortex-cuda/src/arrow/mod.rs (device array stream impl)
Review FFI changes (vortex-cuda/ffi, vortex-ffi)
Review e2e test harness changes
Post review feedback

Overall this is a careful, well-documented implementation of the Arrow C Device stream interface on top of the existing single-array device export. The unsafe FFI surface is handled with discipline: panics are caught at every callback boundary, double-release is guarded, Drop releases the buffered batch, and the underlying Arc<CudaStream> is retained per-batch so outstanding arrays survive stream release. The e2e coverage (sliced, multi-buffer, dict, decimal, list/FSL, temporal) is excellent. A few things worth considering below.

Findings

1. `futures::executor::block_on` won't drive a Tokio stream (medium)

vortex-cuda/src/arrow/mod.rs:299-305

impl Iterator for FuturesArrayStreamIterator {
    fn next(&mut self) -> Option<Self::Item> {
        futures::executor::block_on(self.stream.next())
    }
}

The public DeviceArrayStreamExt for SendableArrayStream polls the stream with futures::executor::block_on. That executor does not install a Tokio reactor, so a SendableArrayStream backed by async file/object-store IO (the normal vortex-file scan case) will fail or stall when it awaits IO. The FFI partition path sidesteps this correctly by going through RUNTIME.block_on_stream(...) in vx_partition_into_array_iter (vortex-ffi/src/scan.rs:97-103) before constructing the stream.

So today the trait impl is only safe for runtime-independent (in-memory) streams, which is all the tests exercise. Since this is a public API, I'd either route it through the Vortex runtime or document the constraint loudly on the trait. As written it's an easy footgun for the first caller who hands it a file-backed stream. Fix this →

Note export_batch also calls futures::executor::block_on(array.export_device_array_with_schema(...)) — that's fine and consistent with vx_cuda_array_export_arrow_device, assuming the export future never awaits Tokio primitives (which appears to be the case).

2. Schema is committed before device validation in `export_batch` (low)

vortex-cuda/src/arrow/mod.rs:385-414

Inside the validation closure, self.schema = Some(batch_schema) is set before the device_type / device_id checks run. If the very first batch exports onto the wrong device, the array is released and an error is returned — but self.schema is now permanently set from a rejected batch, so a later get_schema would report a schema for a stream that already errored. Consider validating device type/id first and only committing the schema once the batch is fully accepted.

3. Empty-stream schema can diverge from a non-empty stream's schema (low / worth a comment)

vortex-cuda/src/arrow/mod.rs:328-344 and from_dtype (265-283)

ensure_schema/export_batch derive the reported schema from the first concrete batch via from_ffi, which preserves nested encodings (e.g. dictionary → DataType::Dictionary). The empty-stream fallback from_dtype derives only from the logical DType, where a dict-encoded column's logical type is not Dictionary. So an empty stream of dictionary data reports a plain (non-dictionary) field, while a non-empty one reports a dictionary field. There's no data in the empty case so it's likely harmless, but a one-line comment noting the intentional difference would help future readers.

4. `set_error` silently drops the message on the (now-impossible) NUL path (nit)

vortex-cuda/src/arrow/mod.rs:322-326

let message = error.to_string().replace('\0', "\\0");
self.last_error = CString::new(message).ok();

After stripping all \0, CString::new can no longer fail, so .ok() will always be Some. It's harmless, but if you want defense-in-depth, fall back to a static message (unwrap_or_else(|| CString::new("error").unwrap())) so get_last_error never returns NULL while a non-zero status is reported.

5. Thread affinity of the exported stream (question / doc)

The ArrowDeviceArrayStream carries private_data holding a CudaExecutionCtx and Box<dyn Iterator> (no Send bound). The Arrow C stream contract is single-threaded, but consumers may legitimately move the stream to one other thread before driving it. Worth confirming the CudaExecutionCtx / cudarc stream is safe to use from a thread other than the one that created it, and documenting the "drive from a single thread" expectation on export_device_array_stream_from_iter.

Things I verified and liked

Double-release / release protocol: device_stream_release nulls all callbacks and guards re-entry; EOS is correctly signaled via an ArrowArray::empty() with release == None. The test exercises both release-twice and EOS.
Leak safety: release_schema/release_device_array are called on every error path in export_batch, and Drop for DeviceArrayStreamPrivateData releases a buffered pending_array.
Dictionary code widening (arrow_device_export_dictionary_codes_dtype, U8→I16 etc.) matches the cuDF signed-index requirement and is confirmed end-to-end by the DictionaryArray<Int16Type> expectation in validate_array.
FFI ownership docs in vortex_cuda.h and the Rust headers are precise about who releases what.

Checks I did not run

This review is static only — I did not build or run tests. The CUDA paths require GPU hardware (#[cuda_test] / compute-sanitizer) that isn't available here, and cargo/clippy were not executed. I'd recommend the author run cargo clippy --all-targets --all-features on the touched crates and the CUDA e2e harness per CLAUDE.md before merge.
· branch ad/arrow-device-array-stream

codspeed-hq · 2026-06-17T21:26:11Z

Merging this PR will improve performance by 15.43%

⚠️

Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚠️

Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

Open the report in CodSpeed to investigate

⚡ 6 improved benchmarks
❌ 1 regressed benchmark
✅ 1574 untouched benchmarks
⏩ 3 skipped benchmarks¹

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

	Mode	Benchmark	`BASE`	`HEAD`	Efficiency
❌	Simulation	`chunked_varbinview_into_canonical[(1000, 10)]`	177.7 µs	213.9 µs	-16.94%
⚡	Simulation	`take_10k_random`	255.8 µs	197.8 µs	+29.27%
⚡	Simulation	`take_10k_contiguous`	276.3 µs	218.5 µs	+26.46%
⚡	Simulation	`patched_take_10k_contiguous_patches`	291 µs	232.3 µs	+25.26%
⚡	Simulation	`patched_take_10k_random`	303 µs	244.2 µs	+24.07%
⚡	Simulation	`bitwise_not_vortex_buffer_mut[128]`	215.3 ns	186.1 ns	+15.67%
⚡	Simulation	`bitwise_not_vortex_buffer_mut[1024]`	275.6 ns	246.4 ns	+11.84%

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.

_{Comparing ad/arrow-device-array-stream (3ae0d00) with develop (a4476f1)}

3 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩

Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>

0ax1 added 9 commits June 17, 2026 18:46

Implement Arrow device array stream

6a78fea

Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>

Test CUDA Arrow device array streams with cuDF harness

e61249b

Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>

Document cuDF device stream harness

6d11e48

Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>

Simplify cuDF harness README command

e53feae

Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>

Avoid consuming CUDA partition before stream setup

7330751

Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>

Improve cuDF harness runner output

d82937d

Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>

Wait for cuDF harness release asset

52952d2

Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>

Revert "Wait for cuDF harness release asset"

0722ada

This reverts commit 52952d2. Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>

Defer cuDF stream harness opt-in

4ab3b2e

Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>

0ax1 added the changelog/feature A new feature label Jun 17, 2026

0ax1 added 4 commits June 17, 2026 21:28

Run cuDF device stream harness in CI

b2b0cbd

Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>

Clarify CUDA stream ownership

a9970b3

Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>

Rename CUDA partition device stream FFI

0b5fea3

Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>

Document CUDA stream helpers

3ae0d00

Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat[gpu]: arrow device array stream support#8483

feat[gpu]: arrow device array stream support#8483
0ax1 wants to merge 13 commits into
developfrom
ad/arrow-device-array-stream

0ax1 commented Jun 17, 2026

Uh oh!

0ax1 commented Jun 17, 2026

Uh oh!

github-actions Bot commented Jun 17, 2026 •

edited

Loading

Uh oh!

codspeed-hq Bot commented Jun 17, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

0ax1 commented Jun 17, 2026

Uh oh!

0ax1 commented Jun 17, 2026

Uh oh!

github-actions Bot commented Jun 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Claude Code Review — Arrow Device Array Stream support

Findings

1. futures::executor::block_on won't drive a Tokio stream (medium)

2. Schema is committed before device validation in export_batch (low)

3. Empty-stream schema can diverge from a non-empty stream's schema (low / worth a comment)

4. set_error silently drops the message on the (now-impossible) NUL path (nit)

5. Thread affinity of the exported stream (question / doc)

Things I verified and liked

Checks I did not run

Uh oh!

codspeed-hq Bot commented Jun 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Merging this PR will improve performance by 15.43%

Performance Changes

Footnotes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

github-actions Bot commented Jun 17, 2026 •

edited

Loading

1. `futures::executor::block_on` won't drive a Tokio stream (medium)

2. Schema is committed before device validation in `export_batch` (low)

4. `set_error` silently drops the message on the (now-impossible) NUL path (nit)

codspeed-hq Bot commented Jun 17, 2026 •

edited

Loading