Skip to content

Inline Kleene boolean kernels#8478

Open
gatesn wants to merge 2 commits into
developfrom
ngates/kleene-bool-kernels
Open

Inline Kleene boolean kernels#8478
gatesn wants to merge 2 commits into
developfrom
ngates/kleene-bool-kernels

Conversation

@gatesn

@gatesn gatesn commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Next in our list of Arrow dependencies to remove

Signed-off-by: Nicholas Gates <nick@nickgates.com>
@gatesn gatesn added the changelog/performance A performance improvement label Jun 17, 2026
@codspeed-hq

codspeed-hq Bot commented Jun 17, 2026

Copy link
Copy Markdown

Merging this PR will improve performance by 17.24%

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 25 improved benchmarks
❌ 14 regressed benchmarks
✅ 1526 untouched benchmarks
⏩ 17 skipped benchmarks1

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

Mode Benchmark BASE HEAD Efficiency
Simulation slice_empty_vortex 368.3 ns 2,657.8 ns -86.14%
Simulation chunked_bool_canonical_into[(1000, 10)] 20.3 µs 35 µs -42.04%
Simulation slice_vortex_buffer[1024] 871.4 ns 1,335 ns -34.73%
Simulation slice_vortex_buffer[16384] 871.4 ns 1,335 ns -34.73%
Simulation slice_vortex_buffer[2048] 871.4 ns 1,335 ns -34.73%
Simulation slice_vortex_buffer[128] 871.4 ns 1,335 ns -34.73%
Simulation slice_vortex_buffer[65536] 871.4 ns 1,335 ns -34.73%
Simulation chunked_varbinview_canonical_into[(1000, 10)] 162.2 µs 198.2 µs -18.15%
Simulation chunked_varbinview_into_canonical[(1000, 10)] 177.7 µs 213.9 µs -16.96%
Simulation search_index_below_min_chunked 1.3 ms 1.5 ms -13.6%
Simulation search_index_mixed_out_of_range_chunked 1.3 ms 1.5 ms -13.3%
Simulation count_i32_clustered_nulls 47 µs 53.7 µs -12.47%
Simulation search_index_full_range_random_chunked 1.4 ms 1.6 ms -11.99%
Simulation chunked_varbinview_canonical_into[(100, 100)] 273.2 µs 308.1 µs -11.33%
Simulation or_false_constant 69.5 µs 15.4 µs ×4.5
Simulation and_true_constant 69 µs 15.4 µs ×4.5
Simulation or_true_constant 69.1 µs 16.6 µs ×4.2
Simulation and_false_constant 69.2 µs 18 µs ×3.8
Simulation or_bool_constant 68.6 µs 34 µs ×2
Simulation or_null_constant 91.9 µs 51.4 µs +78.74%
... ... ... ... ... ...

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.


Comparing ngates/kleene-bool-kernels (be5e9d4) with develop (69ce1ed)2

Open in CodSpeed

Footnotes

  1. 17 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

  2. No successful run was found on develop (cb82828) during the generation of this report, so 69ce1ed was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

@gatesn gatesn marked this pull request as ready for review June 17, 2026 16:18
@gatesn gatesn requested a review from a team June 17, 2026 16:18
Comment on lines +457 to +466
fn word_source_from_bit_buffer(buffer: &BitBuffer, len: usize) -> Option<WordSource<'_>> {
if !buffer.offset().is_multiple_of(8) {
return None;
}

let n_bytes = len.div_ceil(8);
let start = buffer.offset() / 8;
let end = start + n_bytes;
Some(WordSource::Bytes(&buffer.inner().as_slice()[start..end]))
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mask can do this

Comment on lines +476 to +481
fn read_u64_le(bytes: &[u8]) -> u64 {
debug_assert!(bytes.len() <= 8);
let mut buf = [0; 8];
buf[..bytes.len()].copy_from_slice(bytes);
u64::from_le_bytes(buf)
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this exists

}
}

fn mask_to_validity(mask: Mask, nullability: Nullability) -> Validity {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this surely exists already?

lhs.dtype().nullability() | rhs.dtype().nullability()
}

fn all_validity(nullability: Nullability) -> Validity {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should also exist

@joseph-isaacs

Copy link
Copy Markdown
Contributor

Could the perf regression be due to alignment from this new boolean buffer allocated here?

use crate::scalar_fn::fns::operators::Operator;
use crate::validity::Validity;

/// Trait for encoding-specific Kleene boolean kernels that operate in encoded space.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why only Kleene? Should we do both?

@joseph-isaacs

joseph-isaacs commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Is there list a?

Do we have a grasp of the performance of this change (ignoring constant), which would likely a mistake on our part previously?

I what would we expect here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

changelog/performance A performance improvement

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants