Inline Kleene boolean kernels#8478
Conversation
Signed-off-by: Nicholas Gates <nick@nickgates.com>
Merging this PR will improve performance by 17.24%
|
| Mode | Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|---|
| ❌ | Simulation | slice_empty_vortex |
368.3 ns | 2,657.8 ns | -86.14% |
| ❌ | Simulation | chunked_bool_canonical_into[(1000, 10)] |
20.3 µs | 35 µs | -42.04% |
| ❌ | Simulation | slice_vortex_buffer[1024] |
871.4 ns | 1,335 ns | -34.73% |
| ❌ | Simulation | slice_vortex_buffer[16384] |
871.4 ns | 1,335 ns | -34.73% |
| ❌ | Simulation | slice_vortex_buffer[2048] |
871.4 ns | 1,335 ns | -34.73% |
| ❌ | Simulation | slice_vortex_buffer[128] |
871.4 ns | 1,335 ns | -34.73% |
| ❌ | Simulation | slice_vortex_buffer[65536] |
871.4 ns | 1,335 ns | -34.73% |
| ❌ | Simulation | chunked_varbinview_canonical_into[(1000, 10)] |
162.2 µs | 198.2 µs | -18.15% |
| ❌ | Simulation | chunked_varbinview_into_canonical[(1000, 10)] |
177.7 µs | 213.9 µs | -16.96% |
| ❌ | Simulation | search_index_below_min_chunked |
1.3 ms | 1.5 ms | -13.6% |
| ❌ | Simulation | search_index_mixed_out_of_range_chunked |
1.3 ms | 1.5 ms | -13.3% |
| ❌ | Simulation | count_i32_clustered_nulls |
47 µs | 53.7 µs | -12.47% |
| ❌ | Simulation | search_index_full_range_random_chunked |
1.4 ms | 1.6 ms | -11.99% |
| ❌ | Simulation | chunked_varbinview_canonical_into[(100, 100)] |
273.2 µs | 308.1 µs | -11.33% |
| ⚡ | Simulation | or_false_constant |
69.5 µs | 15.4 µs | ×4.5 |
| ⚡ | Simulation | and_true_constant |
69 µs | 15.4 µs | ×4.5 |
| ⚡ | Simulation | or_true_constant |
69.1 µs | 16.6 µs | ×4.2 |
| ⚡ | Simulation | and_false_constant |
69.2 µs | 18 µs | ×3.8 |
| ⚡ | Simulation | or_bool_constant |
68.6 µs | 34 µs | ×2 |
| ⚡ | Simulation | or_null_constant |
91.9 µs | 51.4 µs | +78.74% |
| ... | ... | ... | ... | ... | ... |
ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.
Tip
Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.
Comparing ngates/kleene-bool-kernels (be5e9d4) with develop (69ce1ed)2
Footnotes
-
17 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩
-
No successful run was found on
develop(cb82828) during the generation of this report, so 69ce1ed was used instead as the comparison base. There might be some changes unrelated to this pull request in this report. ↩
| fn word_source_from_bit_buffer(buffer: &BitBuffer, len: usize) -> Option<WordSource<'_>> { | ||
| if !buffer.offset().is_multiple_of(8) { | ||
| return None; | ||
| } | ||
|
|
||
| let n_bytes = len.div_ceil(8); | ||
| let start = buffer.offset() / 8; | ||
| let end = start + n_bytes; | ||
| Some(WordSource::Bytes(&buffer.inner().as_slice()[start..end])) | ||
| } |
| fn read_u64_le(bytes: &[u8]) -> u64 { | ||
| debug_assert!(bytes.len() <= 8); | ||
| let mut buf = [0; 8]; | ||
| buf[..bytes.len()].copy_from_slice(bytes); | ||
| u64::from_le_bytes(buf) | ||
| } |
| } | ||
| } | ||
|
|
||
| fn mask_to_validity(mask: Mask, nullability: Nullability) -> Validity { |
There was a problem hiding this comment.
this surely exists already?
| lhs.dtype().nullability() | rhs.dtype().nullability() | ||
| } | ||
|
|
||
| fn all_validity(nullability: Nullability) -> Validity { |
|
Could the perf regression be due to alignment from this new boolean buffer allocated here? |
| use crate::scalar_fn::fns::operators::Operator; | ||
| use crate::validity::Validity; | ||
|
|
||
| /// Trait for encoding-specific Kleene boolean kernels that operate in encoded space. |
There was a problem hiding this comment.
Why only Kleene? Should we do both?
|
Is there list a? Do we have a grasp of the performance of this change (ignoring constant), which would likely a mistake on our part previously? I what would we expect here? |
Next in our list of Arrow dependencies to remove