fix(layout/dict): probe with the configured compressor instead of a hardcoded default#8406
fix(layout/dict): probe with the configured compressor instead of a hardcoded default#8406tomsanbear wants to merge 4 commits into
Conversation
…ardcoded default DictStrategy decides whether to apply a dictionary layout by probe-compressing the first chunk and checking whether the cascade chose Dict. The probe was hardcoded to BtrBlocksCompressor::default(), ignoring the compressor configured through WriteStrategyBuilder, so a caller's scheme choices did not influence the dict-fit decision. Add a probe_compressor field to DictStrategy (defaulting to BtrBlocksCompressor::default(), leaving existing callers unchanged) with a with_probe_compressor setter, and have WriteStrategyBuilder::build pass the stats_compressor. stats_compressor is used rather than data_compressor because data_compressor excludes IntDictScheme to avoid re-encoding the dict codes; the probe needs every dict scheme to detect eligibility. For the default builder stats_compressor equals BtrBlocksCompressor::default(), so the default path is unchanged. Signed-off-by: Thomas Santerre <thomas@santerre.xyz>
|
Open design question: With this change, the probe runs the caller's opaque compressor instead of the stock default. This is empirically a no-op for every current opaque caller (all pass btrblocks-based compressors that emit Dict, or non-dict-favourable vector data). The only behaviour change would be for a hypothetical opaque compressor that never emits a top-level |
Merging this PR will degrade performance by 11.51%
|
| Mode | Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|---|
| ❌ | Simulation | bitwise_not_vortex_buffer_mut[128] |
215.3 ns | 244.4 ns | -11.93% |
| ❌ | Simulation | encode_varbin[(1000, 2)] |
157.1 µs | 176.7 µs | -11.09% |
Tip
Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.
Comparing tomsanbear:fix/dict-strategy-probe-compressor (f15df5d) with develop (d737639)
| let compress_then_flat = CompressingStrategy::new(flat, Arc::clone(&stats_compressor)); | ||
|
|
||
| // 3. apply dict encoding or fallback | ||
| // 3. apply dict encoding or fallback. | ||
| // The dict-fit probe shares `stats_compressor` (the full configured cascade), not | ||
| // `data_compressor`: `data_compressor` drops `IntDictScheme` to avoid re-encoding the | ||
| // codes produced in step 5, but the probe needs every dict scheme to detect eligibility. | ||
| // The full cascade also honours caller scheme exclusions, unlike a hardcoded default. | ||
| let dict = DictStrategy::new( | ||
| coalescing.clone(), | ||
| compress_then_flat.clone(), | ||
| coalescing, | ||
| Default::default(), | ||
| ); | ||
| ) | ||
| .with_probe_compressor(stats_compressor); |
There was a problem hiding this comment.
Why use this compressor?
There was a problem hiding this comment.
The probe compresses the first chunk and only checks is dict, the result is discarded. stats_compressor seems like the right fit since it has all dictionary schemes; the data compressor excludes intdictscheme to prevent double dict-encoding the codes that DictStrategy produces downstream
There was a problem hiding this comment.
Maybe we can have a new compressor argument thar defaults to btrblocks instead of reusing the stats ine. I agree using data compressor doesn't make sense but maybe callers want a separate compressor for dict selection than the one used for stats
There was a problem hiding this comment.
I guess it's the tradeoff that callers have to explicitly choose which compressor probes dict eligibility, but that seems better than a default that silently ignores configured scheme exclusions, what do you think?
There was a problem hiding this comment.
Yea if users want to customise the probe compressor they should be able to change only that while constructing their write strategies. For the default case having the stats compressor works but there would probably be a need to change the probe compressor separately from the stats compressor
There was a problem hiding this comment.
Got it, that makes sense, I can make this change. I'll add a new with_probe_compressor and wire it through
onursatici
left a comment
There was a problem hiding this comment.
this makes sense to me
…mment Signed-off-by: Thomas Santerre <thomas@santerre.xyz>
…ilder Signed-off-by: Thomas Santerre <thomas@santerre.xyz>
Summary
Closes: #8405
DictStrategy's dict-fit probe was hardcoded toBtrBlocksCompressor::default(), ignoring the caller's configured compressor. The probe now takesstats_compressoras a parameter toDictStrategy::new, so caller scheme exclusions are honoured.stats_compressorrather thandata_compressorbecause the probe needs all dictionary schemes to detect eligibility.data_compressorexcludesIntDictScheme.Testing
New test
dict_probe_honours_configured_compressor: asserts dict layout with the default builder, asserts no dict layout withStringDictSchemeexcluded.AI use disclosure: fix authored with assistance from Claude Code.