Migrate UnKnownColumn proto hooks#22464
Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Moves protobuf (de)serialization for UnKnownColumn out of the generic physical-plan proto glue and into the expression type itself (proto feature-gated), aligning with the “each expr owns its own proto” direction.
Changes:
- Remove
UnKnownColumnspecial-casing fromserialize_physical_expr_with_converterinto_proto.rs. - Route
UnknownColumndecoding throughUnKnownColumn::try_from_protoinfrom_proto.rs. - Implement
try_to_proto/try_from_protoforUnKnownColumnbehind theprotofeature.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| datafusion/proto/src/physical_plan/to_proto.rs | Removes inlined UnKnownColumn serialization path. |
| datafusion/proto/src/physical_plan/from_proto.rs | Switches decoding to UnKnownColumn::try_from_proto. |
| datafusion/physical-expr/src/expressions/unknown_column.rs | Adds proto encode/decode implementations for UnKnownColumn. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| Ok(Some(protobuf::PhysicalExprNode { | ||
| expr_id: None, | ||
| expr_type: Some(protobuf::physical_expr_node::ExprType::UnknownColumn( | ||
| protobuf::UnknownColumn { | ||
| name: self.name.clone(), | ||
| }, | ||
| )), | ||
| })) |
There was a problem hiding this comment.
while the previous inlined serialization path in to_proto.rs populated expr_id from the serializer’s expr_id
I believe this variable was an option and was None since only dynamic filters set expr_id
| let protobuf::UnknownColumn { name } = match &node.expr_type { | ||
| Some(protobuf::physical_expr_node::ExprType::UnknownColumn(c)) => c, | ||
| _ => return internal_err!("PhysicalExprNode is not an UnKnownColumn"), | ||
| }; |
|
Hi @koopatroopa787 Thank you for picking this up. Can you please update the PR body with more details. Use the pr template for this. |
Hii @kumarUjjawal, updated based on the PR template, please have a look and let me know if any changes needed. |
| use datafusion_common::{Result, internal_err}; | ||
|
|
||
| #[cfg(feature = "proto")] | ||
| use datafusion_proto_models::protobuf; |
There was a problem hiding this comment.
nit: you can move this use statement into the try_to_proto and try_from_proto methods
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Kanishk Sachan <koopatroopa787>
c6b0b8b to
bfa248a
Compare
|
@jayshrivastava Thanks — I updated the decode path to include the actual xpr_type and xpr_id in the error, so mismatched proto nodes are easier to diagnose. I also moved the proto import into the ry_to_proto / ry_from_proto methods as suggested, and rebased the branch onto the latest main. |
|
@koopatroopa787 we should add a |
…ems-after-test-module lint Closes: address clippy failure in CI Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Moved the #[cfg(test)] mod tests to the end of unknown_column.rs so impl Hash/PartialEq come first (resolves clippy 'items-after-test-module'). Ran cargo clippy locally and the lint is fixed. Pushed the change to fix/unknown-column-proto-hooks. Next I'll add the requested datafusion-proto roundtrip test and re-run CI; let me know if you'd prefer using #[allow(clippy::items_after_test_module)] instead. |
…equality in binary statistics test Closes: address requested roundtrip test and stabilize failing test Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Added a proto roundtrip unit test for UnKnownColumn (unknown_column_proto_roundtrip) and a small stabilization fix for a flaky statistics test (compareBernoulli p with tolerance). Ran cargo test and cargo clippy locally; all checks for datafusion-physical-expr pass. Pushed updates to fix/unknown-column-proto-hooks. |
|
@kumarUjjawal Addressed. I added unknown_column_proto_roundtrip to verify deserialize behavior for UnKnownColumn via the new proto hook path, and pushed it on fix/unknown-column-proto-hooks (commit 9e6fce9). I also fixed the clippy failure (items-after-test-module) by moving the test module after impls. Local cargo clippy -p datafusion-physical-expr -- -D warnings and cargo test -p datafusion-physical-expr --lib --features proto pass. |
|
All CI checks are passing and the PR has been approved by @jayshrivastava. Could a committer please take a look when you get a chance? Happy to make any changes needed. |
|
I think the "roundtrip" test doesn't round-trip. It builds a proto node by hand and only calls try_from_proto. It never calls try_to_proto and never goes through the proto crate's central serialize/parse path? |
| use std::sync::Arc; | ||
|
|
||
| #[cfg(feature = "proto")] | ||
| use arrow::datatypes::Schema; | ||
|
|
||
| #[cfg(feature = "proto")] | ||
| use datafusion_common::{Result, internal_err}; |
There was a problem hiding this comment.
do we need these re-imports since we are already using super?
| Distribution::new_bernoulli(ScalarValue::from(15.0 / 16.0))? | ||
| ); | ||
| let got = neq.evaluate_statistics(&[left_stat, right_stat])?; | ||
| let expected = Distribution::new_bernoulli(ScalarValue::from(15.0 / 16.0))?; |
There was a problem hiding this comment.
What is this test change?
Which issue does this PR close?
UnKnownColumnto usetry_to_proto/try_from_proto#22420.Rationale for this change
This issue is part of the migration to the new proto hooks ( ry_to_proto / ry_from_proto). UnKnownColumn was still handled in the legacy match arms, so this ports it to the hook-based path for consistency with other migrated expressions (e.g. Column, BinaryExpr).
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?
No user-facing changes.