Skip to content

Migrate UnKnownColumn proto hooks#22464

Open
koopatroopa787 wants to merge 6 commits into
apache:mainfrom
koopatroopa787:fix/unknown-column-proto-hooks
Open

Migrate UnKnownColumn proto hooks#22464
koopatroopa787 wants to merge 6 commits into
apache:mainfrom
koopatroopa787:fix/unknown-column-proto-hooks

Conversation

@koopatroopa787
Copy link
Copy Markdown

@koopatroopa787 koopatroopa787 commented May 22, 2026

Which issue does this PR close?

Rationale for this change

This issue is part of the migration to the new proto hooks ( ry_to_proto / ry_from_proto). UnKnownColumn was still handled in the legacy match arms, so this ports it to the hook-based path for consistency with other migrated expressions (e.g. Column, BinaryExpr).

What changes are included in this PR?

  • Add ry_to_proto / ry_from_proto implementations for UnKnownColumn in physical-expr.
  • Remove the legacy UnKnownColumn match arms in datafusion/proto physical plan conversion.

Are these changes tested?

  • cargo test -p datafusion-proto --lib

Are there any user-facing changes?

No user-facing changes.

Copilot AI review requested due to automatic review settings May 22, 2026 17:03
@github-actions github-actions Bot added physical-expr Changes to the physical-expr crates proto Related to proto crate labels May 22, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Moves protobuf (de)serialization for UnKnownColumn out of the generic physical-plan proto glue and into the expression type itself (proto feature-gated), aligning with the “each expr owns its own proto” direction.

Changes:

  • Remove UnKnownColumn special-casing from serialize_physical_expr_with_converter in to_proto.rs.
  • Route UnknownColumn decoding through UnKnownColumn::try_from_proto in from_proto.rs.
  • Implement try_to_proto / try_from_proto for UnKnownColumn behind the proto feature.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
datafusion/proto/src/physical_plan/to_proto.rs Removes inlined UnKnownColumn serialization path.
datafusion/proto/src/physical_plan/from_proto.rs Switches decoding to UnKnownColumn::try_from_proto.
datafusion/physical-expr/src/expressions/unknown_column.rs Adds proto encode/decode implementations for UnKnownColumn.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +96 to +103
Ok(Some(protobuf::PhysicalExprNode {
expr_id: None,
expr_type: Some(protobuf::physical_expr_node::ExprType::UnknownColumn(
protobuf::UnknownColumn {
name: self.name.clone(),
},
)),
}))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while the previous inlined serialization path in to_proto.rs populated expr_id from the serializer’s expr_id

I believe this variable was an option and was None since only dynamic filters set expr_id

Comment on lines +114 to +117
let protobuf::UnknownColumn { name } = match &node.expr_type {
Some(protobuf::physical_expr_node::ExprType::UnknownColumn(c)) => c,
_ => return internal_err!("PhysicalExprNode is not an UnKnownColumn"),
};
@kumarUjjawal
Copy link
Copy Markdown
Contributor

Hi @koopatroopa787 Thank you for picking this up. Can you please update the PR body with more details. Use the pr template for this.

@koopatroopa787
Copy link
Copy Markdown
Author

Hi @koopatroopa787 Thank you for picking this up. Can you please update the PR body with more details. Use the pr template for this.

Hii @kumarUjjawal, updated based on the PR template, please have a look and let me know if any changes needed.

Copy link
Copy Markdown
Contributor

@jayshrivastava jayshrivastava left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

use datafusion_common::{Result, internal_err};

#[cfg(feature = "proto")]
use datafusion_proto_models::protobuf;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you can move this use statement into the try_to_proto and try_from_proto methods

Kanishk Sachan and others added 2 commits May 24, 2026 03:05
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Kanishk Sachan <koopatroopa787>
@koopatroopa787 koopatroopa787 force-pushed the fix/unknown-column-proto-hooks branch from c6b0b8b to bfa248a Compare May 24, 2026 02:05
@koopatroopa787
Copy link
Copy Markdown
Author

koopatroopa787 commented May 24, 2026

@jayshrivastava Thanks — I updated the decode path to include the actual xpr_type and xpr_id in the error, so mismatched proto nodes are easier to diagnose. I also moved the proto import into the ry_to_proto / ry_from_proto methods as suggested, and rebased the branch onto the latest main.

@kumarUjjawal
Copy link
Copy Markdown
Contributor

@koopatroopa787 we should add a datafusion-proto roundtrip test that serializes and deserializes a partitioning or plan containing UnKnownColumn.

…ems-after-test-module lint

Closes: address clippy failure in CI

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@koopatroopa787
Copy link
Copy Markdown
Author

Moved the #[cfg(test)] mod tests to the end of unknown_column.rs so impl Hash/PartialEq come first (resolves clippy 'items-after-test-module'). Ran cargo clippy locally and the lint is fixed. Pushed the change to fix/unknown-column-proto-hooks. Next I'll add the requested datafusion-proto roundtrip test and re-run CI; let me know if you'd prefer using #[allow(clippy::items_after_test_module)] instead.

…equality in binary statistics test

Closes: address requested roundtrip test and stabilize failing test

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@koopatroopa787
Copy link
Copy Markdown
Author

Added a proto roundtrip unit test for UnKnownColumn (unknown_column_proto_roundtrip) and a small stabilization fix for a flaky statistics test (compareBernoulli p with tolerance). Ran cargo test and cargo clippy locally; all checks for datafusion-physical-expr pass. Pushed updates to fix/unknown-column-proto-hooks.

@koopatroopa787
Copy link
Copy Markdown
Author

@kumarUjjawal Addressed. I added unknown_column_proto_roundtrip to verify deserialize behavior for UnKnownColumn via the new proto hook path, and pushed it on fix/unknown-column-proto-hooks (commit 9e6fce9). I also fixed the clippy failure (items-after-test-module) by moving the test module after impls. Local cargo clippy -p datafusion-physical-expr -- -D warnings and cargo test -p datafusion-physical-expr --lib --features proto pass.

@koopatroopa787
Copy link
Copy Markdown
Author

All CI checks are passing and the PR has been approved by @jayshrivastava. Could a committer please take a look when you get a chance? Happy to make any changes needed.

@kumarUjjawal
Copy link
Copy Markdown
Contributor

I think the "roundtrip" test doesn't round-trip. It builds a proto node by hand and only calls try_from_proto. It never calls try_to_proto and never goes through the proto crate's central serialize/parse path?

Comment on lines +148 to +154
use std::sync::Arc;

#[cfg(feature = "proto")]
use arrow::datatypes::Schema;

#[cfg(feature = "proto")]
use datafusion_common::{Result, internal_err};
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need these re-imports since we are already using super?

Distribution::new_bernoulli(ScalarValue::from(15.0 / 16.0))?
);
let got = neq.evaluate_statistics(&[left_stat, right_stat])?;
let expected = Distribution::new_bernoulli(ScalarValue::from(15.0 / 16.0))?;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this test change?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

physical-expr Changes to the physical-expr crates proto Related to proto crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Port UnKnownColumn to use try_to_proto / try_from_proto

4 participants