
[GH-2809] Support distance joins for raster predicates #2980

Draft
jiayuasu wants to merge 1 commit into apache:master from jiayuasu:feature/raster-distance-join

Member

@jiayuasu jiayuasu commented May 20, 2026

Did you read the Contributor Guide?

Is this PR related to a ticket?

What changes were proposed in this PR?

Adds an RS_DWithin(left, right, distance) predicate so distance joins can use raster operands, and routes the join planner through the same isGeography = true spatial-index path that ST_DistanceSphere already uses.

Unit semantics. distance is always meters. Both sides are unconditionally projected to WGS84 before the test (no CRS-native fast path), and the per-row predicate compares the spheroidal centroid distance against the threshold — matching ST_DistanceSpheroid/ST_DWithin(useSpheroid = true). This keeps the index filter and the row-level check on a single unit and avoids the silent CRS-dependent unit changes that the convex-hull fast path would otherwise introduce.
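
The meter-based comparison described above can be illustrated with a minimal sketch. It is not the code in this PR: the class and method names are hypothetical, it operates on bare lon/lat centroids rather than rasters or geometries, and it substitutes a spherical haversine distance (mean Earth radius) for the spheroidal `Spheroid.distance` the PR uses, so results will differ slightly from the real predicate.

```java
// Sketch only: a spherical (haversine) stand-in for the spheroidal distance named above.
final class DWithinMetersSketch {
    // Mean Earth radius in meters (assumption: a sphere instead of the WGS84 spheroid).
    static final double EARTH_RADIUS_M = 6_371_008.8;

    /** Great-circle distance in meters between two lon/lat points given in degrees. */
    static double haversineMeters(double lon1, double lat1, double lon2, double lat2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double dPhi = phi2 - phi1;
        double dLambda = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dPhi / 2) * Math.sin(dPhi / 2)
                + Math.cos(phi1) * Math.cos(phi2)
                * Math.sin(dLambda / 2) * Math.sin(dLambda / 2);
        // Clamp guards against rounding pushing sqrt(a) slightly above 1.
        return 2 * EARTH_RADIUS_M * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** Hypothetical predicate: true when the two centroids are at most distanceMeters apart. */
    static boolean dWithin(double lon1, double lat1, double lon2, double lat2,
                           double distanceMeters) {
        return haversineMeters(lon1, lat1, lon2, lat2) <= distanceMeters;
    }

    public static void main(String[] args) {
        // One degree of latitude is roughly 111.2 km on this sphere,
        // so the answer flips between the two thresholds.
        System.out.println(dWithin(0, 0, 0, 1, 112_000)); // true
        System.out.println(dWithin(0, 0, 0, 1, 111_000)); // false
    }
}
```

Because the threshold is always in meters and the inputs are always lon/lat, the same call gives the same answer no matter which CRS the operands started in, which is the property the paragraph above is after.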

Planning.

  • New RS_DWithin SQL function with three overloads (raster + geom, geom + raster, raster + raster), backed by RasterPredicates.rsDWithin (always-WGS84 + Spheroid.distance).
  • JoinQueryDetector and OptimizableJoinCondition treat RS_DWithin as a distance-join predicate. Broadcast plans go through BroadcastIndexJoinExec; non-broadcast plans go through DistanceJoinExec.
  • TraitJoinQueryBase.toExpandedWGS84EnvelopeRDD (build side) and the corresponding raster branch in BroadcastIndexJoinExec.createStreamShapes (stream side) project each row to a WGS84 envelope and expand by distance meters via JoinedGeometry.geometryToExpandedEnvelope(env, distance, isGeography = true) — the same Haversine polar-radius envelope expansion ST_DistanceSphere uses. Because the predicate's unit is now meters everywhere, the R-tree filter is correctly sized regardless of input CRS (no more degenerate all-to-all behaviour for same-CRS projected raster inputs).
  • DistanceJoinExec detects isRasterPredicate from operand dataType and routes the build / stream sides through the same WGS84 + Haversine helpers.
  • The placeholder UnsupportedOperationException for distance + raster is removed. Geography + raster + distance remains guarded — the geography refiner doesn't handle raster shapes.
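
The envelope-expansion step in the list above can be sketched as follows. This is an illustration under stated assumptions, not the body of `JoinedGeometry.geometryToExpandedEnvelope`: the class, the `double[] {minLon, minLat, maxLon, maxLat}` layout, and the exact padding formula are hypothetical. The only facts taken from the description are that the envelope is in WGS84 degrees, the distance is in meters, and a polar radius is used for the conversion.

```java
// Sketch only: pads a lon/lat envelope by a distance given in meters.
final class EnvelopeExpansionSketch {
    // WGS84 semi-minor (polar) axis in meters. A smaller radius yields a larger
    // angular pad, which errs on the side of keeping candidates in the filter.
    static final double POLAR_RADIUS_M = 6_356_752.314245;

    /** Expands {minLon, minLat, maxLon, maxLat} (degrees) by a distance in meters. */
    static double[] expand(double[] env, double meters) {
        double dLat = Math.toDegrees(meters / POLAR_RADIUS_M);
        double minLat = Math.max(-90.0, env[1] - dLat);
        double maxLat = Math.min(90.0, env[3] + dLat);
        // Meridians converge toward the poles, so size the longitude pad at the
        // padded latitude farthest from the equator.
        double edgeLat = Math.max(Math.abs(minLat), Math.abs(maxLat));
        double cos = Math.cos(Math.toRadians(edgeLat));
        double dLon = cos < 1e-9 ? 360.0 : Math.toDegrees(meters / (POLAR_RADIUS_M * cos));
        // Simplification: clamp to [-180, 180] instead of wrapping across the antimeridian.
        return new double[] {
            Math.max(-180.0, env[0] - dLon), minLat,
            Math.min(180.0, env[2] + dLon), maxLat
        };
    }

    /** Closed-interval overlap test between two {minLon, minLat, maxLon, maxLat} boxes. */
    static boolean intersects(double[] a, double[] b) {
        return a[0] <= b[2] && b[0] <= a[2] && a[1] <= b[3] && b[1] <= a[3];
    }
}
```

With a degree-based envelope padded in meters this way, the index filter and the row-level check share one unit, which is why the filter no longer degenerates for projected-CRS inputs. A production version would also need to handle envelopes that cross the antimeridian.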

How was this patch tested?

  • BroadcastIndexJoinSuite: new Passed RS_DWithin test exercises stream-raster, broadcast-raster, and swapped-operand forms (with a meter-scale threshold sized to match the global test raster).
  • RasterJoinSuite: new RS_DWithin distance join describe block covers DistanceJoinExec with both partition-side configs, swapped operands, and raster-raster.
  • All 122 tests across the two suites pass locally under -Dspark=3.4 -Pscala2.12.

Did this PR include necessary documentation updates?

  • Yes, I am adding a new API. I am using the current SNAPSHOT version number v1.9.1 in the Since field.
  • Yes, I have updated the documentation:
    • New docs/api/sql/Raster-Predicates/RS_DWithin.md (intro, CRS rules, all three signatures, meter-based SQL example, join-planning note).
    • Raster-Functions.md: predicate table row for RS_DWithin, including the "meters, projected to WGS84" semantics.
    • Optimizer.md: new "Raster distance join" subsection documenting the Haversine envelope expansion and spheroidal refinement, with broadcast and non-broadcast SQL examples in meters.

@jiayuasu jiayuasu force-pushed the feature/raster-distance-join branch from 53602f9 to 1e23c81 Compare May 22, 2026 06:25
@jiayuasu jiayuasu marked this pull request as draft May 22, 2026 07:05
Add `RS_DWithin(raster|geom, raster|geom, distance)` so distance joins
can use raster operands, and route the join planner through the existing
spatial-index machinery.

- `RS_DWithin` expression in `RasterPredicates.scala`, backed by new
  `RasterPredicates.rsDWithin` overloads (raster-geom, raster-raster)
  that reuse `convertCRSIfNeeded` and JTS `isWithinDistance`.
- `JoinQueryDetector` and `OptimizableJoinCondition` recognise
  `RS_DWithin` as a distance-join predicate; the relationship label
  collapses to `RS_DWithin` for all raster + distance cases.
- `BroadcastIndexJoinExec.createStreamShapes` and the new
  `TraitJoinQueryBase.toExpandedWGS84EnvelopeRDD` handle the raster
  stream and build sides for broadcast-index joins; `SpatialIndexExec`
  and `DistanceJoinExec` route to the same helper so non-broadcast
  distance joins work too.
- Drop the placeholder `UnsupportedOperationException` guards for
  distance + raster combinations; geography + raster + distance remains
  guarded since the geography refiner does not handle raster shapes.

Tests
- `BroadcastIndexJoinSuite`: `RS_DWithin` covers stream-raster /
  broadcast-raster / swapped-operand forms.
- `RasterJoinSuite`: new `RS_DWithin distance join` describe block
  covers `DistanceJoinExec` with both partition-side configs, swapped
  operands, and raster-raster.

Docs
- New `docs/api/sql/Raster-Predicates/RS_DWithin.md` page.
- `Raster-Functions.md` predicate table row.
- `Optimizer.md` raster-distance-join subsection.
@jiayuasu jiayuasu force-pushed the feature/raster-distance-join branch from 1e23c81 to 3a95f0b Compare May 29, 2026 07:02
@jiayuasu jiayuasu requested a review from Copilot May 29, 2026 07:05
Contributor

Copilot AI left a comment

Pull request overview

This PR adds RS_DWithin support for raster distance predicates and routes raster distance joins through optimized broadcast and partitioned join paths using WGS84 envelope expansion and spheroidal refinement.
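
The "envelope expansion, then spheroidal refinement" flow summarised above is a filter-and-refine join. A minimal sketch, assuming point centroids in lon/lat, a nested loop in place of the R-tree, and a haversine distance in place of the spheroidal one; all names are hypothetical and none of this is taken from `BroadcastIndexJoinExec` or `DistanceJoinExec`:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: filter with a padded lon/lat box, then refine with a distance in meters.
final class FilterRefineJoinSketch {
    static final double MEAN_RADIUS_M = 6_371_008.8;       // refine step (sphere, assumption)
    static final double POLAR_RADIUS_M = 6_356_752.314245; // filter padding

    /** Great-circle distance in meters; p and q are {lon, lat} in degrees. */
    static double haversineMeters(double[] p, double[] q) {
        double phi1 = Math.toRadians(p[1]);
        double phi2 = Math.toRadians(q[1]);
        double sinLat = Math.sin((phi2 - phi1) / 2);
        double sinLon = Math.sin(Math.toRadians(q[0] - p[0]) / 2);
        double a = sinLat * sinLat + Math.cos(phi1) * Math.cos(phi2) * sinLon * sinLon;
        return 2 * MEAN_RADIUS_M * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** Cheap filter: could q fall inside p's box padded by the given meters? */
    static boolean passesFilter(double[] p, double[] q, double meters) {
        double dLat = Math.toDegrees(meters / POLAR_RADIUS_M);
        if (Math.abs(q[1] - p[1]) > dLat) {
            return false;
        }
        double edgeLat = Math.min(90.0, Math.abs(p[1]) + dLat);
        double cos = Math.cos(Math.toRadians(edgeLat));
        double dLon = cos < 1e-9 ? 360.0 : Math.toDegrees(meters / (POLAR_RADIUS_M * cos));
        return Math.abs(q[0] - p[0]) <= dLon; // simplification: ignores antimeridian wrap
    }

    /** Returns {leftIndex, rightIndex} pairs whose points lie within the given meters. */
    static List<int[]> join(List<double[]> left, List<double[]> right, double meters) {
        List<int[]> out = new ArrayList<>();
        for (int i = 0; i < left.size(); i++) {
            for (int j = 0; j < right.size(); j++) {
                // The exact check runs only on pairs that survive the box filter.
                if (passesFilter(left.get(i), right.get(j), meters)
                        && haversineMeters(left.get(i), right.get(j)) <= meters) {
                    out.add(new int[] {i, j});
                }
            }
        }
        return out;
    }
}
```

The filter may admit pairs that the refine step later rejects (a point in the corner of the padded box, for example), but it should not reject a pair that is truly within range; an index only changes how quickly the candidates are found.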

Changes:

  • Adds RS_DWithin raster/geometry and raster/raster predicate implementation, Spark expression, catalog registration, and docs.
  • Updates join detection/planning/execution to support raster distance joins via BroadcastIndexJoinExec and DistanceJoinExec.
  • Adds broadcast and non-broadcast join tests for raster distance predicates.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `common/src/main/java/org/apache/sedona/common/raster/RasterPredicates.java` | Adds WGS84 spheroidal rsDWithin raster predicate helpers. |
| `spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/expressions/raster/RasterPredicates.scala` | Adds Catalyst RS_DWithin expression. |
| `spark/common/src/main/scala/org/apache/sedona/sql/UDF/Catalog.scala` | Registers RS_DWithin. |
| `spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/strategy/join/JoinQueryDetector.scala` | Detects raster distance join predicates. |
| `spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/strategy/join/OptimizableJoinCondition.scala` | Marks RS_DWithin as optimizable. |
| `spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/strategy/join/TraitJoinQueryBase.scala` | Adds expanded WGS84 envelope RDD helper. |
| `spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/strategy/join/SpatialIndexExec.scala` | Builds raster distance indexes with expanded WGS84 envelopes. |
| `spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/strategy/join/BroadcastIndexJoinExec.scala` | Handles raster stream-side distance shapes for broadcast joins. |
| `spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/strategy/join/DistanceJoinExec.scala` | Routes raster distance joins through WGS84 envelope RDDs. |
| `spark/common/src/test/scala/org/apache/sedona/sql/RasterJoinSuite.scala` | Adds partitioned raster distance join tests. |
| `spark/common/src/test/scala/org/apache/sedona/sql/BroadcastIndexJoinSuite.scala` | Adds broadcast raster distance join tests. |
| `docs/api/sql/Raster-Predicates/RS_DWithin.md` | Documents new SQL predicate. |
| `docs/api/sql/Raster-Functions.md` | Adds predicate table entry. |
| `docs/api/sql/Optimizer.md` | Documents raster distance join planning. |

Comment on lines +160 to +163
* Distance predicate for rasters: `RS_DWithin(left, right, distance)`. `left` and `right` can
* each be a raster or a geometry (at least one must be a raster). Returns true when the shapes
* are within `distance` of each other, with both sides projected to a common CRS prior to the JTS
* distance check (mirroring [[RS_Intersects]]). This expression is recognised by
Comment on lines +94 to +96
public static boolean rsDWithin(GridCoverage2D raster, Geometry geometry, double distance) {
  Pair<Geometry, Geometry> geometries = toWGS84Pair(raster, geometry);
  return Spheroid.distance(geometries.getLeft(), geometries.getRight()) <= distance;

Development

Successfully merging this pull request may close these issues.

Support distance joins for raster predicates

2 participants