[GH-2809] Support distance joins for raster predicates #2980
Draft
jiayuasu wants to merge 1 commit into
Updated from 53602f9 to 1e23c81.
Add `RS_DWithin(raster|geom, raster|geom, distance)` so distance joins can use raster operands, and route the join planner through the existing spatial-index machinery.

- `RS_DWithin` expression in `RasterPredicates.scala`, backed by new `RasterPredicates.rsDWithin` overloads (raster-geom, raster-raster) that reuse `convertCRSIfNeeded` and JTS `isWithinDistance`.
- `JoinQueryDetector` and `OptimizableJoinCondition` recognise `RS_DWithin` as a distance-join predicate; the relationship label collapses to `RS_DWithin` for all raster + distance cases.
- `BroadcastIndexJoinExec.createStreamShapes` and the new `TraitJoinQueryBase.toExpandedWGS84EnvelopeRDD` handle the raster stream and build sides for broadcast-index joins; `SpatialIndexExec` and `DistanceJoinExec` route to the same helper so non-broadcast distance joins work too.
- Drop the placeholder `UnsupportedOperationException` guards for distance + raster combinations; geography + raster + distance remains guarded since the geography refiner does not handle raster shapes.

Tests

- `BroadcastIndexJoinSuite`: `RS_DWithin` covers stream-raster / broadcast-raster / swapped-operand forms.
- `RasterJoinSuite`: new `RS_DWithin distance join` describe block covers `DistanceJoinExec` with both partition-side configs, swapped operands, and raster-raster.

Docs

- New `docs/api/sql/Raster-Predicates/RS_DWithin.md` page.
- `Raster-Functions.md` predicate table row.
- `Optimizer.md` raster-distance-join subsection.
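To make the `raster|geom` operand rules concrete, here is a hypothetical Java sketch of how a call could be mapped to one of the raster-geom, geom-raster, or raster-raster forms, and how a planner could flag a join condition as a raster distance join. The `Kind` and `Overload` enums and the `resolve` / `isRasterPredicate` methods are illustrative stand-ins, not Sedona classes; the PR itself derives the raster flag from each operand's `dataType`.

```java
import java.util.Optional;

/**
 * Hypothetical sketch (not Sedona code): select an RS_DWithin form from the operand kinds
 * and flag raster distance joins. At least one operand must be a raster.
 */
final class RsDWithinDispatchSketch {

    /** Illustrative operand kinds; the real planner inspects each operand's dataType. */
    enum Kind { RASTER, GEOMETRY }

    /** The operand combinations RS_DWithin accepts. */
    enum Overload { RASTER_GEOM, GEOM_RASTER, RASTER_RASTER }

    /** Returns the matching form, or empty when neither operand is a raster. */
    static Optional<Overload> resolve(Kind left, Kind right) {
        if (left == Kind.RASTER && right == Kind.RASTER) {
            return Optional.of(Overload.RASTER_RASTER);
        }
        if (left == Kind.RASTER) {
            return Optional.of(Overload.RASTER_GEOM);
        }
        if (right == Kind.RASTER) {
            return Optional.of(Overload.GEOM_RASTER);
        }
        // Geometry + geometry is outside RS_DWithin's contract.
        return Optional.empty();
    }

    /** True when either side is a raster, i.e. the condition is a raster distance join. */
    static boolean isRasterPredicate(Kind left, Kind right) {
        return left == Kind.RASTER || right == Kind.RASTER;
    }

    public static void main(String[] args) {
        // Print the outcome for every operand combination.
        for (Kind l : Kind.values()) {
            for (Kind r : Kind.values()) {
                String form = resolve(l, r).map(Overload::name).orElse("rejected");
                System.out.println(l + " + " + r + " -> " + form);
            }
        }
    }
}
```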
Updated from 1e23c81 to 3a95f0b.
Pull request overview
This PR adds RS_DWithin support for raster distance predicates and routes raster distance joins through optimized broadcast and partitioned join paths using WGS84 envelope expansion and spheroidal refinement.
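The "WGS84 envelope expansion" step can be pictured with the minimal Java sketch below. It assumes envelopes are already in WGS84 longitude/latitude degrees and models the Earth as a sphere of radius 6,357 km (roughly the WGS84 polar radius, since the PR describes a polar-radius Haversine expansion). The class and method names are hypothetical, and the exact formula behind Sedona's `JoinedGeometry.geometryToExpandedEnvelope` may differ.

```java
/**
 * Hypothetical sketch (not Sedona's implementation): widen a WGS84 lon/lat envelope so that it
 * contains every point within a given number of meters of the original box, on a sphere.
 */
final class Wgs84EnvelopeExpansionSketch {

    /** Assumed sphere radius in meters: roughly the WGS84 polar radius. */
    static final double RADIUS_M = 6_357_000.0;

    /** Returns {minLon, minLat, maxLon, maxLat} in degrees, expanded by distanceM meters. */
    static double[] expand(double minLon, double minLat, double maxLon, double maxLat, double distanceM) {
        double angular = distanceM / RADIUS_M;      // great-circle angle in radians
        double dLat = Math.toDegrees(angular);      // latitude margin is the same everywhere
        double newMinLat = minLat - dLat;
        double newMaxLat = maxLat + dLat;

        // If the widened box touches a pole, points at every longitude can be within reach.
        if (newMinLat <= -90.0 || newMaxLat >= 90.0) {
            return new double[] {-180.0, Math.max(newMinLat, -90.0), 180.0, Math.min(newMaxLat, 90.0)};
        }

        // Widest longitude offset of a spherical cap centred at the box's highest |latitude|:
        // asin(sin(angle) / cos(lat)). The pole check keeps the argument below 1;
        // Math.min only guards against rounding.
        double maxAbsLatRad = Math.toRadians(Math.max(Math.abs(minLat), Math.abs(maxLat)));
        double ratio = Math.min(1.0, Math.sin(angular) / Math.cos(maxAbsLatRad));
        double dLon = Math.toDegrees(Math.asin(ratio));
        double newMinLon = minLon - dLon;
        double newMaxLon = maxLon + dLon;

        // Crossing the antimeridian: fall back to the full longitude range (a superset).
        if (newMinLon < -180.0 || newMaxLon > 180.0) {
            newMinLon = -180.0;
            newMaxLon = 180.0;
        }
        return new double[] {newMinLon, newMinLat, newMaxLon, newMaxLat};
    }

    public static void main(String[] args) {
        // A 1 km margin around a single point at 60 degrees north.
        System.out.println(java.util.Arrays.toString(expand(10.0, 60.0, 10.0, 60.0, 1_000.0)));
    }
}
```

Converting the meter distance to an angle before widening the box ties the filter's size to meters rather than to the units of whatever CRS the input arrived in. The latitude margin is constant, while the longitude margin grows toward the poles.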
Changes:
- Adds `RS_DWithin` raster/geometry and raster/raster predicate implementation, Spark expression, catalog registration, and docs.
- Updates join detection/planning/execution to support raster distance joins via `BroadcastIndexJoinExec` and `DistanceJoinExec`.
- Adds broadcast and non-broadcast join tests for raster distance predicates.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `common/src/main/java/org/apache/sedona/common/raster/RasterPredicates.java` | Adds WGS84 spheroidal `rsDWithin` raster predicate helpers. |
| `spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/expressions/raster/RasterPredicates.scala` | Adds Catalyst `RS_DWithin` expression. |
| `spark/common/src/main/scala/org/apache/sedona/sql/UDF/Catalog.scala` | Registers `RS_DWithin`. |
| `spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/strategy/join/JoinQueryDetector.scala` | Detects raster distance join predicates. |
| `spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/strategy/join/OptimizableJoinCondition.scala` | Marks `RS_DWithin` as optimizable. |
| `spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/strategy/join/TraitJoinQueryBase.scala` | Adds expanded WGS84 envelope RDD helper. |
| `spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/strategy/join/SpatialIndexExec.scala` | Builds raster distance indexes with expanded WGS84 envelopes. |
| `spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/strategy/join/BroadcastIndexJoinExec.scala` | Handles raster stream-side distance shapes for broadcast joins. |
| `spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/strategy/join/DistanceJoinExec.scala` | Routes raster distance joins through WGS84 envelope RDDs. |
| `spark/common/src/test/scala/org/apache/sedona/sql/RasterJoinSuite.scala` | Adds partitioned raster distance join tests. |
| `spark/common/src/test/scala/org/apache/sedona/sql/BroadcastIndexJoinSuite.scala` | Adds broadcast raster distance join tests. |
| `docs/api/sql/Raster-Predicates/RS_DWithin.md` | Documents new SQL predicate. |
| `docs/api/sql/Raster-Functions.md` | Adds predicate table entry. |
| `docs/api/sql/Optimizer.md` | Documents raster distance join planning. |
Comment on lines +160 to +163

     * Distance predicate for rasters: `RS_DWithin(left, right, distance)`. `left` and `right` can
     * each be a raster or a geometry (at least one must be a raster). Returns true when the shapes
     * are within `distance` of each other, with both sides projected to a common CRS prior to the JTS
     * distance check (mirroring [[RS_Intersects]]). This expression is recognised by
Comment on lines +94 to +96

    public static boolean rsDWithin(GridCoverage2D raster, Geometry geometry, double distance) {
      Pair<Geometry, Geometry> geometries = toWGS84Pair(raster, geometry);
      return Spheroid.distance(geometries.getLeft(), geometries.getRight()) <= distance;
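The `rsDWithin` snippet in the review comment above compares `Spheroid.distance` against a meter threshold, and the PR description characterises this as a centroid-to-centroid distance. The hypothetical sketch below reproduces that shape of check, but with a spherical Haversine formula (mean Earth radius 6,371,008.8 m) standing in for Sedona's WGS84 spheroid, so its results will differ slightly from `Spheroid.distance`. Centroids are assumed to be precomputed `{lon, lat}` pairs in degrees; the class and method names are illustrative.

```java
/**
 * Hypothetical sketch (not Sedona code): meter-threshold check on the great-circle distance
 * between two centroids. A sphere replaces the WGS84 spheroid, so values are approximate.
 */
final class CentroidDWithinSketch {

    /** Mean Earth radius in meters (IUGG); an assumption of this sketch. */
    static final double MEAN_RADIUS_M = 6_371_008.8;

    /** Haversine great-circle distance in meters between two lon/lat points given in degrees. */
    static double haversineMeters(double lon1, double lat1, double lon2, double lat2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double dPhi = phi2 - phi1;
        double dLambda = Math.toRadians(lon2 - lon1);
        double sinHalfPhi = Math.sin(dPhi / 2);
        double sinHalfLambda = Math.sin(dLambda / 2);
        double h = sinHalfPhi * sinHalfPhi + Math.cos(phi1) * Math.cos(phi2) * sinHalfLambda * sinHalfLambda;
        // Clamp to 1 so rounding cannot push asin out of its domain.
        return 2 * MEAN_RADIUS_M * Math.asin(Math.min(1.0, Math.sqrt(h)));
    }

    /** True when the centroids {lon, lat} are within distanceM meters (inclusive, like `<=`). */
    static boolean dWithin(double[] centroidA, double[] centroidB, double distanceM) {
        return haversineMeters(centroidA[0], centroidA[1], centroidB[0], centroidB[1]) <= distanceM;
    }

    public static void main(String[] args) {
        // One degree of longitude along the equator, in meters.
        System.out.println(haversineMeters(0.0, 0.0, 1.0, 0.0));
    }
}
```

One consequence of centroid semantics: two large shapes whose footprints overlap can still fail the check when their centroids are farther apart than `distance`.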
Did you read the Contributor Guide?
Is this PR related to a ticket?
The PR name follows the format `[GH-XXX] my subject`. Closes #2809 (Support distance joins for raster predicates).

What changes were proposed in this PR?
Adds an `RS_DWithin(left, right, distance)` predicate so distance joins can use raster operands, and routes the join planner through the same `isGeography = true` spatial-index path that `ST_DistanceSphere` already uses.

**Unit semantics.** `distance` is always meters. Both sides are unconditionally projected to WGS84 before the test (no CRS-native fast path), and the per-row predicate compares the spheroidal centroid distance against the threshold, matching `ST_DistanceSpheroid` / `ST_DWithin(useSpheroid = true)`. This keeps the index filter and the row-level check on a single unit and avoids the silent CRS-dependent unit changes that the convex-hull fast path would otherwise introduce.

**Planning.**
- `RS_DWithin` SQL function with three overloads (`raster + geom`, `geom + raster`, `raster + raster`), backed by `RasterPredicates.rsDWithin` (always WGS84 + `Spheroid.distance`).
- `JoinQueryDetector` and `OptimizableJoinCondition` treat `RS_DWithin` as a distance-join predicate. Broadcast plans go through `BroadcastIndexJoinExec`; non-broadcast plans go through `DistanceJoinExec`.
- `TraitJoinQueryBase.toExpandedWGS84EnvelopeRDD` (build side) and the corresponding raster branch in `BroadcastIndexJoinExec.createStreamShapes` (stream side) project each row to a WGS84 envelope and expand it by `distance` meters via `JoinedGeometry.geometryToExpandedEnvelope(env, distance, isGeography = true)`, the same Haversine polar-radius envelope expansion that `ST_DistanceSphere` uses. Because the predicate's unit is now meters everywhere, the R-tree filter is correctly sized regardless of input CRS (no more degenerate all-to-all behaviour for same-CRS projected raster inputs).
- `DistanceJoinExec` detects `isRasterPredicate` from the operand `dataType` and routes the build / stream sides through the same WGS84 + Haversine helpers.
- The `UnsupportedOperationException` for distance + raster is removed. Geography + raster + distance remains guarded, because the geography refiner doesn't handle raster shapes.

How was this patch tested?
- `BroadcastIndexJoinSuite`: new `Passed RS_DWithin` test exercises stream-raster, broadcast-raster, and swapped-operand forms (with a meter-scale threshold sized to match the global test raster).
- `RasterJoinSuite`: new `RS_DWithin distance join` describe block covers `DistanceJoinExec` with both partition-side configs, swapped operands, and raster-raster.
- `-Dspark=3.4 -Pscala2.12`.

Did this PR include necessary documentation updates?
- `v1.9.1` in the `Since` field.
- `docs/api/sql/Raster-Predicates/RS_DWithin.md` (intro, CRS rules, all three signatures, meter-based SQL example, join-planning note).
- `Raster-Functions.md`: predicate table row for `RS_DWithin`, including the "meters, projected to WGS84" semantics.
- `Optimizer.md`: new "Raster distance join" subsection documenting the Haversine envelope expansion and spheroidal refinement, with broadcast and non-broadcast SQL examples in meters.