fix: replace with empty search string should be a no-op#22497
Open
Amogh-2404 wants to merge 1 commit into
Open
fix: replace with empty search string should be a no-op#22497Amogh-2404 wants to merge 1 commit into
Amogh-2404 wants to merge 1 commit into
Conversation
PostgreSQL returns the input unchanged when `replace` is called with an empty `from`. DataFusion was instead inserting `to` before every character and at both ends - so `replace('abc', '', 'x')` returned `xaxbxcx`. Fix the empty-`from` branch in `apply_replace` to write the input verbatim. Update the regression tests in `string_query.slt.part` that were asserting the buggy output.
Closes apache#22253
Signed-off-by: Amogh Ramesh <ramogh2404@gmail.com>
Contributor
|
LGTM, thanks! |
Contributor
|
We have an existing PR for this issue: |
Author
Thanks @Jefffrey — I missed #22357 when I picked this up, apologies for the duplicate. Happy to close this in favour of it. |
Contributor
|
I suppose if #22357 doesn't become active again we can proceed with this PR instead |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
replacewith an empty search string should be a no-op #22253.Rationale for this change
PostgreSQL returns the input unchanged when
replaceis called with an emptyfrom. DataFusion was instead insertingtobefore every character and at both ends, soreplace('abc', '', 'x')returnedxaxbxcx. This PR brings the behaviour in line with PostgreSQL. Part of the PG-compatibility cleanup tracked in #22247.What changes are included in this PR?
datafusion/functions/src/string/replace.rs: the empty-frombranch inapply_replacenow writes the input verbatim instead of insertingto. Added aLargeUtf8unit test for the new behaviour.datafusion/sqllogictest/test_files/string/string_literal.slt: four new SLT asserts covering theUtf8,Dictionary,Utf8View, andLargeUtf8paths.datafusion/sqllogictest/test_files/string/string_query.slt.part: updated four expected rows that were asserting the old buggy output.Are these changes tested?
Yes. The unit test in
replace.rscovers theLargeUtf8path, and the four new SLT asserts instring_literal.sltcover the remaining Arrow string encodings end-to-end. The full SLT suite passes locally.Are there any user-facing changes?
Yes.
replace(str, '', x)now returnsstrunchanged instead of insertingxbetween every character. This matches PostgreSQL.