Skip to content

fix: widen power(decimal, float) to Float64, fix bugs#22482

Open
neilconway wants to merge 2 commits into
apache:mainfrom
neilconway:neilc/fix-power-decimal-float
Open

fix: widen power(decimal, float) to Float64, fix bugs#22482
neilconway wants to merge 2 commits into
apache:mainfrom
neilconway:neilc/fix-power-decimal-float

Conversation

@neilconway
Copy link
Copy Markdown
Contributor

@neilconway neilconway commented May 23, 2026

Which issue does this PR close?

Rationale for this change

This PR makes several improvements to the power UDF:

  1. Previously, power(decimal, float) returned decimal, with the same precision as the decimal argument. This resulted in silently truncating the results in ways that most users wouldn't expect; for example, power(2.5::decimal(2, 1), 4.0) returned 39. Instead, this PR changes power(decimal, float) to return a Float64 instead.

  2. simplify for power could sometimes have resulted in mismatches between the declared return type of the function and the simplified expression. Change this to insert casts instead.

  3. Previously, power(decimal, int-array) converted its inputs to Float64, on the argument that this improved performance. Empirically, this does not seem to be true any longer (perhaps it was true with older version of Arrow). Perhaps more importantly, converting exact numeric types to floating point is undesirable because it loses precision. The behavior here was also inconsistent with the behavior for power(decimal, int-scalar).

Benchmarks (Arm64):

case                                main (ns)  branch (ns)          Δ
----                                ---------  -----------        ---
array n=1024 exp=2                       8546         4753     -44.3%
array n=1024 exp=4                       8516         5839     -31.4%
array n=1024 exp=8                       8458         6298     -25.5%
array n=8192 exp=2                      65197        37159     -43.0%
array n=8192 exp=4                      65136        44680     -31.4%
array n=8192 exp=8                      65110        49479     -24.0%
scalar n=1024 exp=2                      5281         5025      -4.8%
scalar n=1024 exp=4                      6473         5972      -7.7%
scalar n=1024 exp=8                      6700         6593      -1.6%
scalar n=8192 exp=2                     40280        38481      -4.4%
scalar n=8192 exp=4                     49497        45748      -7.5%
scalar n=8192 exp=8                     51334        51450      +0.2%

What changes are included in this PR?

  • Implement fixes/improvements described above.
  • Various refactoring and code cleanup
  • Add new benchmark
  • Update SLT tests

Are these changes tested?

Yes; new tests added.

Are there any user-facing changes?

Yes; this commit changes the behavior of power with a decimal base. In most cases this is a clear improvement; one slight regression is that the decimal code path for power is more likely to needlessly overflow (#22480); I will fix that in a followup PR.

@github-actions github-actions Bot added core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) functions Changes to functions implementation labels May 23, 2026
@Jefffrey
Copy link
Copy Markdown
Contributor

cc @theirix since worked on the decimal part for this before

@Omega359
Copy link
Copy Markdown
Contributor

# power scalar null result types
query TTT
select arrow_typeof(power(null, 64)),
       arrow_typeof(power(null, null)),
       arrow_typeof(power(2, null));
----
Float64 Float64 Float64

passes in main, fails in PR.

3. query result mismatch:
[SQL] select arrow_typeof(power(null, 64)),
       arrow_typeof(power(null, null)),
       arrow_typeof(power(2, null));
[Diff] (-expected|+actual)
-   Float64 Float64 Float64
+   Null Null Float64
at /home/bruce/dev/datafusion/datafusion/sqllogictest/test_files/scalar.slt:870

@Omega359
Copy link
Copy Markdown
Contributor

Please fix the slt tests that are failing - math.slt, expr.slt, push_down_filter_regression.slt

@neilconway neilconway force-pushed the neilc/fix-power-decimal-float branch from 783cef6 to 6b3e7cc Compare May 24, 2026 14:29
@neilconway
Copy link
Copy Markdown
Contributor Author

@Omega359 thanks, fixed.

@theirix
Copy link
Copy Markdown
Contributor

theirix commented May 24, 2026

Thanks, this is a real regression. The main question is whether we should use a decimal or a float output type.
From a quick skim, analytical engines use float output types for the decimal inputs. The major exception is Postgres - it preserves decimal types, as we previously did for power. Also, with parse_float_as_decimal, we promote the second argument to decimal - would it work?

@neilconway
Copy link
Copy Markdown
Contributor Author

The main question is whether we should use a decimal or a float output type.
From a quick skim, analytical engines use float output types for the decimal inputs. The major exception is Postgres - it preserves decimal types

numeric/decimal in Postgres is arbitrary precision and can represent exceptional values like NaN, so I think it's somewhat different (no risk of overflow, etc.)

We could follow DuckDB and others by always returning a float output type, but to me it seems unfortunate to lose exact precision for the power(decimal, integer) case.

Also, with parse_float_as_decimal, we promote the second argument to decimal - would it work?

AFAIK this PR should work fine in that scenario (I'm working on this in part because I want to flip parse_float_as_decimal to true by default in the future). But lmk if I misunderstood your question.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate functions Changes to functions implementation sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

power(decimal, float) should return float

4 participants