feat: add array_scale scalar function #22466

Open: crm26 wants to merge 1 commit into apache:main from crm26:feat/array-scale

crm26 (Contributor) commented May 22, 2026

Which issue does this PR close?

Partial of #21536: array_scale (the list+scalar arithmetic function in the vector math series).

Rationale for this change

Continues the per-function split requested by @alamb on #21536. Three sibling PRs already merged: cosine_distance (#21542), inner_product (#21861), array_normalize (#22013). array_add is in flight as #22459 by @SubhamSinghal.

Adds element-wise scalar multiplication for numeric arrays, returning a list of the same shape. Aliased as `list_scale` to match the `array_X` / `list_X` precedent in this crate.

What changes are included in this PR?

  • New scalar UDF `array_scale(array, scalar)` in `datafusion/functions-nested/src/array_scale.rs`
  • Module wire-up + registration in `datafusion/functions-nested/src/lib.rs`
  • SLT tests at `datafusion/sqllogictest/test_files/array_scale.slt`
  • Auto-generated function docs entry in `docs/source/user-guide/sql/scalar_functions.md`

Signature: the first argument is a `List`/`LargeList`/`FixedSizeList` of a numeric type; the second argument is a numeric scalar. Both are coerced to `Float64`. List inputs follow the same widening rules as the binary-op siblings.

NULL semantics:

  • NULL row in array → NULL row out
  • NULL scalar → NULL row out (whole-row, because the scalar applies uniformly)
  • NULL element at position `i` → NULL element at `i` out (per-element propagation)
  • Empty array → empty array
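
The four rules above can be sketched per row in plain Rust. This is only an illustrative model and not the code in this PR: the `Row` alias and `scale_row` helper are hypothetical names, and `Option` stands in for Arrow's null buffers.

```rust
/// Hypothetical model of one input row: `None` is a NULL row,
/// and an inner `None` is a NULL element.
type Row = Option<Vec<Option<f64>>>;

/// Scale one row by an optional scalar, following the NULL rules listed above.
fn scale_row(row: &Row, scalar: Option<f64>) -> Row {
    // A NULL row or a NULL scalar makes the whole output row NULL.
    let (values, s) = match (row, scalar) {
        (Some(values), Some(s)) => (values, s),
        _ => return None,
    };
    // NULL elements stay NULL at the same position;
    // an empty row naturally maps to an empty row.
    Some(values.iter().map(|v| v.map(|x| x * s)).collect())
}
```

For instance, scaling `Some(vec![Some(1.0), None, Some(3.0)])` by `Some(2.0)` keeps the NULL in the middle position, while scaling the same row by `None` yields a NULL row.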

Builders: uses `OffsetBufferBuilder` + `NullBufferBuilder` per the pattern adopted in the round-1 review of #22013.

Are these changes tested?

Yes. `array_scale.slt` covers:

  • Happy paths (positive, negative, zero, fractional, single-element)
  • NULL propagation at all three levels (NULL row, NULL scalar, NULL element)
  • All list type variants (`List`, `LargeList`, `FixedSizeList`)
  • Numeric inner type coercion (Float32, Int64, integer literals)
  • Multi-row queries with both constant-scalar broadcast and per-row column scalar
  • Error paths (non-numeric scalar, non-list first arg, wrong arity)
  • Empty array
  • `list_scale` alias

Are there any user-facing changes?

Yes — new SQL scalar function `array_scale(array, scalar)` and its alias `list_scale`. Documented in `docs/source/user-guide/sql/scalar_functions.md`.

github-actions bot added the documentation, sqllogictest, and functions labels on May 22, 2026
Comment thread datafusion/functions-nested/src/array_scale.rs Outdated

let mut value_builder = Float64Array::builder(values.len());
let mut new_offsets = OffsetBufferBuilder::<O>::new(list_array.len());
let mut row_nulls = NullBufferBuilder::new(list_array.len());

Reviewer (Contributor):

Don't need to calculate per row nulls, can just use NullBuffer::union

let nulls = NullBuffer::union(list_array.nulls(), scalar_array.nulls());

crm26 (Contributor, Author):

Done. Replaced the `NullBufferBuilder` with `let row_nulls = NullBuffer::union(list_array.nulls(), scalar_array.nulls())` computed once before the loop, then passed directly into `GenericListArray::try_new`. The `is_null` check inside the loop is kept since it's still load-bearing for the offset buffer construction (we need to push zero-length offsets for null rows).
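
To illustrate why the per-row null check still matters for the offsets, here is a minimal sketch in plain Rust. It is a simplified model rather than the PR's code: `build_offsets` is a hypothetical helper, and plain slices stand in for the Arrow list array and its null buffer.

```rust
/// Hypothetical helper: build list offsets from per-row lengths.
/// A null row contributes a zero-length entry, so its offset repeats
/// the previous end and no child values are emitted for it.
fn build_offsets(row_lens: &[usize], row_is_null: &[bool]) -> Vec<usize> {
    let mut offsets = Vec::with_capacity(row_lens.len() + 1);
    offsets.push(0);
    let mut end = 0usize;
    for (len, is_null) in row_lens.iter().zip(row_is_null) {
        if !*is_null {
            // Only valid rows advance the running end offset.
            end += *len;
        }
        offsets.push(end);
    }
    offsets
}
```

With row lengths `[2, 3, 1]` and the middle row null, this yields offsets `[0, 2, 2, 3]`: the null row occupies a slot in the offset buffer but spans no values.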

Reviewer (Contributor):

We can use the new row_nulls for the null check inside the loop now instead of still querying list + scalar separately

e.g.

row_nulls.is_some_and(|nb| nb.is_null(i))

crm26 (Contributor, Author):

Done. Loop check now uses `row_nulls.as_ref().is_some_and(|nb| nb.is_null(row))` instead of querying `list_array` and `scalar_array` separately. Same semantics (`NullBuffer::union` ORs the two null buffers), one source of truth.
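
The "same semantics" claim can be checked with a small stand-alone model. This sketch does not use arrow-rs: `Validity`, `union`, and `is_null` are hypothetical stand-ins, assuming a union marks a row valid only when it is valid in both inputs and that an absent buffer means "no nulls".

```rust
/// Hypothetical model of an optional null buffer: `None` means no nulls,
/// `Some(bits)` holds one flag per row where `true` means the row is valid.
type Validity = Option<Vec<bool>>;

/// Model of a null-buffer union: a row is valid only if valid in both inputs.
fn union(lhs: &Validity, rhs: &Validity) -> Validity {
    match (lhs, rhs) {
        (Some(l), Some(r)) => Some(l.iter().zip(r).map(|(a, b)| *a && *b).collect()),
        (Some(v), None) | (None, Some(v)) => Some(v.clone()),
        (None, None) => None,
    }
}

/// A row is null only when a buffer exists and marks that row invalid.
fn is_null(v: &Validity, row: usize) -> bool {
    v.as_ref().is_some_and(|bits| !bits[row])
}
```

Under this model, `is_null(&union(&a, &b), row)` equals `is_null(&a, row) || is_null(&b, row)` for every row, which is the equivalence the loop relies on.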

Comment thread datafusion/functions-nested/src/array_scale.rs Outdated
crm26 force-pushed the feat/array-scale branch 2 times, most recently from a29be46 to bf052a9, on May 23, 2026 16:05

Adds `array_scale(array, scalar)` returning a new array with each element
multiplied by a scalar. Aliased as `list_scale`. Part of the per-function
split sequence on tracking issue apache#21536, following the pattern of the
already-merged PRs in this series.

Semantics:
- NULL row in array -> NULL row out
- NULL element at position i in array -> NULL element at i out
  (per-element propagation)
- NULL scalar -> NULL row out (whole-row, because the scalar applies
  uniformly to every element; the entire operation is undefined)
- Empty array -> empty array

First argument is List/LargeList/FixedSizeList of any numeric type.
Second argument is a numeric scalar. Both coerce to Float64. List-like
inputs follow the same widening rules as the binary-op siblings:
LargeList wins, FixedSizeList coerces to List.
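
One possible reading of the widening rule stated above, written as a tiny Rust sketch. The `ListKind` enum and `widen` function are hypothetical and do not mirror DataFusion's coercion code; they only encode "LargeList wins, FixedSizeList coerces to List".

```rust
/// Hypothetical tag for the three list-like input types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ListKind {
    List,
    LargeList,
    FixedSizeList,
}

/// Pick the output list kind for a set of list-like inputs:
/// any LargeList widens the result to LargeList; otherwise the result
/// is List, which also covers FixedSizeList inputs.
fn widen(inputs: &[ListKind]) -> ListKind {
    if inputs.contains(&ListKind::LargeList) {
        ListKind::LargeList
    } else {
        ListKind::List
    }
}
```

For `array_scale` there is a single list argument, so under this reading a `FixedSizeList` input would produce a `List` output and a `LargeList` input would stay `LargeList`.
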
crm26 force-pushed the feat/array-scale branch from bf052a9 to 172c35d on May 24, 2026 15:09