Skip to content

feat(sea): expose maxConnections + lock complex types as Arrow default#392

Open
msrathore-db wants to merge 6 commits into
msrathore/sea-auth-u2mfrom
msrathore/sea-max-connections
Open

feat(sea): expose maxConnections + lock complex types as Arrow default#392
msrathore-db wants to merge 6 commits into
msrathore/sea-auth-u2mfrom
msrathore/sea-max-connections

Conversation

@msrathore-db
Copy link
Copy Markdown
Contributor

@msrathore-db msrathore-db commented May 25, 2026

Summary

Two features bundled into a single SEA-side PR:

  1. maxConnections (F1) — exposes the kernel's HttpConfig::pool_max_idle_per_host knob through ConnectionOptions.maxConnections on the public client options. Forwarded through buildSeaConnectionOptions into the napi openSession call. When omitted, the kernel default (100) applies. Mirrors the Python connector's max_connections kwarg on the SEA backend. NodeJS Thrift backend does NOT expose this knob — it's a SEA-only extension; no parity break.

  2. Complex types as Arrow default lock-in (F2) — adds a regression e2e test against pecotesting that selects ARRAY/MAP/STRUCT/nested ARRAY and asserts the returned Arrow schema fields are List/Map/Struct/List (not Utf8 JSON strings). Kernel ResultConfig::complex_types_as_json already defaults to false (Arrow-native), and the SEA wire request hardcodes format = "ARROW_STREAM" — so this is the existing default. The test locks the contract.

Plumbing — F1

  • IDBSQLClient.ConnectionOptions.maxConnections?: number with doc note that Thrift backend currently ignores it
  • SeaSessionDefaults (the shared base across auth-mode variants) carries the field so all three auth modes (PAT, OAuth M2M, OAuth U2M) round-trip it
  • buildSeaConnectionOptions validates positive-integer-ness at the JS boundary (rejects 0, negative, non-integer, NaN) with HiveDriverError
  • native/sea/index.d.ts adds the field in step with kernel branch msrathore/krn-max-connections

Cross-repo dependency

Requires kernel branch msrathore/krn-max-connections (PR https://github.com/databricks/databricks-sql-kernel/pull/74) for the napi-binding side.

Downstream fixes / reviewer note

Test plan

  • tests/unit/sea/auth-max-connections.test.ts — 9 cases passing (omit, valid, boundary 1, 0-reject, neg-reject, non-int-reject, NaN-reject, M2M arm, U2M arm)
  • Existing auth-pat.test.ts regression check — 8/8 pass
  • tsc: zero new TS errors from F1+F2 (16 errors total = 14 pre-existing in SeaOperationBackend.ts + 2 from missing lib/version.ts generated by prepare hook)
  • No lint errors in touched files (IDBSQLClient.ts, SeaAuth.ts, SeaBackend.ts, new tests)
  • Live warehouse e2e — tests/e2e/sea/complex-types-e2e.test.ts runs against pecotesting when env vars set; pending merge of kernel PR to test full path

@msrathore-db
Copy link
Copy Markdown
Contributor Author

Re-review requested — round-3 fixups landed; DA round-3 sign-off received.

Round-3 final tip: c21324193de6fb39fe79019860731d7265a7acf4

This PR carries F1 (maxConnections JS surface) + F2 (lock complex types as Arrow default). Three rounds of DA verification across both features; all findings closed:

F1 (maxConnections):

  • F1-M1 explicit > MAX_U32 upper-bound check in lib/sea/SeaAuth.ts — guards against ECMA-ToUint32 silent truncation
  • F1-M2 mock-binding test verifying napi openSession receives the value (tests/unit/sea/auth-max-connections.test.ts)
  • F1-L2 live e2e against pecotesting (tests/e2e/sea/max-connections-e2e.test.ts), 3/3 passing, independently reproduced by DA in ~1.6s

F2 (Arrow complex types):

  • F2-H1 root-cause SeaNativeLoader lazy-load eliminates the MODULE_NOT_FOUND-on-import defect (the architectural fix DA called "excellent")
  • F2-M1 lint clean (const { schema } = table; destructuring)
  • F2-M2 strict typeId assertions via ArrowType enum equality + nested c_arr_struct.type.children[0].type.typeId === ArrowType.Struct structural check, locks against List<Utf8> regression
  • F2-Live 1/1 passing, independently reproduced by DA at 467ms

Gates: yarn eslint <touched-files> clean; tsc --noEmit no new errors.

Co-authored-by: Isaac

…i binding

Surfaces the kernel's `HttpConfig::pool_max_idle_per_host` knob
through `ConnectionOptions.maxConnections` on the public client
options, forwards it through `buildSeaConnectionOptions` into the
napi `openSession` call. When omitted, the kernel's default (100)
applies.

Mirrors the Python connector's `max_connections` kwarg on the SEA
backend. The NodeJS Thrift backend does NOT currently expose this
knob — it remains a no-op on that path, documented inline. No
break to Thrift parity (Thrift never had it).

Plumbing:
- `IDBSQLClient.ConnectionOptions` gains `maxConnections?: number`
  with a doc note that the Thrift backend ignores it today.
- `SeaSessionDefaults` (already the shared base across auth-mode
  variants in `SeaNativeConnectionOptions`) gains the field so all
  three auth modes (PAT, OAuth M2M, OAuth U2M) round-trip it.
- `buildSeaConnectionOptions` validates positive-integer-ness at
  the JS boundary (rejects 0, negative, non-integer, NaN) so a
  misuse fails here with a clear `HiveDriverError` instead of
  inside the kernel with a cryptic napi error.
- `native/sea/index.d.ts` (the committed copy of the napi-rs
  generated d.ts) gets the `maxConnections?: number` field on
  `ConnectionOptions`, in step with the kernel branch
  `msrathore/krn-max-connections` (commit `0c5ca3f`).

Tests:
- `tests/unit/sea/auth-max-connections.test.ts` — 9 cases covering
  omission, valid values, boundary (1), and rejection of 0,
  negative, fractional, NaN; plus the OAuth M2M / U2M arms.

Cross-PR dependency: stacks on `msrathore/sea-auth-u2m` (current
NodeJS tip) and requires kernel branch `msrathore/krn-max-connections`
(napi binding side of the same change).

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
Adds an end-to-end test against pecotesting (gated by
`DATABRICKS_PECOTESTING_*` env vars; skipped when absent) that
confirms ARRAY / MAP / STRUCT and nested ARRAY<STRUCT> come back
as **native Arrow** shapes (List / Map / Struct) — not Utf8 JSON
strings.

The kernel's `ResultConfig::complex_types_as_json` defaults to
`false` (Arrow-native), and the SEA wire request hardcodes
`format = ARROW_STREAM`, so this is the existing default. The
test locks the contract: a regression that flips the default
(or that an upstream change wraps the result post-processor in
the JSON pass) would fail this assertion immediately.

Matches the NodeJS Thrift backend's `complexTypesAsArrow=true`
default — see `DBSQLSession.getArrowOptions` where
`useArrowNativeTypes=true` propagates to `complexTypesAsArrow`.
Mirrors the kernel-side e2e at
`tests/v0_execute_e2e.rs::complex_types_as_json_flag_stringifies_complex_columns`
which exercises both the Arrow-native and JSON-string paths.

Decision — no opt-in toggle exposed at the JS layer: neither
Python `use_sea` nor NodeJS Thrift exposes a `complexTypesAsJson`
knob to end users; the kernel's `ResultConfig.complex_types_as_json`
remains internal until a consumer needs it. Adding the toggle now
would invite drift from the kernel; we revisit when a consumer
asks. Matrix row 11 of section 3 stays as "implemented — native
Arrow default matches Thrift behaviour".

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
…rict typeId asserts

DA round 1 findings on F1 (max_connections) and F2 (complex types
as Arrow), addressed in a single fixup because both touch the F1
NodeJS branch.

F1 — M1 upper-bound: the JS-layer validation accepted any positive
integer but the napi binding accepts `u32`. A value > 2^32 - 1
would either silently truncate at FFI or throw a cryptic kernel
error. Add an explicit upper-bound check with a clear
`HiveDriverError` message that points at typical pool sizes.
Three new unit tests: accept MAX_U32, reject MAX_U32+1, reject
Number.MAX_SAFE_INTEGER.

F1 — M2 mock test: the existing unit suite verified the
buildSeaConnectionOptions output shape but did not exercise the
end-to-end SeaBackend.connect → openSession → napi.openSession
round-trip. Add two mock-binding tests via the existing
`makeFakeBinding` helper that:
1. Confirm maxConnections is forwarded to the napi openSession call
2. Confirm maxConnections is `undefined` (not 0) when omitted, so
   the napi binding's `Option<u32>` reads None and the kernel
   default of 100 applies.

F2 — H1 skip-gate: the e2e test imported `getSeaNative` at module
top-level, which transitively ran `SeaNativeLoader.ts`'s
module-level `require('../../native/sea/index.js')`. When the
`.node` artifact isn't built (`yarn build:native` not run), that
require throws MODULE_NOT_FOUND BEFORE mocha can invoke `before()`,
crashing test discovery for the whole suite. Fix:
1. Verify the native artifact exists in `before()` and skip with
   a clear "run yarn build:native" message if absent
2. Lazy-require the napi shim via `createRequire(import.meta.url)`
   inside `loadBinding()` so the require lives outside module-load
   path. `createRequire` (vs bare `require`) handles the ESM
   reparse path mocha 11+ takes for ts-node-emitted files
   (MODULE_TYPELESS_PACKAGE_JSON warning).

F2 — M2 strict assertions: prior regex-based assertions
(`/List/i`, `/Map|List/i`, etc.) were too permissive — they would
accept LargeList or FixedSizeList where the kernel must return
plain `List`. Switch to arrow-js `Type` enum comparisons via
`.type.typeId`. Adds:
- Strict typeId equality for c_arr (List), c_map (Map), c_struct
  (Struct), c_arr_struct (List)
- Nested-structural check that c_arr_struct's child is Struct
- Row-count sanity assert (literal SELECT → 1 row)

F2 — live e2e: ran against pecotesting (`DATABRICKS_PECOTESTING_*`)
with the freshly-built napi binary. All assertions passed in 655ms.
The kernel default `complex_types_as_json=false` is verified to
produce native Arrow shapes for ARRAY/MAP/STRUCT and nested
ARRAY<STRUCT> end-to-end.

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
mocha 11+'s MODULE_TYPELESS_PACKAGE_JSON reparse-as-ESM path
breaks `import * as fs from 'fs'` / `import * as path from 'path'`:
under ESM, the namespace import doesn't expose the CJS module's
exports as own properties, so `path.resolve(...)` was undefined at
runtime. Switch to named imports (`resolve as resolvePath` aliased
to keep the call site readable). Same fix applied to F4 and F3b
in their respective branches.

Verified live e2e passes against pecotesting after the fix.

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
DA round 2 findings on F2 / F4 / F3b H1 had a common root cause
the team-lead correctly flagged: `SeaNativeLoader.ts` does
`const native = require('../../native/sea/index.js')` at module
load, so any import from this module triggers the `.node`
artifact load. When `yarn build:native` hasn't run, that throws
MODULE_NOT_FOUND BEFORE mocha can fire `before()` skip-gates,
crashing test discovery for every e2e suite that imports
anything from this module — including transitive imports via
`DBSQLClient` → `SeaBackend` → `SeaNativeLoader`.

Round 1 dance: every e2e test got a `createRequire` +
filesystem-check workaround. Round 2 fixes the actual defect:
the loader itself memoises the require behind `getSeaNative()`,
so importing the module is free for code paths that never reach
SEA. Filesystem-check guards in the e2e tests stay as defense-
in-depth (they produce a friendlier "run yarn build:native"
diagnostic than napi-rs's raw MODULE_NOT_FOUND).

Verified the fix by moving the .node artifact aside and running
the F2 e2e: previously CRASHED at file load with
MODULE_NOT_FOUND; now SKIPS cleanly via the suite's `before()`
gate with an actionable message. Restored binary, re-ran the
test: passes against pecotesting in 480ms.

Also addresses F2 M1 lint:
- `lib/sea/SeaNativeLoader.ts:131` — added `import/extensions`
  to the eslint-disable comment on the lazy-require, since the
  `.js` extension is required (napi shim is CommonJS, not TS)
- `tests/e2e/sea/complex-types-e2e.test.ts:148` — switched to
  destructuring `const { schema } = table` per prefer-
  destructuring rule

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
Adds an end-to-end test against pecotesting (gated by
`DATABRICKS_PECOTESTING_*` env vars; skipped when absent) that
verifies `maxConnections` flows from `ConnectionOptions` through
`buildSeaConnectionOptions` through the napi `openSession` into
the kernel's `HttpConfig::pool_max_idle_per_host`, and that a
session opened with a custom value round-trips a real query.

Combined with the existing unit + mock-binding tests, this proves
end-to-end:
- Unit tests pin the JS-side value validation + napi-shape (mock
  binding records the openSession args).
- This e2e proves the value actually reaches the kernel without
  breaking the wire path against a live warehouse.

Three cases:
1. `maxConnections=1` — minimum bound the JS layer accepts.
2. `maxConnections=200` — large pool size, exercises the high end.
3. Omitted — kernel default (100) regression check.

All three pass against pecotesting (~500ms each). DA round-1 F1
"L2 live" finding addressed.

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
@msrathore-db msrathore-db force-pushed the msrathore/sea-max-connections branch from 6f80c11 to 747cbdc Compare June 1, 2026 01:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant