[SEA-NodeJS] (6/8) CloudFetch + Inline Arrow result fetching#381
[SEA-NodeJS] (6/8) CloudFetch + Inline Arrow result fetching#381msrathore-db wants to merge 2 commits into
Conversation
|
Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase ( |
fa6da7f to
5d64f20
Compare
|
Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase ( |
d529954 to
da92c45
Compare
5d64f20 to
01e4b91
Compare
|
Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase ( |
da92c45 to
6be8799
Compare
01e4b91 to
cd3b1e1
Compare
… IOperationBackend (#378) * sea-abstraction: introduce IBackend / ISessionBackend / IOperationBackend Refactors DBSQLClient/Session/Operation to dispatch through three backend interfaces. ThriftBackend (lib/thrift-backend/) contains the relocated existing thrift logic. SeaBackend (lib/sea/) is a stub for M0; the sea-napi-binding feature wires the real impl. Public surface (lib/index.ts) unchanged. No new dependencies. All existing tests pass. Files: - lib/contracts/IBackend.ts (new) - lib/contracts/ISessionBackend.ts (new) - lib/contracts/IOperationBackend.ts (new) - lib/contracts/IDBSQLClient.ts (adds useSEA?: boolean to ConnectionOptions) - lib/thrift-backend/ThriftBackend.ts (new) - lib/thrift-backend/ThriftSessionBackend.ts (new) - lib/thrift-backend/ThriftOperationBackend.ts (new) - lib/sea/SeaBackend.ts (new, M0 stub) - lib/DBSQLClient.ts (dispatch through IBackend; useSEA picks SeaBackend) - lib/DBSQLSession.ts (facade over ISessionBackend; staging stays here) - lib/DBSQLOperation.ts (facade over IOperationBackend; iterators/fetchAll stay here) - tests/unit/DBSQLClient.test.ts (retarget internal state lookup through backend; pre-seed client.backend in tests that bypass connect()) - tests/unit/DBSQLOperation.test.ts (retarget internal state lookup through backend) * sea-abstraction: cleanup — restore JSDoc, dedupe test pre-seed, fix inline type Addresses code-bloat-watchdog findings from commit 0085928: - Restores public-API JSDoc on DBSQLSession + DBSQLOperation methods (was deleted as scope creep; contracts unchanged so docs still apply) - Adds makeStubbedClient() helper to tests/unit/DBSQLClient.test.ts; replaces 14× duplicated ThriftBackend pre-seed - Imports WaitUntilReadyOptions instead of inline option types in IOperationBackend + DBSQLOperation.waitUntilReady * sea-abstraction: address full-review findings (F1-F17 except F5) Round-N fixes from the 9-reviewer pre-review. Public IOperation/DBSQLOperation surface preserved byte-identical; backend interfaces (IBackend / ISessionBackend / IOperationBackend) made fully neutral so both Thrift and SEA can implement the same contract. F1 — neutral DTOs at IOperationBackend with Thrift-shape preservation on the public facade (adapter pattern): - lib/contracts/OperationStatus.ts (new) — neutral OperationStatus + OperationState enum mirroring databricks-sql-python's CommandState and kernel pyo3's StatementStatus taxonomy. - lib/contracts/ResultMetadata.ts (new) — neutral ResultMetadata + ResultFormat enum mirroring the three TSparkRowSetType cases. - IOperationBackend.status()/getResultMetadata() return the neutral DTOs. - ThriftOperationBackend.status() adapts at the boundary via adaptOperationStatus / adaptResultMetadata; module-level helpers thriftStateToOperationState and thriftRowSetTypeToResultFormat do the enum maps. - ThriftOperationBackend exposes thriftStatusResponse() and thriftResultMetadataResponse() as public Thrift-only accessors used by the facade's zero-loss fast path (kept for internal state-machine + result-handler dispatch as well). - lib/utils/thriftWireSynthesis.ts (new) — synthesizeThriftStatus and synthesizeThriftResultSetMetadata: convert neutral DTOs back to Thrift wire shape for the non-Thrift backend path. Lossy on Thrift-only fields (taskStatus, numModifiedRows, cacheLookupResult, etc.). - DBSQLOperation.status() and getMetadata() preserved Thrift return shape: Thrift backend path returns the real wire response (zero loss); non-Thrift backend path synthesizes via the new helpers. - DBSQLOperation.getResultMetadata() — new additive neutral accessor on IOperation; DBSQLSession.handleStagingOperation uses it instead of the deprecated Thrift-shaped getMetadata(). F2 — IBackend.connect() is now zero-arg. Backend reads everything it needs from IClientContext / constructor; matches Python connector's pattern of passing session_configuration via constructor not method-arg. F3 — Restore the 'Server protocol version' debug log dropped by the original PR-378 refactor. Re-added to ThriftSessionBackend.constructor with the LogLevel.debug + IClientContext.getLogger() pattern; matches the pre-refactor log site at main:lib/DBSQLSession.ts:175. F4 + F11 + F14 — SeaBackend stub safety: - close() is a no-op so DBSQLClient.close()'s state-clearing block can finish even after a useSEA: true connect() failure. - connect() and openSession() throw HiveDriverError instead of generic Error, matching the rest of the codebase. - connect(options: ConnectionOptions) and openSession(request: OpenSessionRequest) declare their parameters (with @typescript-eslint/no-unused-vars disable) so IDE autocomplete prompts the M1 SEA implementer. F6 + F7 + F9 + F10 — JSDoc on backend interfaces: - IBackend: connect/openSession/close docstrings; close() doc explicitly states transport-layer cleanup is owned by DBSQLClient. - ISessionBackend: copy IDBSQLSession's per-method one-liner JSDoc. - IOperationBackend: doc hasResultSet (readonly external; mutates internally), waitUntilReady (MUST throw OperationStateError on terminal non-success). F8 — tests/unit/sea/SeaBackend.test.ts (new) locks in the stub contract: connect() rejects HiveDriverError, openSession() rejects HiveDriverError, close() resolves no-op. ~30 LOC. F12 — Drop legacy { handle, ... } ctor branch from DBSQLOperation and DBSQLSession: - Facades accept only { backend, context }. - DBSQLSession no longer imports ThriftSessionBackend at all. - DBSQLOperation imports ThriftOperationBackend solely for the F1 typed downcast (zero-loss Thrift fast path); this is a deliberate, scoped coupling tied to the back-compat decision. - tests/unit/.stubs/createSessionForTest.ts and createOperationForTest.ts (new) wrap the legacy shape; all 48 + 54 test sites mechanically migrated. F15 — ThriftOperationBackend.waitUntilReady uses imported WaitUntilReadyOptions type instead of an anonymous inline shape. F16 — useSEA flag moved out of public ConnectionOptions: - Removed useSEA?: boolean from the exported lib/contracts/IDBSQLClient.ts ConnectionOptions; no longer ships in the public .d.ts. - lib/contracts/InternalConnectionOptions.ts (new) declares the flag as a non-exported internal extension; DBSQLClient.connect() reads via a typed cast. Mirrors Python's kwargs.get('use_sea', False) pattern at databricks-sql-python/src/databricks/sql/session.py:111. F17 — Missing return; after case 'timeout' in forwardConnectionEvent so a future fifth case doesn't silently fall through. The trailing return; in the last case triggers no-useless-return — quieted with a localized eslint-disable-next-line + intent comment. F5 — deferred per owner instruction (test-only as any cast tightening). Verification: - yarn lint clean (3 pre-existing warnings in tests/e2e/protocol_versions.test.ts). - yarn build clean. - tsc --noEmit -p tsconfig.json clean (apart from pre-existing examples/tokenFederation/* import errors that exist on main). - Runtime smoke test of SeaBackend stub + Thrift-wire synthesis round-trip passes 5/5 assertions. Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com> * sea-abstraction: address PR #378 review-comment fixes (H1 / M1-M4 / L1-L10) Addresses 15 review findings from the code-review-squad pass on PR #378. L11 (backend kind field on the three interfaces) is deliberately deferred to avoid a cross-stack cascade ripple while the downstream PRs are still in flight. H1 — fetchChunk lost mid-flight failIfClosed regression. Add optional `isClosed?: () => boolean` to IOperationBackend.fetchChunk's options bag. ThriftOperationBackend.fetchChunk probes it after the setTimeout(0) macrotask yield and returns [] when set; the facade's post-fetch failIfClosed then raises the user-visible OperationStateError. Restores the guard that the refactor split across the facade/backend boundary so a cancel/close arriving during the yield window no longer runs the data RPC to completion needlessly. M1 — neutralize WaitUntilReadyOptions callback shape. Introduce IOperationBackendWaitOptions { callback?: (status: OperationStatus) => unknown } on the backend interface. Facade keeps the public Thrift-typed OperationStatusCallback and adapts at the boundary by wrapping the user's callback with synthesizeThriftStatus. ThriftOperationBackend.waitUntilReady consumes the neutral options and passes adaptOperationStatus(response) to the callback. M2 — synthesizeOkStatus maps OperationState to TStatusCode. Add synthesizeStatusFromOperation that returns ERROR_STATUS for Failed/Cancelled/Closed (carrying errorMessage + sqlState) and SUCCESS_STATUS otherwise. Wire it into synthesizeThriftStatus so legacy Status.assert(resp.status) sees the right code on non-Thrift backends. M3 — TelemetryEvent + DriverConfiguration carry a backend tag. Add optional backend?: 'thrift' | 'sea' | 'kernel' on both interfaces so dashboards can slice latency/error rate by backend without a metrics-schema migration once non-Thrift emission goes live. M4 — test coverage for the synthesize helpers + useSEA failure path. New tests/unit/thrift-backend/wireSynthesis.test.ts covering all OperationState/ResultFormat mappings, ERROR_STATUS carries errorMessage/sqlState, hasResultSet round-trip, schema/arrowSchema/ lz4Compressed/isStagingOperation preservation, and the L3 throw on unknown ResultFormat. New test in DBSQLClient.test.ts asserts that a useSEA:true connect failure leaves this.backend === undefined and the next openSession() surfaces "not connected" rather than the SeaBackend's "not implemented" error. L1 — forwardConnectionEvent normalizes payload to Error. Replace `payload as Error` with `payload instanceof Error ? payload : new Error(String(payload))` so a backend that emits a non-Error through the cross-backend onConnectionEvent doesn't crash the logger.log call. L2 — DBSQLClient.connect publishes this.backend only on success. Construct the backend locally, await connect() in a try/catch, run a best-effort backend.close() (per IBackend.close()'s safe-on-partial-init contract) and rethrow on failure. Only assign this.backend after a clean connect so a failed connect surfaces "DBSQLClient: not connected" on the next openSession. L3 — resultFormatToThrift throws on unknown ResultFormat. Replace the silent default fallback to COLUMN_BASED_SET with a HiveDriverError. Prevents a future ResultFormat enum extension from silently routing results through JsonResultHandler and surfacing garbled rows. L4 — DBSQLOperation.getMetadata carries @deprecated. Adds the canonical TypeScript JSDoc tag so IDEs (strikethrough), tsc, ESLint plugins, and agentic codegen pick up the soft deprecation in favour of getResultMetadata. L5 — numberToInt64 re-export carries @deprecated. Re-export through a named const with a JSDoc block (rather than a bare `export { ... } from`) so the @deprecated tag attaches to the symbol consumers see in their IDE / .d.ts. L6 — DBSQLSession.runBackend helper. Collapse 11 duplicated `failIfClosed → backend.X → failIfClosed` brackets into a single private runBackend<T>(fn) so the open-flag-before-and-after contract has a name and can't be forgotten in a new delegation method. L7 — restore three why-comments deleted from DBSQLSession. Staging-detection invariant in executeStatement, AWS-vs-Azure 404 difference on staging-remove, and the Content-Length-required note on staging-upload. Verbatim from main; these document non-obvious intentional behaviour the refactor inadvertently dropped. L8 — hasResultSet becomes a method on IOperationBackend. The value is state-dependent (the Thrift impl mutates the underlying operation handle inside processOperationStatusResponse), so the property+readonly+disclaimer-JSDoc pattern was misleading. Method form makes the live-read semantics obvious to a fresh implementer. 3 facade call sites updated. L9 — wireSynthesis moves under thrift-backend. The file imports Thrift IDL types and produces Thrift-typed values; it belongs next to ThriftOperationBackend, not in the neutral lib/utils/ tree where it would creep into the dependency cone of future backend-neutral helpers. Same reasoning that placed numberToInt64 and getDirectResultsOptions under thrift-backend/. L10 — interface-level downcast policy. Add a JSDoc paragraph on IOperationBackend grandfathering the two existing `instanceof ThriftOperationBackend` downcasts in DBSQLOperation.status/getMetadata and prohibiting new ones. Future zero-loss back-compat needs should extend the interface (or add an optional method) rather than spawn a per-backend branch matrix. Gates: yarn build (exit 0), yarn lint (0 errors, 3 pre-existing warnings in tests/e2e/protocol_versions.test.ts), yarn test on touched files (163 passing, +12 net new tests from M4 work; 2 failures pre- existing on PR head unchanged: getSchema-directResults and the LZ4-cloud-fetch flag — both flagged in the team-lead playbook as known prior regressions). Cascade implications for downstream PRs (#380 #377 #379 #382 #381 #384 #383): L8 converts hasResultSet from a property to a method, M1 swaps WaitUntilReadyOptions for IOperationBackendWaitOptions on the backend interface. Both are mechanical renames at downstream backend impls when they rebase. Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com> * Fix SEA abstraction merge fallout Restore Thrift compatibility paths needed by existing schema and result-handler tests after merging main telemetry changes. Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com> * Restore Thrift result-handler compatibility hooks Keep existing e2e-only inspection hooks available through the facade while the new backend abstraction owns result handling. Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com> --------- Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
6be8799 to
611f68c
Compare
cd3b1e1 to
76f97e4
Compare
611f68c to
df98b15
Compare
Implements IOperationBackend over the napi binding's Statement. fetchChunk decodes Arrow IPC bytes → apache-arrow RecordBatch → ArrowResultConverter (Phase 1+2 reused unchanged) → JS rows. All M0 datatypes round-trip via the same converter the thrift path uses (BOOL, INT8/16/32/64, FLOAT, DOUBLE, DECIMAL, STRING, BINARY, DATE, TIMESTAMP, INTERVAL, ARRAY, MAP, STRUCT). Unit tests construct synthetic IPC batches; e2e test against pecotesting confirms byte-identical parity vs thrift. No new dependencies. ArrowResultConverter / ResultSlicer / OperationIterator all reused unchanged (DRY). Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
Rebased onto the updated sea-execution. Updates StatementStub to the merged-kernel Statement surface (statementId + status accessors, sync schema()) and casts useSEA in the relocated tests/e2e/sea/results-e2e test. Pre-existing SeaOperationBackend neutral-type conformance remains a sea-results follow-up. Co-authored-by: Isaac
76f97e4 to
42de02f
Compare
|
Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase ( |
|
🔴 P0 (Critical) — verified directly, must fix status() and getResultMetadata() return Thrift-shaped objects instead of the neutral OperationStatus/ResultMetadata DTOs the contract requires — silently corrupts the public status()/getMetadata() output. I traced and executed the data flow to confirm this is real (not theoretical):
▎ Fix: return neutral DTOs — 🟠 P1 (Important — should fix)
🟡 P2 (Minor)
|
Stack
Linear stack of 8 PRs landing the M0 + M1 Phase 1 SEA NodeJS work. Merge in order from base ↑ to tip. The tip branch (
msrathore/sea-auth-u2m, PR #383) is the single snapshot containing everything in flight — point your test or benchmark harness at it for an end-to-end check.sea-abstractionsea-napi-binding.nodeartifactsea-errors-loggingErrorCode→ JS error-class mapping (M0 minimum)sea-authuseSEA: truesea-executionexecuteStatement+openSession(sessionConfig, initialCatalog/Schema)sea-resultssea-operationsea-auth-u2m← TIPCompanion kernel stack (
databricks/databricks-sql-kernel): 8 PRs — root #26 (async-public-api) → #27 → #25 → #29 → #28 → #30 → #24 → #23 (tip).Policy: new PRs always stack on the current tip. No sibling/parallel topology. No force-pushes on existing PRs unless absolutely necessary; if a PR's content is wrong, add a fix-up commit on top of the stack tip rather than rewriting history.
This PR is position 6/8.
Summary
SeaOperationBackend+SeaResultsProvider+SeaArrowIpcwire the kernel's CloudFetch and Inline Arrow paths to JS row-shape conversion.Size note (964 LOC, marginally over cap)
CloudFetch and Inline Arrow share too much (ArrowResultConverter integration) to split cleanly. Reviewable in one pass.
Downstream fixes / reviewer note
intervalsAsStringknob and is wired in feat(sea): Thrift-parity — params, intervals, getInfo, SQL-error class, input validation + queryTags #403; reviewers do not need to request that final interval behavior in this lower result-fetching PR.IOperationBackendcontract mismatch called out on this PR is fixed on this branch and carried forward through [SEA-NodeJS] (7/8) Operation lifecycle — cancel / close / finished + INTERVAL parity + napi relocation #384/[SEA-NodeJS] (8/8) M1 Phase 1 — OAuth M2M + OAuth U2M (5 review rounds, ZERO HIGH at close) #383/feat(sea): expose maxConnections + lock complex types as Arrow default #392/feat(sea): queryTags through executeStatement + fetchAll regression lock #393/feat(sea): expose TLS verification toggle + custom CA on ConnectionOptions #407/feat(sea): wire rowLimit + statementConf + TIMESTAMP_NTZ/LTZ params #408; SEA status/metadata now return the neutral DTO shape while the result converter still uses the Thrift metadata internally.useCloudFetchplumbing has been removed from the SEA execution path and unsupporteduseCloudFetchis rejected rather than silently ignored in the downstream stack.Test plan
useCloudFetchoverride forwarded viasession_confDraft until you give the go for review.
Tracking
Co-authored-by: Isaac