Skip to content

HDDS-14760. Fix CompleteMultipartUpload rejecting chunked request body as empty#10461

Open
rich7420 wants to merge 3 commits into
apache:masterfrom
rich7420:HDDS-14760
Open

HDDS-14760. Fix CompleteMultipartUpload rejecting chunked request body as empty#10461
rich7420 wants to merge 3 commits into
apache:masterfrom
rich7420:HDDS-14760

Conversation

@rich7420

@rich7420 rich7420 commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

Completing a multipart upload with chunked transfer encoding (no
Content-Length, as the AWS C++ SDK does with Expect: 100-continue) fails
with HTTP 400 You must specify at least one part, even though the body does
contain the parts.

CompleteMultipartUploadRequestUnmarshaller used InputStream#available() == 0
to detect an empty body. available() reports only the bytes readable
without blocking, so for a not-yet-buffered chunked body it can return 0
even when the body is non-empty, causing the spurious error.

The fix replaces available() with a PushbackInputStream that reads one
byte: -1 means the body is truly empty (original error preserved), otherwise
the byte is pushed back and parsing continues. This is the only use of
available() in S3 Gateway, so no other request type is affected.

What is the link to the Apache JIRA

https://issues.apache.org/jira/projects/HDDS/issues/HDDS-14760

How was this patch tested?

  • New unit tests in TestCompleteMultipartUploadRequestUnmarshaller:
    fromStreamWhereAvailableReturnsZero (reproduces the bug; fails before the
    fix) and emptyBodyThrowsMustSpecifyAtLeastOnePart (empty body still
    rejected). Existing unmarshaller and multipart tests pass.
  • New acceptance test in MultipartUpload.robot completing an MPU via a
    presigned URL with Transfer-Encoding: chunked.
  • CI run: https://github.com/rich7420/ozone/actions/runs/27128059905

…y as empty

The unmarshaller used InputStream#available()==0 to detect an empty body,
but available() can return 0 for a non-empty body that is not yet buffered
(e.g. chunked transfer with Expect: 100-continue, as the AWS C++ SDK does),
causing a spurious 'You must specify at least one part' error. Detect the
empty body by reading one byte via PushbackInputStream instead.
@kerneltime

Copy link
Copy Markdown
Contributor

Thank you for the fix. Review incoming.

@kerneltime kerneltime left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice fix — this is a real interop problem (the AWS C++ SDK with Expect: 100-continue genuinely trips it), and replacing available() with a one-byte probe is the right remedy. available() only promises bytes readable without blocking, so an unbuffered chunked body legitimately reports 0. One detail worth calling out as correct: super.readFrom is handed the PushbackInputStream rather than the original stream, so the consumed byte is replayed into the SAX parser — easy to get wrong, good that it's threaded through.

I traced the change end-to-end and have no concerns about the production code: the empty-body contract is preserved (-1 → same 400 "You must specify at least one part"), the new blocking read is bounded by the HTTP idle timeout and only reachable after SigV4 auth, and the CompleteMultipartUpload body isn't signature-bound, so allowing chunked/unsigned bodies opens no auth bypass.

The suggestions below are all test-hardening, not blockers:

Unit tests

  • The fix intentionally lets non-empty bodies through, which moves empty/malformed-parts rejection downstream — but that newly-reachable path has no unit coverage. The highest-value addition is a case for a well-formed body with zero <Part> elements (<CompleteMultipartUpload></CompleteMultipartUpload>), which is exactly the shape a chunked SDK could send: assert the unmarshaller returns an empty part list, documenting that empty-list rejection is deferred to the endpoint. A malformed-XML case asserting WebApplicationException would also lock down the new non-empty-but-unparseable path.
  • checkContent asserts the part count and both ETags but not PartNumber; adding e.g. assertEquals(1, partList.get(0).getPartNumber()) would catch a number drop/swap.

Acceptance test (MultipartUpload.robot)

  • Should Contain ${result} ${key} doesn't distinguish success from failure — the key also appears in the <Resource> element of the S3 error response. The real discriminators here are Should Contain ETag and Should Not Contain must specify at least one part; I'd drop the ${key} check or pin it to the success <Key> element.
  • [Teardown] removes the local XML but doesn't abort the initiated MPU, so a failed assertion leaks an in-progress upload. Consider adding Run Keyword And Ignore Error Abort MPU.
  • Forcing chunked via -H "Transfer-Encoding: chunked" -H "Content-Length:" --data-binary is curl-build-dependent; if curl re-adds Content-Length the test still passes but stops exercising the chunked path. Might be worth confirming (e.g. via curl -v / server log) that the request actually went out chunked, so the regression test can't silently degrade.

Thanks for the fix and the regression coverage.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes S3 Gateway multipart upload completion failures when the client sends the CompleteMultipartUpload XML using chunked transfer encoding (no Content-Length), by avoiding the incorrect use of InputStream#available() to detect an empty body.

Changes:

  • Replace InputStream#available() == 0 empty-body detection with a PushbackInputStream one-byte read/unread probe.
  • Add unit tests covering the “available() returns 0 but body is non-empty” regression and the truly-empty-body error case.
  • Add a smoketest/acceptance test that completes an MPU via a presigned URL while forcing chunked transfer encoding.

Reviewed changes

Copilot reviewed 2 out of 4 changed files in this pull request and generated 1 comment.

File Description
hadoop-ozone/s3gateway/src/main/java/org/apache/hadoop/ozone/s3/endpoint/CompleteMultipartUploadRequestUnmarshaller.java Fix empty-body detection for chunked request bodies by probing with PushbackInputStream instead of available().
hadoop-ozone/s3gateway/src/test/java/org/apache/hadoop/ozone/s3/endpoint/TestCompleteMultipartUploadRequestUnmarshaller.java Add unit coverage for the chunked/available==0 regression and empty-body behavior.
hadoop-ozone/dist/src/main/smoketest/s3/presigned_url_helper.py Add helper to generate a presigned URL for CompleteMultipartUpload for smoketests.
hadoop-ozone/dist/src/main/smoketest/s3/MultipartUpload.robot Add an acceptance test completing MPU with chunked transfer encoding via presigned URL.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +100 to +101
assertTrue(ex.getMessage().contains("must specify at least one part"),
"Unexpected message: " + ex.getMessage());
rich7420 added 2 commits June 9, 2026 11:52
- Add unit coverage for a well-formed body with zero <Part> elements.
- Use the success-only CompleteMultipartUploadResult/ETag markers (not the
  key, which also appears in an error <Resource>) to assert completion.
- Abort the initiated MPU in teardown so a failed assertion does not leak it.
… text

The empty-body test asserted on the literal error message, duplicating the
production string. Assert on the stable S3ErrorTable.INVALID_REQUEST code
(read from the OS3Exception cause) instead.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants