HDDS-14760. Fix CompleteMultipartUpload rejecting chunked request body as empty#10461
HDDS-14760. Fix CompleteMultipartUpload rejecting chunked request body as empty#10461rich7420 wants to merge 3 commits into
Conversation
…y as empty The unmarshaller used InputStream#available()==0 to detect an empty body, but available() can return 0 for a non-empty body that is not yet buffered (e.g. chunked transfer with Expect: 100-continue, as the AWS C++ SDK does), causing a spurious 'You must specify at least one part' error. Detect the empty body by reading one byte via PushbackInputStream instead.
|
Thank you for the fix. Review incoming. |
kerneltime
left a comment
There was a problem hiding this comment.
Nice fix — this is a real interop problem (the AWS C++ SDK with Expect: 100-continue genuinely trips it), and replacing available() with a one-byte probe is the right remedy. available() only promises bytes readable without blocking, so an unbuffered chunked body legitimately reports 0. One detail worth calling out as correct: super.readFrom is handed the PushbackInputStream rather than the original stream, so the consumed byte is replayed into the SAX parser — easy to get wrong, good that it's threaded through.
I traced the change end-to-end and have no concerns about the production code: the empty-body contract is preserved (-1 → same 400 "You must specify at least one part"), the new blocking read is bounded by the HTTP idle timeout and only reachable after SigV4 auth, and the CompleteMultipartUpload body isn't signature-bound, so allowing chunked/unsigned bodies opens no auth bypass.
The suggestions below are all test-hardening, not blockers:
Unit tests
- The fix intentionally lets non-empty bodies through, which moves empty/malformed-parts rejection downstream — but that newly-reachable path has no unit coverage. The highest-value addition is a case for a well-formed body with zero
<Part>elements (<CompleteMultipartUpload></CompleteMultipartUpload>), which is exactly the shape a chunked SDK could send: assert the unmarshaller returns an empty part list, documenting that empty-list rejection is deferred to the endpoint. A malformed-XML case assertingWebApplicationExceptionwould also lock down the new non-empty-but-unparseable path. checkContentasserts the part count and both ETags but notPartNumber; adding e.g.assertEquals(1, partList.get(0).getPartNumber())would catch a number drop/swap.
Acceptance test (MultipartUpload.robot)
Should Contain ${result} ${key}doesn't distinguish success from failure — the key also appears in the<Resource>element of the S3 error response. The real discriminators here areShould Contain ETagandShould Not Contain must specify at least one part; I'd drop the${key}check or pin it to the success<Key>element.[Teardown]removes the local XML but doesn't abort the initiated MPU, so a failed assertion leaks an in-progress upload. Consider addingRun Keyword And Ignore Error Abort MPU.- Forcing chunked via
-H "Transfer-Encoding: chunked" -H "Content-Length:" --data-binaryis curl-build-dependent; if curl re-addsContent-Lengththe test still passes but stops exercising the chunked path. Might be worth confirming (e.g. viacurl -v/ server log) that the request actually went out chunked, so the regression test can't silently degrade.
Thanks for the fix and the regression coverage.
There was a problem hiding this comment.
Pull request overview
Fixes S3 Gateway multipart upload completion failures when the client sends the CompleteMultipartUpload XML using chunked transfer encoding (no Content-Length), by avoiding the incorrect use of InputStream#available() to detect an empty body.
Changes:
- Replace
InputStream#available() == 0empty-body detection with aPushbackInputStreamone-byte read/unread probe. - Add unit tests covering the “available() returns 0 but body is non-empty” regression and the truly-empty-body error case.
- Add a smoketest/acceptance test that completes an MPU via a presigned URL while forcing chunked transfer encoding.
Reviewed changes
Copilot reviewed 2 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
hadoop-ozone/s3gateway/src/main/java/org/apache/hadoop/ozone/s3/endpoint/CompleteMultipartUploadRequestUnmarshaller.java |
Fix empty-body detection for chunked request bodies by probing with PushbackInputStream instead of available(). |
hadoop-ozone/s3gateway/src/test/java/org/apache/hadoop/ozone/s3/endpoint/TestCompleteMultipartUploadRequestUnmarshaller.java |
Add unit coverage for the chunked/available==0 regression and empty-body behavior. |
hadoop-ozone/dist/src/main/smoketest/s3/presigned_url_helper.py |
Add helper to generate a presigned URL for CompleteMultipartUpload for smoketests. |
hadoop-ozone/dist/src/main/smoketest/s3/MultipartUpload.robot |
Add an acceptance test completing MPU with chunked transfer encoding via presigned URL. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| assertTrue(ex.getMessage().contains("must specify at least one part"), | ||
| "Unexpected message: " + ex.getMessage()); |
- Add unit coverage for a well-formed body with zero <Part> elements. - Use the success-only CompleteMultipartUploadResult/ETag markers (not the key, which also appears in an error <Resource>) to assert completion. - Abort the initiated MPU in teardown so a failed assertion does not leak it.
… text The empty-body test asserted on the literal error message, duplicating the production string. Assert on the stable S3ErrorTable.INVALID_REQUEST code (read from the OS3Exception cause) instead.
What changes were proposed in this pull request?
Completing a multipart upload with chunked transfer encoding (no
Content-Length, as the AWS C++ SDK does withExpect: 100-continue) failswith HTTP 400
You must specify at least one part, even though the body doescontain the parts.
CompleteMultipartUploadRequestUnmarshallerusedInputStream#available() == 0to detect an empty body.
available()reports only the bytes readablewithout blocking, so for a not-yet-buffered chunked body it can return
0even when the body is non-empty, causing the spurious error.
The fix replaces
available()with aPushbackInputStreamthat reads onebyte:
-1means the body is truly empty (original error preserved), otherwisethe byte is pushed back and parsing continues. This is the only use of
available()in S3 Gateway, so no other request type is affected.What is the link to the Apache JIRA
https://issues.apache.org/jira/projects/HDDS/issues/HDDS-14760
How was this patch tested?
TestCompleteMultipartUploadRequestUnmarshaller:fromStreamWhereAvailableReturnsZero(reproduces the bug; fails before thefix) and
emptyBodyThrowsMustSpecifyAtLeastOnePart(empty body stillrejected). Existing unmarshaller and multipart tests pass.
MultipartUpload.robotcompleting an MPU via apresigned URL with
Transfer-Encoding: chunked.