fix(tgi): honor S3 model_path as weight source for TGI builds (#5943) by sagarneeldubey · Pull Request #5964 · aws/sagemaker-python-sdk

sagarneeldubey · 2026-06-21T13:12:38Z

Issue

Summary

ModelBuilder with ModelServer.TGI silently ignored an S3 weight source supplied via model_path="s3://..." or s3_model_data_url="s3://...". It created a literal local s3:/... directory and set HF_MODEL_ID to the HF repo id, so the deployed container always downloaded weights from huggingface.co. This forces a ~10-12 min cold start on every scale-out, makes scale-to-zero async endpoints impractical, creates a hard dependency on huggingface.co, and blocks deploying custom fine-tuned weights not published on the Hub.

_build_for_tgi now:

Detects an S3 weight source before any local directory is created and skips the local mkdir for it.
Attaches the S3 prefix as an uncompressed ModelDataSource (S3DataType=S3Prefix, CompressionType=None) by routing through _prepare_for_mode(model_path=...).
Sets HF_MODEL_ID=/opt/ml/model and HF_HUB_OFFLINE=1 via setdefault, preserving any user-supplied HF_MODEL_ID.

Because TGI's HF_MODEL_ID does not accept an S3 URI (it expects an HF repo id or a local path), the fix mounts the weights at /opt/ml/model rather than passing the URI through the env var.

Genuine local paths, HF-Hub downloads, JumpStart, and all non-TGI servers are unchanged. The change is scoped strictly to the TGI path and does not touch the DJL behavior of #5529 / #5588.

Additional defects fixed (surfaced during real deployment)

HF_HUB_OFFLINE reset: the end-of-build reset set it back to "0", defeating offline loading. It now stays "1" for the S3-mounted path so TGI loads from /opt/ml/model instead of phoning home.
Doubled trailing slash: the ModelDataSource S3Uri could become s3://.../prefix// when the input prefix already ended in /. With S3Prefix matching, no objects match ...prefix//, so zero files mount into /opt/ml/model and TGI fails to find weights. Normalized to exactly one trailing slash.

Testing

New TestBuildForTGI cases: S3 model_path skips _create_dir_structure; container gets HF_MODEL_ID=/opt/ml/model + HF_HUB_OFFLINE=1; s3_model_data_url routes through the S3 branch; user-supplied HF_MODEL_ID preserved; HF_HUB_OFFLINE survives the post-build reset.
Preservation tests: local-path, HF-Hub, non-TGI (DJL/TEI), and JumpStart behavior unchanged.
test_tgi_server.py: regression test asserting the S3Uri has exactly one trailing slash (no //).
All affected unit tests pass (21 passed). Changed files are black/flake8 clean with zero new docstyle/pylint findings vs. baseline.

End-to-end validation (real SageMaker inference endpoint, TGI DLC, weights from S3)

Beyond unit tests, the fix was validated against a live SageMaker inference endpoint using the TGI Deep Learning Container, loading model weights directly from an S3 prefix (no HuggingFace download). The build step was sanity-checked to confirm the container config before deploying, and the endpoint then mounted the weights from S3 and served successfully:

schema_builder = SchemaBuilder(
    sample_input={"inputs": "What is deep learning?", "parameters": {"max_new_tokens": 64}},
    sample_output=[{"generated_text": "Deep learning is..."}],
)

builder = ModelBuilder(
    model=args.model_id,
    model_path=s3_uri,                 # the #5943 fix: S3 weights, no HF download
    model_server=ModelServer.TGI,
    schema_builder=schema_builder,
    env_vars=build_env(args),
    instance_type=args.instance_type,
    role_arn=args.role,
    sagemaker_session=sm_session,
)

print("\nBuilding model (patched ModelBuilder)...")
builder.build()

# Sanity-check the fix produced the right container config before deploying.
env = builder.env_vars or {}
junk = Path(s3_uri)
print(f"  no junk dir: {not junk.exists()}  |  HF_MODEL_ID={env.get('HF_MODEL_ID')}  |  HF_HUB_OFFLINE={env.get('HF_HUB_OFFLINE')}")

print(f"\nDeploying endpoint '{args.endpoint_name}' (should mount weights from S3)...")
start = time.time()
endpoint = builder.deploy(
    endpoint_name=args.endpoint_name,
    initial_instance_count=1,
    instance_type=args.instance_type,
    container_startup_health_check_timeout=args.startup_timeout,
)

Observed: no local s3:/... directory was created; the built container had HF_MODEL_ID=/opt/ml/model, HF_HUB_OFFLINE=1, and a ModelDataSource pointing at the S3 prefix; the endpoint reached InService and served inference with weights mounted from S3 (CloudWatch shows the container loading from the mounted path rather than downloading from huggingface.co).

Future work (out of scope for this PR)

This PR is intentionally scoped to the TGI backend (the subject of #5943). The same "S3 as a weight source" capability should be extended to the other HF-Hub-download backends so behavior is consistent across ModelBuilder (see https://sagemaker.readthedocs.io/en/stable/ ):

TEI — nearly identical to the TGI path; smallest follow-up.
DJL/LMI — needs a decision on mechanism, since (unlike TGI) DJL's HF_MODEL_ID/OPTION_MODEL_ID can accept an S3 URI directly. Must avoid regressing the DJL env-var handling in bug: ModelBuilder overwrites user-provided HF_MODEL_ID for DJL Serving, preventing S3 model loading #5529 and the DJL passthrough in ModelBuilder fails w/ DJL container using HF_MODEL_ID #5588.
A shared helper for S3 weight-source detection + the uncompressed ModelDataSource construction (the trailing-slash normalization currently exists per-adapter) to avoid duplicating this logic across backends.

Related: #5529, #5588.

Backward compatibility

Additive and TGI-scoped. Every non-S3 / non-TGI code path is reached exactly as before.

ModelBuilder with ModelServer.TGI silently ignored an S3 weight source (model_path="s3://..." or s3_model_data_url="s3://..."): it created a literal local "s3:/..." directory and set HF_MODEL_ID to the HF repo id, so the container always downloaded weights from huggingface.co. _build_for_tgi now detects an S3 weight source, skips the local mkdir for it, attaches the S3 prefix as an uncompressed ModelDataSource, and sets HF_MODEL_ID=/opt/ml/model and HF_HUB_OFFLINE=1 (via setdefault, preserving any user-supplied HF_MODEL_ID). Genuine local paths, HF-Hub downloads, JumpStart, and all non-TGI servers are unchanged. Also fixes two defects surfaced during real deployment: - HF_HUB_OFFLINE was reset to "0" at the end of the build; it now stays "1" for the S3-mounted path so TGI loads from /opt/ml/model. - The uncompressed ModelDataSource S3Uri could end in "//" when the input prefix already had a trailing slash; normalized to exactly one slash so S3Prefix matching finds the weight objects. Adds unit and regression tests for all of the above.

sagarneeldubey requested a deployment to manual-approval June 21, 2026 13:12 — with GitHub Actions Waiting

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(tgi): honor S3 model_path as weight source for TGI builds (#5943)#5964

fix(tgi): honor S3 model_path as weight source for TGI builds (#5943)#5964
sagarneeldubey wants to merge 1 commit into
aws:masterfrom
sagarneeldubey:tgi-s3-model-loading

sagarneeldubey commented Jun 21, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

sagarneeldubey commented Jun 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Issue

Summary

Additional defects fixed (surfaced during real deployment)

Testing

End-to-end validation (real SageMaker inference endpoint, TGI DLC, weights from S3)

Future work (out of scope for this PR)

Backward compatibility

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

sagarneeldubey commented Jun 21, 2026 •

edited

Loading