Skip to content

fix(tgi): honor S3 model_path as weight source for TGI builds (#5943)#5964

Open
sagarneeldubey wants to merge 1 commit into
aws:masterfrom
sagarneeldubey:tgi-s3-model-loading
Open

fix(tgi): honor S3 model_path as weight source for TGI builds (#5943)#5964
sagarneeldubey wants to merge 1 commit into
aws:masterfrom
sagarneeldubey:tgi-s3-model-loading

Conversation

@sagarneeldubey

@sagarneeldubey sagarneeldubey commented Jun 21, 2026

Copy link
Copy Markdown

Issue

Fixes #5943

Summary

ModelBuilder with ModelServer.TGI silently ignored an S3 weight source supplied via model_path="s3://..." or s3_model_data_url="s3://...". It created a literal local s3:/... directory and set HF_MODEL_ID to the HF repo id, so the deployed container always downloaded weights from huggingface.co. This forces a ~10-12 min cold start on every scale-out, makes scale-to-zero async endpoints impractical, creates a hard dependency on huggingface.co, and blocks deploying custom fine-tuned weights not published on the Hub.

_build_for_tgi now:

  • Detects an S3 weight source before any local directory is created and skips the local mkdir for it.
  • Attaches the S3 prefix as an uncompressed ModelDataSource (S3DataType=S3Prefix, CompressionType=None) by routing through _prepare_for_mode(model_path=...).
  • Sets HF_MODEL_ID=/opt/ml/model and HF_HUB_OFFLINE=1 via setdefault, preserving any user-supplied HF_MODEL_ID.

Because TGI's HF_MODEL_ID does not accept an S3 URI (it expects an HF repo id or a local path), the fix mounts the weights at /opt/ml/model rather than passing the URI through the env var.

Genuine local paths, HF-Hub downloads, JumpStart, and all non-TGI servers are unchanged. The change is scoped strictly to the TGI path and does not touch the DJL behavior of #5529 / #5588.

Additional defects fixed (surfaced during real deployment)

  • HF_HUB_OFFLINE reset: the end-of-build reset set it back to "0", defeating offline loading. It now stays "1" for the S3-mounted path so TGI loads from /opt/ml/model instead of phoning home.
  • Doubled trailing slash: the ModelDataSource S3Uri could become s3://.../prefix// when the input prefix already ended in /. With S3Prefix matching, no objects match ...prefix//, so zero files mount into /opt/ml/model and TGI fails to find weights. Normalized to exactly one trailing slash.

Testing

  • New TestBuildForTGI cases: S3 model_path skips _create_dir_structure; container gets HF_MODEL_ID=/opt/ml/model + HF_HUB_OFFLINE=1; s3_model_data_url routes through the S3 branch; user-supplied HF_MODEL_ID preserved; HF_HUB_OFFLINE survives the post-build reset.
  • Preservation tests: local-path, HF-Hub, non-TGI (DJL/TEI), and JumpStart behavior unchanged.
  • test_tgi_server.py: regression test asserting the S3Uri has exactly one trailing slash (no //).
  • All affected unit tests pass (21 passed). Changed files are black/flake8 clean with zero new docstyle/pylint findings vs. baseline.

End-to-end validation (real SageMaker inference endpoint, TGI DLC, weights from S3)

Beyond unit tests, the fix was validated against a live SageMaker inference endpoint using the TGI Deep Learning Container, loading model weights directly from an S3 prefix (no HuggingFace download). The build step was sanity-checked to confirm the container config before deploying, and the endpoint then mounted the weights from S3 and served successfully:

schema_builder = SchemaBuilder(
    sample_input={"inputs": "What is deep learning?", "parameters": {"max_new_tokens": 64}},
    sample_output=[{"generated_text": "Deep learning is..."}],
)

builder = ModelBuilder(
    model=args.model_id,
    model_path=s3_uri,                 # the #5943 fix: S3 weights, no HF download
    model_server=ModelServer.TGI,
    schema_builder=schema_builder,
    env_vars=build_env(args),
    instance_type=args.instance_type,
    role_arn=args.role,
    sagemaker_session=sm_session,
)

print("\nBuilding model (patched ModelBuilder)...")
builder.build()

# Sanity-check the fix produced the right container config before deploying.
env = builder.env_vars or {}
junk = Path(s3_uri)
print(f"  no junk dir: {not junk.exists()}  |  HF_MODEL_ID={env.get('HF_MODEL_ID')}  |  HF_HUB_OFFLINE={env.get('HF_HUB_OFFLINE')}")

print(f"\nDeploying endpoint '{args.endpoint_name}' (should mount weights from S3)...")
start = time.time()
endpoint = builder.deploy(
    endpoint_name=args.endpoint_name,
    initial_instance_count=1,
    instance_type=args.instance_type,
    container_startup_health_check_timeout=args.startup_timeout,
)

Observed: no local s3:/... directory was created; the built container had HF_MODEL_ID=/opt/ml/model, HF_HUB_OFFLINE=1, and a ModelDataSource pointing at the S3 prefix; the endpoint reached InService and served inference with weights mounted from S3 (CloudWatch shows the container loading from the mounted path rather than downloading from huggingface.co).

Future work (out of scope for this PR)

This PR is intentionally scoped to the TGI backend (the subject of #5943). The same "S3 as a weight source" capability should be extended to the other HF-Hub-download backends so behavior is consistent across ModelBuilder (see https://sagemaker.readthedocs.io/en/stable/ ):

Related: #5529, #5588.

Backward compatibility

Additive and TGI-scoped. Every non-S3 / non-TGI code path is reached exactly as before.

ModelBuilder with ModelServer.TGI silently ignored an S3 weight source
(model_path="s3://..." or s3_model_data_url="s3://..."): it created a
literal local "s3:/..." directory and set HF_MODEL_ID to the HF repo id,
so the container always downloaded weights from huggingface.co.

_build_for_tgi now detects an S3 weight source, skips the local mkdir for
it, attaches the S3 prefix as an uncompressed ModelDataSource, and sets
HF_MODEL_ID=/opt/ml/model and HF_HUB_OFFLINE=1 (via setdefault, preserving
any user-supplied HF_MODEL_ID). Genuine local paths, HF-Hub downloads,
JumpStart, and all non-TGI servers are unchanged.

Also fixes two defects surfaced during real deployment:
- HF_HUB_OFFLINE was reset to "0" at the end of the build; it now stays
  "1" for the S3-mounted path so TGI loads from /opt/ml/model.
- The uncompressed ModelDataSource S3Uri could end in "//" when the input
  prefix already had a trailing slash; normalized to exactly one slash so
  S3Prefix matching finds the weight objects.

Adds unit and regression tests for all of the above.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: ModelBuilder TGI path ignores S3 model inputs (model_path and s3_model_data_url), forcing HuggingFace Hub download

1 participant