feat: support auto-resolved template model IDs#216
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a mechanism to resolve template model IDs dynamically by querying server capabilities. It adds a utility function resolve_template_model_id and integrates it into client template setup methods, while also updating the server launcher to merge configured supported models with those derived from application configs. Feedback highlights a bug where self.model_id retains the twinkle:// prefix in MultiLoraTransformersModel, and suggests improvements such as using model_dump() or dict() for Pydantic model normalization, calling raise_for_status() before parsing JSON responses, and adding exception handling to gracefully handle unreachable servers.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| if hasattr(item, 'model_name'): | ||
| model_name = getattr(item, 'model_name', None) | ||
| if not model_name: | ||
| return None | ||
| template_init_model_id = getattr(item, 'template_init_model_id', None) | ||
| return { | ||
| 'model_name': model_name, | ||
| 'template_init_model_id': template_init_model_id, | ||
| } |
There was a problem hiding this comment.
When normalizing a supported model item, if the item is a Pydantic model (which has the model_name attribute), extracting only model_name and template_init_model_id will silently drop any other fields that might be present on the model. To ensure extensibility and preserve all fields, we should check if the item has a model_dump or dict method and use that to serialize it to a dictionary.
| if hasattr(item, 'model_name'): | |
| model_name = getattr(item, 'model_name', None) | |
| if not model_name: | |
| return None | |
| template_init_model_id = getattr(item, 'template_init_model_id', None) | |
| return { | |
| 'model_name': model_name, | |
| 'template_init_model_id': template_init_model_id, | |
| } | |
| if hasattr(item, 'model_name'): | |
| if hasattr(item, 'model_dump'): | |
| return item.model_dump() | |
| if hasattr(item, 'dict'): | |
| return item.dict() | |
| model_name = getattr(item, 'model_name', None) | |
| if not model_name: | |
| return None | |
| template_init_model_id = getattr(item, 'template_init_model_id', None) | |
| return { | |
| 'model_name': model_name, | |
| 'template_init_model_id': template_init_model_id, | |
| } |
| response = http_get(f'{base_url}/twinkle/get_server_capabilities') | ||
| _SERVER_CAPABILITIES_CACHE[base_url] = GetServerCapabilitiesResponse(**response.json()) |
There was a problem hiding this comment.
The HTTP response from http_get is parsed directly via response.json() without calling response.raise_for_status(). If the request fails (e.g., due to a 404 or 500 error), this can lead to unhelpful or cryptic errors (like JSONDecodeError or ValidationError). Calling raise_for_status() first ensures that HTTP errors are surfaced clearly.
| response = http_get(f'{base_url}/twinkle/get_server_capabilities') | |
| _SERVER_CAPABILITIES_CACHE[base_url] = GetServerCapabilitiesResponse(**response.json()) | |
| response = http_get(f'{base_url}/twinkle/get_server_capabilities') | |
| response.raise_for_status() | |
| _SERVER_CAPABILITIES_CACHE[base_url] = GetServerCapabilitiesResponse(**response.json()) |
| supported_model = get_supported_model_by_name(model_name) | ||
| if supported_model and supported_model.template_init_model_id: | ||
| return supported_model.template_init_model_id |
There was a problem hiding this comment.
If the server is temporarily unreachable or running an older version that does not support the /twinkle/get_server_capabilities endpoint, get_supported_model_by_name will raise an exception and crash the client's set_template call. Wrapping this in a try-except block ensures defensive programming and allows the client to gracefully fall back to using model_name directly.
| supported_model = get_supported_model_by_name(model_name) | |
| if supported_model and supported_model.template_init_model_id: | |
| return supported_model.template_init_model_id | |
| try: | |
| supported_model = get_supported_model_by_name(model_name) | |
| if supported_model and supported_model.template_init_model_id: | |
| return supported_model.template_init_model_id | |
| except Exception: | |
| pass |
|
Hi @xichengpro, Thanks — the capability-declaration flow is a nice cleanup, and the Heads up: #210 is a parallel server-config refactor that reshapes the launcher ( Since the launcher helpers here would need to be rewritten against the typed config, we'll fold the intent into #210 directly with |
Sampler's set_template now auto-injects self.model_id when the caller omits it, consistent with how the model backend already overrides model_id with self.tokenizer_id. This eliminates the need for clients to manually resolve the HF model ID when the route name differs from the underlying model. Also fixes two client-generator bugs: model client assigned self.model_id before stripping the twinkle:// prefix, and sampler client was missing self.model_id entirely. Co-authored-by: xichengpro <188454548+xichengpro@users.noreply.github.com>
PR type
PR information
Write the detail information belongs to this PR.
template_init_model_idtoSupportedModelto support configuring the target model ID used for template initializationtemplate_model_idhelper module, preferring an explicit model ID and otherwise resolving it from server capabilitiesMultiLoraTransformersModel.set_templateandvLLMSampler.set_templateto replace model ID passing with the new resolution flowExperiment results
Paste your experiment result here(if needed).