51 changes: 51 additions & 0 deletions docs/faq/faq.md
@@ -181,3 +181,54 @@ If the official Device Plugin cannot provide the required information, HAMi deve

- Ascend’s official Device Plugin requires a separate plugin for each card type. HAMi abstracts these card templates into a unified plugin for easier integration with the scheduler.
- NVIDIA requires custom implementations to support advanced features like compute and memory limits, overcommitment, and NUMA awareness, necessitating HAMi’s custom Device Plugin.

## How does HAMi enforce GPU memory and compute limits?

HAMi injects `libvgpu.so` into containers via `/etc/ld.so.preload`. The library intercepts CUDA memory allocation calls and returns an out-of-memory (OOM) error when an allocation would exceed the `nvidia.com/gpumem` limit; compute limits are enforced by a token-bucket throttle on kernel launch calls. Workloads that bypass the preloaded library (Docker-in-Docker, direct driver API calls) are not covered. For the full interception flow, see [GPU Virtualization](./core-concepts/gpu-virtualization).
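As a minimal sketch of what these limits look like from the workload side, the pod below requests one vGPU with a memory cap and a compute cap. The `nvidia.com/gpumem` value is assumed to be in MiB, and `nvidia.com/gpucores` (a percentage of one GPU) is an assumption not described in this answer; the pod name, image, and numbers are placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-limit-demo              # hypothetical name
spec:
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.1-base-ubuntu22.04   # placeholder image
      command: ["sleep", "infinity"]
      resources:
        limits:
          nvidia.com/gpu: 1          # number of vGPUs requested
          nvidia.com/gpumem: 4096    # memory cap (assumed to be MiB)
          nvidia.com/gpucores: 30    # compute cap (assumed to be percent of one GPU)
```

With a spec like this, an allocation that would push the container past its memory cap should fail with an OOM error inside the container instead of consuming memory that belongs to other tenants.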

## How does HAMi vGPU differ from NVIDIA MIG? When should I use each?

HAMi vGPU is software-only and has no special hardware requirements. NVIDIA MIG is hardware partitioning, available only on MIG-capable Ampere and later GPUs (for example A100, A30, H100, H200).

| Property | HAMi vGPU | NVIDIA MIG |
|---|---|---|
| Hardware requirement | Any NVIDIA GPU, driver v440+ | Ampere or later (A100, H100, A30, H200) |
| Isolation mechanism | User-space library interception | Hardware engine partitioning |
| Memory enforcement | Soft (CUDA API level) | Hard (hardware-enforced) |
| Compute enforcement | Soft (throttle inside libvgpu.so) | Hard (separate SM partitions) |
| Partition granularity | 1 MiB memory, 1% compute | Fixed MIG profiles (e.g. 1g.10gb) |
| Dynamic reconfiguration | Yes, no node drain needed | Requires MIG profile reconfiguration |
| Multi-tenant noise isolation | Best-effort | Strong |

Use HAMi vGPU when the GPU does not support MIG, workloads need flexible memory sizes, or dynamic repacking without node drains is needed. Use MIG when hard hardware isolation is a compliance or SLA requirement. HAMi also supports dynamic MIG via `mig-parted`; see [Dynamic MIG Support](./userguide/nvidia-device/dynamic-mig-support).

## Why does nvidia-smi inside my container show less memory than on the host?

`libvgpu.so` intercepts `nvmlDeviceGetMemoryInfo` and related calls, returning the `nvidia.com/gpumem` limit instead of physical VRAM. This is intentional: workloads that size their allocations based on reported memory (such as vLLM) will use only their budget. The host’s `nvidia-smi` always shows physical memory. See [GPU Virtualization](./core-concepts/gpu-virtualization).

## Why is my nvidia.com/gpumem limit not enforced? {#why-is-my-nvidiagpumem-limit-not-enforced}

The four most common causes are: `CUDA_DISABLE_CONTROL=true` is set, the workload runs inside Docker-in-Docker, the application calls the GPU driver directly (bypassing `libvgpu.so`), or `nvidia-container-runtime` is not the default runtime on the node. See [Troubleshooting](./troubleshooting) for resolution steps.

## Does HAMi replace kube-scheduler or run alongside it?

HAMi runs alongside kube-scheduler as a [scheduler extender](https://github.com/kubernetes/design-proposals-archive/blob/main/scheduling/scheduler_extender.md); it does not replace it. The MutatingWebhook sets `schedulerName: hami-scheduler` only on pods requesting HAMi resources; all other pods follow the default scheduler path unchanged. See [Architecture](./core-concepts/architecture).
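To illustrate the webhook behavior, here is a sketch of the relevant fields of a pod after admission. Only `schedulerName: hami-scheduler` comes from the description above; the pod name, image, and resource values are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hami-scheduled-demo          # hypothetical name
spec:
  schedulerName: hami-scheduler      # set by the webhook, not written by the user
  containers:
    - name: app
      image: example.com/my-cuda-app:latest   # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 1          # requesting a HAMi resource triggers the mutation
          nvidia.com/gpumem: 2048    # illustrative value
```

A pod with no HAMi resources in its limits keeps its original `schedulerName` (normally `default-scheduler`) and is untouched by this path.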

## Does HAMi work with vLLM, and what are the known limitations for multi-GPU tensor parallelism?

Single-GPU vLLM with `nvidia.com/gpumem` works without configuration. For multi-GPU tensor parallelism (`tensor_parallel_size > 1`) with vLLM versions greater than 0.18, HAMi v2.9.0 or later is required. Earlier versions had NCCL initialization failures due to shared CUDA device memory state files (see [#1764](https://github.com/Project-HAMi/HAMi/issues/1764) and [#1853](https://github.com/Project-HAMi/HAMi/issues/1853)). In Volcano environments, set `tensor_parallel_size` per pod, not across all pods. If CUDA graph capture errors occur, try `--enforce-eager`.
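The multi-GPU case might be expressed roughly as follows. This is a sketch under assumptions: the image tag, model placeholder, and memory value are illustrative and not taken from the HAMi documentation, and the number of requested vGPUs is assumed to match `--tensor-parallel-size` within the same pod.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: vllm-tp2-demo                # hypothetical name
spec:
  containers:
    - name: vllm
      image: vllm/vllm-openai:latest # placeholder tag; pin a version in practice
      args:
        - --model
        - <model-name>               # placeholder, fill in your model
        - --tensor-parallel-size
        - "2"                        # sized for this pod only
        - --enforce-eager            # optional workaround for CUDA graph capture errors
      resources:
        limits:
          nvidia.com/gpu: 2          # assumed to match tensor-parallel size
          nvidia.com/gpumem: 16000   # per-vGPU cap, illustrative value
```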

## Is HAMi compatible with NVIDIA GPU Operator and DCGM metrics?

HAMi’s device plugin and GPU Operator’s device plugin both report `nvidia.com/gpu` to kubelet, so running both on the same node causes conflicts. Disable the GPU Operator device plugin in its Helm values:

```yaml
devicePlugin:
  enabled: false
```

DCGM Exporter is not affected and continues to report physical-level counters normally. HAMi’s per-container virtual metrics are separate; see [GPU Utilization Metrics](./developers/gpu-utilization-metrics).

## How do I set up Prometheus and Grafana monitoring for HAMi vGPU metrics?

The `hami-device-plugin` pod on each node exposes per-container vGPU metrics on port `31992` (configurable via `devicePlugin.monitorPort`). See [Grafana Dashboard](./userguide/monitoring/grafana-dashboard) for the full setup including Prometheus scrape config and dashboard import.
15 changes: 15 additions & 0 deletions docs/troubleshooting/troubleshooting.md
@@ -2,6 +2,21 @@
title: Troubleshooting
---

## GPU Memory Limit Not Enforced {#gpu-memory-limit-not-enforced}

If a container exceeds its `nvidia.com/gpumem` limit, check the following causes:

- **`CUDA_DISABLE_CONTROL=true` is set** - disables HAMi-core enforcement entirely. Remove it from production workloads.
- **Docker-in-Docker (DinD)** - inner containers do not inherit the `/etc/ld.so.preload` hostPath mount. HAMi enforcement does not apply inside DinD.
- **Direct driver API usage** - workloads calling NVML or the CUDA Driver API directly bypass `libvgpu.so`.
- **`nvidia-container-runtime` not set as default** - verify with:

```bash
containerd config dump | grep default_runtime_name
```

The output must show `nvidia`. If not, follow the [Prerequisites](./installation/online-installation) guide.

- If you don’t explicitly request vGPUs when using the device plugin with NVIDIA images, all GPUs on the host may be exposed to your container.
- Currently, A100 MIG is supported only in "none" and "mixed" modes.
- Tasks with the "nodeName" field cannot be scheduled at the moment; please use "nodeSelector" instead.
22 changes: 22 additions & 0 deletions docs/userguide/monitoring/grafana-dashboard.md
@@ -25,6 +25,28 @@ The dashboard includes panels for:
- Node-level GPU resource availability
- Device plugin health status

## Prometheus Scrape Config

The `hami-device-plugin` pod on each node exposes metrics on port `31992` (configurable via `devicePlugin.monitorPort`). Add a scrape job:

```yaml
scrape_configs:
  - job_name: hami-device-plugin
    static_configs:
      - targets:
          - <node-ip>:31992
```

For Prometheus Operator, create a `ServiceMonitor` targeting the `hami-device-plugin` service on port `31992`.
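A `ServiceMonitor` for this could look roughly like the sketch below. The namespaces, the label selector, and the port name `monitorport` are assumptions for illustration; check the actual labels and port name on the service in your cluster (for example with `kubectl get svc --show-labels`) before applying it.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: hami-device-plugin
  namespace: monitoring              # assumed namespace of the Prometheus Operator
spec:
  namespaceSelector:
    matchNames:
      - kube-system                  # assumed namespace where HAMi is installed
  selector:
    matchLabels:
      app.kubernetes.io/component: hami-device-plugin   # hypothetical label
  endpoints:
    - port: monitorport              # hypothetical name of the service port mapped to 31992
      path: /metrics
      interval: 30s
```

Note that `endpoints[].port` refers to the service port by name, not by number, so the service must expose a named port.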

Key metrics:

| Metric | Description |
|---|---|
| `Device_memory_desc_of_container` | Virtual GPU memory allocated to a container |
| `Device_utilization_desc_of_container` | GPU compute utilization per container |
| `Device_memory_limit_of_container` | Memory limit set for the container |

## Prerequisites

- Prometheus is installed and scraping the HAMi device plugin metrics endpoint.