diff --git a/docs/faq/faq.md b/docs/faq/faq.md
index 1a700804..b3f255e1 100644
--- a/docs/faq/faq.md
+++ b/docs/faq/faq.md
@@ -181,3 +181,54 @@ If the official Device Plugin cannot provide the required information, HAMi deve
 
 - Ascend’s official Device Plugin requires a separate plugin for each card type. HAMi abstracts these card templates into a unified plugin for easier integration with the scheduler.
 - NVIDIA requires custom implementations to support advanced features like compute and memory limits, overcommitment, and NUMA awareness, necessitating HAMi’s custom Device Plugin.
+
+## How does HAMi enforce GPU memory and compute limits?
+
+HAMi injects `libvgpu.so` into containers via `/etc/ld.so.preload`. The library intercepts CUDA memory allocation calls and returns OOM when the `nvidia.com/gpumem` limit is exceeded; compute limits use a token-bucket throttle on kernel launch calls. Applications that bypass the CUDA library (Docker-in-Docker, direct driver API) are not covered. For the full interception flow, see [GPU Virtualization](./core-concepts/gpu-virtualization).
+
+## How does HAMi vGPU differ from NVIDIA MIG? When should I use each?
+
+HAMi vGPU is software-only with no hardware requirements. NVIDIA MIG is hardware partitioning available only on Ampere and later GPUs (A100, H100, A30).
+
+| Property | HAMi vGPU | NVIDIA MIG |
+|---|---|---|
+| Hardware requirement | Any NVIDIA GPU, driver v440+ | Ampere or later (A100, H100, A30, H200) |
+| Isolation mechanism | User-space library interception | Hardware engine partitioning |
+| Memory enforcement | Soft (CUDA API level) | Hard (hardware-enforced) |
+| Compute enforcement | Soft (throttle inside libvgpu.so) | Hard (separate SM partitions) |
+| Partition granularity | 1 MiB memory, 1% compute | Fixed MIG profiles (e.g. 1g.10gb) |
+| Dynamic reconfiguration | Yes, no node drain needed | Requires MIG profile reconfiguration |
+| Multi-tenant noise isolation | Best-effort | Strong |
+
+Use HAMi vGPU when the GPU does not support MIG, workloads need flexible memory sizes, or dynamic repacking without node drains is needed. Use MIG when hard hardware isolation is a compliance or SLA requirement. HAMi also supports dynamic MIG via `mig-parted`; see [Dynamic MIG Support](./userguide/nvidia-device/dynamic-mig-support).
+
+## Why does nvidia-smi inside my container show less memory than on the host?
+
+`libvgpu.so` intercepts `nvmlDeviceGetMemoryInfo` and related calls, returning the `nvidia.com/gpumem` limit instead of physical VRAM. This is intentional: workloads that size their allocations based on reported memory (such as vLLM) will use only their budget. The host’s `nvidia-smi` always shows physical memory. See [GPU Virtualization](./core-concepts/gpu-virtualization).
+
+## Why is my nvidia.com/gpumem limit not enforced? {#why-is-my-nvidiagpumem-limit-not-enforced}
+
+The four most common causes: `CUDA_DISABLE_CONTROL=true` is set, the workload runs inside Docker-in-Docker, the application calls the GPU driver directly (bypassing `libvgpu.so`), or `nvidia-container-runtime` is not the default runtime on the node. See [Troubleshooting](./troubleshooting) for resolution steps.
+
+## Does HAMi replace kube-scheduler or run alongside it?
+
+HAMi runs alongside kube-scheduler as a [scheduler extender](https://github.com/kubernetes/design-proposals-archive/blob/main/scheduling/scheduler_extender.md) - it does not replace it. The MutatingWebhook sets `schedulerName: hami-scheduler` only on pods requesting HAMi resources; all other pods follow the default scheduler path unchanged. See [Architecture](./core-concepts/architecture).
+
+## Does HAMi work with vLLM, and what are the known limitations for multi-GPU tensor parallelism?
+
+Single-GPU vLLM with `nvidia.com/gpumem` works without configuration. For multi-GPU tensor parallelism (`tensor_parallel_size > 1`) with vLLM versions greater than 0.18, HAMi v2.9.0 or later is required. Earlier versions had NCCL initialization failures due to shared CUDA device memory state files (see [#1764](https://github.com/Project-HAMi/HAMi/issues/1764) and [#1853](https://github.com/Project-HAMi/HAMi/issues/1853)). In Volcano environments, set `tensor_parallel_size` per pod, not across all pods. If CUDA graph capture errors occur, try `--enforce-eager`.
+
+## Is HAMi compatible with NVIDIA GPU Operator and DCGM metrics?
+
+HAMi’s device plugin and GPU Operator’s device plugin both report `nvidia.com/gpu` to kubelet - running both on the same node causes conflicts. Disable the GPU Operator device plugin:
+
+```yaml
+devicePlugin:
+  enabled: false
+```
+
+DCGM Exporter is not affected and continues to report physical-level counters normally. HAMi’s per-container virtual metrics are separate; see [GPU Utilization Metrics](./developers/gpu-utilization-metrics).
+
+## How do I set up Prometheus and Grafana monitoring for HAMi vGPU metrics?
+
+The `hami-device-plugin` pod on each node exposes per-container vGPU metrics on port `31992` (configurable via `devicePlugin.monitorPort`). See [Grafana Dashboard](./userguide/monitoring/grafana-dashboard) for the full setup including Prometheus scrape config and dashboard import.
diff --git a/docs/troubleshooting/troubleshooting.md b/docs/troubleshooting/troubleshooting.md
index 7b1ad5c5..74d789ba 100644
--- a/docs/troubleshooting/troubleshooting.md
+++ b/docs/troubleshooting/troubleshooting.md
@@ -2,6 +2,21 @@
 title: Troubleshooting
 ---
 
+## GPU Memory Limit Not Enforced {#gpu-memory-limit-not-enforced}
+
+If a container exceeds its `nvidia.com/gpumem` limit, check the following causes:
+
+- **`CUDA_DISABLE_CONTROL=true` is set** - disables HAMi-core enforcement entirely. Remove it from production workloads.
+- **Docker-in-Docker (DinD)** - inner containers do not inherit the `/etc/ld.so.preload` hostPath mount. HAMi enforcement does not apply inside DinD.
+- **Direct driver API usage** - workloads calling NVML or the CUDA Driver API directly bypass `libvgpu.so`.
+- **`nvidia-container-runtime` not set as default** - verify with:
+
+  ```bash
+  containerd config dump | grep default_runtime_name
+  ```
+
+  The output must show `nvidia`. If not, follow the [Prerequisites](./installation/online-installation) guide.
+
 - If you don’t explicitly request vGPUs when using the device plugin with NVIDIA images, all GPUs on the host may be exposed to your container.
 - Currently, A100 MIG can be supported in only "none" and "mixed" modes.
 - Tasks with the "nodeName" field cannot be scheduled at the moment; please use "nodeSelector" instead.
diff --git a/docs/userguide/monitoring/grafana-dashboard.md b/docs/userguide/monitoring/grafana-dashboard.md
index 9a85f86a..8617479a 100644
--- a/docs/userguide/monitoring/grafana-dashboard.md
+++ b/docs/userguide/monitoring/grafana-dashboard.md
@@ -25,6 +25,28 @@ The dashboard includes panels for:
 - Node-level GPU resource availability
 - Device plugin health status
 
+## Prometheus Scrape Config
+
+The `hami-device-plugin` pod on each node exposes metrics on port `31992` (configurable via `devicePlugin.monitorPort`). Add a scrape job:
+
+```yaml
+scrape_configs:
+  - job_name: hami-device-plugin
+    static_configs:
+      - targets:
+          - :31992
+```
+
+For Prometheus Operator, create a `ServiceMonitor` targeting the `hami-device-plugin` service on port `31992`.
+
+Key metrics:
+
+| Metric | Description |
+|---|---|
+| `Device_memory_desc_of_container` | Virtual GPU memory allocated to a container |
+| `Device_utilization_desc_of_container` | GPU compute utilization per container |
+| `Device_memory_limit_of_container` | Memory limit set for the container |
+
 ## Prerequisites
 
 - Prometheus is installed and scraping the HAMi device plugin metrics endpoint.