GPU Monitoring
Monitor NVIDIA GPU utilization, memory, and costs across your Kubernetes clusters.
If DCGM Exporter is running in your cluster, Kubeadapt collects GPU metrics with zero configuration. The agent discovers it automatically.
Supported GPUs
| GPU Vendor | Exporter Required | Status |
|---|---|---|
| NVIDIA (A100, H100, V100, T4, L4, P100, and any DCGM-compatible GPU) | DCGM Exporter | Supported |
| AMD (MI250, MI300) | - | Not supported |
| Intel (Gaudi, Flex) | - | Not supported |
Prerequisites
- NVIDIA GPUs in your cluster nodes
- DCGM Exporter running as a DaemonSet, typically installed via GPU Operator or standalone
- Kubeadapt agent installed (see Quick Start)
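To confirm the DCGM Exporter prerequisite, it can help to list DaemonSets across all namespaces and filter for DCGM. The following is a minimal sketch, not part of Kubeadapt: `find_dcgm_daemonsets` is a hypothetical helper name, and it assumes the DaemonSet name contains "dcgm" (the GPU Operator names it `nvidia-dcgm-exporter`; custom installs may differ).

```shell
# Hypothetical helper: print "namespace/name" for every DaemonSet whose
# name contains "dcgm" (assumption: your exporter's name does).
# Expects the output of
#   kubectl get daemonsets --all-namespaces --no-headers
# on stdin, where column 1 is the namespace and column 2 is the name.
find_dcgm_daemonsets() {
  awk 'tolower($2) ~ /dcgm/ { print $1 "/" $2 }'
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get daemonsets --all-namespaces --no-headers | find_dcgm_daemonsets
```

If nothing is printed, DCGM Exporter is probably not deployed as a DaemonSet, or it uses a name without "dcgm" in it. The namespace part of the output is the value you would use for `dcgmNamespace` if auto-detection fails.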
Check the Dashboard
Sign in to app.kubeadapt.io and navigate to your cluster. GPU nodes display:
- GPU count per node
- GPU model name (e.g., "NVIDIA A100-SXM4-80GB")
- GPU utilization percentage
- GPU memory usage
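If the dashboard values look wrong or are missing, the raw exporter output can be cross-checked. The sketch below filters DCGM Exporter's Prometheus-format metrics for two fields from its default metric set, `DCGM_FI_DEV_GPU_UTIL` (utilization in percent) and `DCGM_FI_DEV_FB_USED` (framebuffer memory used, in MiB). This guide does not state which DCGM fields Kubeadapt reads, so treating these two as the dashboard's source is an assumption, and `summarize_dcgm` is a hypothetical helper name.

```shell
# Hypothetical helper: summarize per-GPU utilization and framebuffer memory
# from DCGM Exporter metrics text (Prometheus exposition format) on stdin.
summarize_dcgm() {
  awk '
    /^#/ { next }                                  # skip HELP/TYPE lines
    /^DCGM_FI_DEV_GPU_UTIL[{]/ || /^DCGM_FI_DEV_FB_USED[{]/ {
      metric = $1
      sub(/[{].*/, "", metric)                     # drop the label set
      gpu = "?"
      if (match($0, /gpu="[^"]*"/)) {
        gpu = substr($0, RSTART + 5, RLENGTH - 6)  # value of the gpu label
      }
      print "gpu=" gpu, metric, $NF                # $NF is the sample value
    }
  '
}

# Usage against a live cluster (placeholders in <>, port 9400 is the default):
#   kubectl -n <namespace> port-forward pod/<dcgm-exporter-pod> 9400:9400 &
#   curl -s http://localhost:9400/metrics | summarize_dcgm
```

Each output line has the form `gpu=<index> <metric> <value>`, which makes it easy to compare against what the dashboard shows for the same node.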
GPU Sharing Limitations
Configuration
GPU metrics collection is enabled by default:
```yaml
agent:
  config:
    gpuMetricsEnabled: true   # default
    dcgmPort: 9400            # default
    dcgmNamespace: ""         # auto-detect across all namespaces
```
Override Namespace
If the agent cannot find DCGM Exporter pods, restrict the search to a specific namespace:
```bash
helm upgrade kubeadapt kubeadapt/kubeadapt \
  --namespace kubeadapt \
  --reuse-values \
  --set agent.config.dcgmNamespace="gpu-operator"
```
Disable GPU Metrics
```yaml
agent:
  config:
    gpuMetricsEnabled: false
```
For the full list of Helm values, see the kubeadapt-helm chart on GitHub.