Introducing kubeadapt-agent: Lightweight Kubernetes Metrics Collector, Rebuilt from Scratch

Why we rebuilt the agent

The original Kubeadapt agent required Prometheus, OpenCost, node-exporter, and kube-state-metrics before it could collect a single metric. Every cluster needed a full monitoring stack installed up front. Setup was slow, RBAC was complex, and upgrades broke things.

kubeadapt-agent is a ground-up rewrite with a single dependency: metrics-server.

What changed

No more Prometheus, OpenCost, node-exporter, or kube-state-metrics

All four dependencies are gone. kubeadapt-agent reads resource usage directly from the Kubernetes Metrics API. The only prerequisite is metrics-server.
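As a rough illustration of what reading from the Metrics API involves, the sketch below parses a PodMetricsList document of the kind metrics-server serves under /apis/metrics.k8s.io/v1beta1/pods and sums container usage per pod. The sample payload, the function names parse_quantity and pod_usage, and the output field names are assumptions made for this example. They are not taken from kubeadapt-agent's source, and the quantity parser covers only the common suffixes.

```python
import json
from decimal import Decimal

# Multipliers for the common Kubernetes quantity suffixes (not exhaustive).
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
_DECIMAL = {"n": Decimal("1e-9"), "u": Decimal("1e-6"), "m": Decimal("1e-3"),
            "k": Decimal("1e3"), "M": Decimal("1e6"), "G": Decimal("1e9")}


def parse_quantity(text):
    """Convert a quantity string such as '250m' or '128Mi' to a Decimal."""
    for suffix, factor in _BINARY.items():
        if text.endswith(suffix):
            return Decimal(text[:-len(suffix)]) * factor
    if text[-1] in _DECIMAL:
        return Decimal(text[:-1]) * _DECIMAL[text[-1]]
    return Decimal(text)  # plain number, e.g. '2' cores


def pod_usage(pod_metrics_list):
    """Sum container usage per pod, keyed by (namespace, name)."""
    usage = {}
    for item in pod_metrics_list["items"]:
        key = (item["metadata"]["namespace"], item["metadata"]["name"])
        cpu = sum((parse_quantity(c["usage"]["cpu"]) for c in item["containers"]), Decimal(0))
        mem = sum((parse_quantity(c["usage"]["memory"]) for c in item["containers"]), Decimal(0))
        usage[key] = {"cpu_cores": float(cpu), "memory_bytes": int(mem)}
    return usage


# Hypothetical response body, shaped like a metrics.k8s.io/v1beta1 PodMetricsList.
SAMPLE = json.loads("""
{
  "kind": "PodMetricsList",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "items": [
    {
      "metadata": {"name": "web-7d4b9c", "namespace": "default"},
      "containers": [
        {"name": "app", "usage": {"cpu": "250m", "memory": "128Mi"}},
        {"name": "sidecar", "usage": {"cpu": "12500000n", "memory": "65536Ki"}}
      ]
    }
  ]
}
""")

print(pod_usage(SAMPLE))
```

CPU is reported in cores and memory in bytes, so "250m" becomes 0.25 cores and "128Mi" becomes 134217728 bytes before the per-container values are added up.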

GPU metrics, automatically

kubeadapt-agent detects NVIDIA GPUs at startup. If dcgm-exporter is running anywhere in your cluster, the agent finds it and starts collecting GPU utilization, memory, temperature, and power draw per device. See the GPU monitoring guide for details.
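To make the per-device collection concrete, here is a minimal sketch that parses Prometheus-format text such as dcgm-exporter exposes and groups the four readings by GPU. The metric names are the ones dcgm-exporter commonly ships in its default counter set. The sample text, the label choice (UUID, falling back to gpu), and the output field names are assumptions for illustration, and the parser is deliberately simplified rather than a full exposition-format implementation.

```python
import re
from collections import defaultdict

# dcgm-exporter metric name -> field name used in this sketch (field names are assumed).
FIELDS = {
    "DCGM_FI_DEV_GPU_UTIL": "utilization_pct",
    "DCGM_FI_DEV_FB_USED": "memory_used_mib",
    "DCGM_FI_DEV_GPU_TEMP": "temperature_c",
    "DCGM_FI_DEV_POWER_USAGE": "power_w",
}

# Simplified patterns: label values containing '}' or escaped quotes are not handled.
_SAMPLE_RE = re.compile(r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)')
_LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_gpu_metrics(text):
    """Return {gpu_id: {field: value}} for the metrics listed in FIELDS."""
    devices = defaultdict(dict)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE_RE.match(line)
        if not match or match.group("name") not in FIELDS:
            continue
        labels = dict(_LABEL_RE.findall(match.group("labels")))
        gpu_id = labels.get("UUID") or labels.get("gpu", "unknown")
        devices[gpu_id][FIELDS[match.group("name")]] = float(match.group("value"))
    return dict(devices)


# Hypothetical scrape output for a node with two GPUs.
SCRAPE = """
# HELP DCGM_FI_DEV_GPU_UTIL GPU utilization (in %).
# TYPE DCGM_FI_DEV_GPU_UTIL gauge
DCGM_FI_DEV_GPU_UTIL{gpu="0",UUID="GPU-aaaa"} 87
DCGM_FI_DEV_GPU_UTIL{gpu="1",UUID="GPU-bbbb"} 3
DCGM_FI_DEV_FB_USED{gpu="0",UUID="GPU-aaaa"} 30212
DCGM_FI_DEV_GPU_TEMP{gpu="0",UUID="GPU-aaaa"} 64
DCGM_FI_DEV_POWER_USAGE{gpu="0",UUID="GPU-aaaa"} 251.5
"""

print(parse_gpu_metrics(SCRAPE))
```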

GPU sharing limitations

Per-workload attribution is limited under GPU time-slicing, MPS, and MIG because of dcgm-exporter constraints. In shared-GPU setups, GPU right-sizing works at the node level only. See the GPU monitoring guide for specifics.

Real-time collection

The old agent re-fetched every resource on a fixed interval, pulling the same data over and over. kubeadapt-agent uses Kubernetes watch connections instead: it lists each resource once at startup, then receives only change events as they happen. When a workload scales or a node joins, the agent sees the change as soon as the API server publishes the event, rather than waiting for the next polling cycle.
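The list-then-watch pattern can be sketched with a small in-memory cache. The event shape (a type of ADDED, MODIFIED, or DELETED plus the affected object) follows the Kubernetes watch API. The ResourceCache class and the pod helper are hypothetical names for this example, and the network side of the watch connection is left out entirely.

```python
class ResourceCache:
    """In-memory view of one resource type, kept current by watch events."""

    def __init__(self):
        self.objects = {}
        self.resource_version = None

    @staticmethod
    def _key(obj):
        meta = obj["metadata"]
        return (meta.get("namespace", ""), meta["name"])

    def sync(self, items, resource_version):
        """Initial list: replace the whole cache once."""
        self.objects = {self._key(o): o for o in items}
        self.resource_version = resource_version

    def apply(self, event):
        """Apply a single watch event to the cache."""
        kind, obj = event["type"], event["object"]
        if kind not in ("ADDED", "MODIFIED", "DELETED"):
            return  # BOOKMARK and ERROR handling is left out of this sketch
        key = self._key(obj)
        if kind == "DELETED":
            self.objects.pop(key, None)
        else:
            self.objects[key] = obj
        # Remember where to resume the watch if the connection drops.
        self.resource_version = obj["metadata"].get("resourceVersion", self.resource_version)


def pod(name, version, phase="Running"):
    """Build a tiny pod-like object for the demo (hypothetical helper)."""
    return {"metadata": {"namespace": "default", "name": name, "resourceVersion": version},
            "status": {"phase": phase}}


cache = ResourceCache()
cache.sync([pod("web-1", "100")], resource_version="100")        # one full list at startup
cache.apply({"type": "ADDED", "object": pod("web-2", "101")})     # then only the deltas
cache.apply({"type": "MODIFIED", "object": pod("web-2", "102", phase="Succeeded")})
cache.apply({"type": "DELETED", "object": pod("web-1", "103")})
print(sorted(cache.objects), cache.resource_version)
```

After the initial sync, the cost of staying current scales with the number of changes rather than with the size of the cluster, which is the point of replacing periodic full lists.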

It tracks 20+ resource types by default: nodes, pods, deployments, statefulsets, daemonsets, jobs, HPAs, VPAs, persistent volumes, and more.

Auto-discovery

On startup the agent probes the cluster and adapts to what's available:

Cloud provider (AWS, GCP, Azure)
VPA and Karpenter if installed
GPU nodes and NVIDIA metrics exporters
metrics-server availability
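One plausible way to implement these probes is to look at signals the cluster already exposes. The sketch below assumes the cloud provider is inferred from the node's spec.providerID prefix, VPA and Karpenter from the presence of their API groups (autoscaling.k8s.io and karpenter.sh), metrics-server from the metrics.k8s.io group, and GPU nodes from the nvidia.com/gpu allocatable resource. The post does not say which signals kubeadapt-agent actually uses, so treat the detection rules and the discover function as illustrative.

```python
# providerID URI schemes used by the major cloud providers.
PROVIDER_PREFIXES = {"aws://": "AWS", "gce://": "GCP", "azure://": "Azure"}


def detect_cloud(nodes):
    """Return the provider name for the first node with a recognized providerID."""
    for node in nodes:
        provider_id = node.get("spec", {}).get("providerID", "")
        for prefix, name in PROVIDER_PREFIXES.items():
            if provider_id.startswith(prefix):
                return name
    return None  # unknown provider, or a bare-metal cluster


def discover(nodes, api_groups):
    """Summarize cluster capabilities from node objects and API group names."""
    groups = set(api_groups)
    gpu_nodes = [
        node["metadata"]["name"]
        for node in nodes
        if int(node.get("status", {}).get("allocatable", {}).get("nvidia.com/gpu", "0")) > 0
    ]
    return {
        "cloud_provider": detect_cloud(nodes),
        "vpa": "autoscaling.k8s.io" in groups,
        "karpenter": "karpenter.sh" in groups,
        "metrics_server": "metrics.k8s.io" in groups,
        "gpu_nodes": gpu_nodes,
    }


# Hypothetical inputs: two nodes and the API groups the cluster advertises.
NODES = [
    {"metadata": {"name": "cpu-node"},
     "spec": {"providerID": "aws:///us-east-1a/i-0123456789abcdef0"},
     "status": {"allocatable": {"cpu": "4"}}},
    {"metadata": {"name": "gpu-node"},
     "spec": {"providerID": "aws:///us-east-1a/i-0fedcba9876543210"},
     "status": {"allocatable": {"cpu": "8", "nvidia.com/gpu": "2"}}},
]
API_GROUPS = ["apps", "batch", "metrics.k8s.io", "karpenter.sh"]

print(discover(NODES, API_GROUPS))
```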

Pre-computed enrichment

Before each snapshot is sent, the agent enriches raw metrics automatically:

Workload mapping: Every pod is traced back to its parent deployment, statefulset, or daemonset. No orphaned metrics.
Resource totals: Cluster-wide CPU, memory, GPU, and storage usage are pre-calculated and ready for the dashboard.

Before and after

	Old agent	kubeadapt-agent
External dependencies	4 (Prometheus, OpenCost, node-exporter, kube-state-metrics)	1 (metrics-server)
GPU support	Manual config	Auto-discovered
Data collection	Periodic full-list API calls	Informer-based (real-time)
Cloud/VPA/Karpenter	Manual or N/A	Auto-detected
Health reporting	Basic	Full diagnostics per snapshot

Migration

Breaking change

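This is a 2.0 release shipped on the same Helm chart, and it fully replaces the old agent binary. Run helm upgrade to migrate. metrics-server must be available in the cluster, since it is now the only prerequisite. After the upgrade, the old agent's Prometheus/OpenCost dependencies can be removed, provided nothing else in the cluster uses them.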

Follow the quick-start guide for installation steps.