diff --git a/content/kubermatic/main/tutorials-howtos/hpa-with-custom-gpu-metrics/_index.en.md b/content/kubermatic/main/tutorials-howtos/hpa-with-custom-gpu-metrics/_index.en.md
new file mode 100644
index 000000000..f1af78bf1
--- /dev/null
+++ b/content/kubermatic/main/tutorials-howtos/hpa-with-custom-gpu-metrics/_index.en.md
@@ -0,0 +1,303 @@
++++
+title = "HPA with Custom GPU Metrics"
+date = 2025-10-29T00:00:00+00:00
+weight = 20
++++
+
+## Overview
+
+The Kubernetes Horizontal Pod Autoscaler (HPA) is a fundamental component of Kubernetes that automatically adjusts the number
+of pod replicas in a Deployment, ReplicaSet, or StatefulSet based on observed resource utilization or other custom metrics.
+
+-----
+
+## HPA Installation Guide
+
+The Horizontal Pod Autoscaler is a built-in feature of Kubernetes, so there is no separate "installation" required for
+the controller itself. However, it relies on the **Metrics Server** to function correctly.
+
+### Step 1: Install the Metrics Server (Prerequisite)
+
+The Metrics Server is a crucial component that collects resource usage data (CPU, memory) from all nodes and pods, which
+the HPA then uses to make scaling decisions.
+
+**Note:** You can install the Metrics Server and the whole MLA stack in KKP by enabling the User Cluster Monitoring checkbox
+in the cluster settings. More information can be found [here](https://docs.kubermatic.com/kubermatic/v2.29/tutorials-howtos/monitoring-logging-alerting/user-cluster/user-guide/).
+
+Once it is running, you can test it by checking that node and pod metrics can be retrieved:
+
+```bash
+kubectl top nodes
+kubectl top pods
+```
+
+-----
+
+### Step 2: Configure Resource Requests
+
+The HPA scales based on a percentage of the defined **resource requests**. If your Deployment does not have CPU requests defined, the HPA will not be able to scale based on CPU utilization.
+
+Ensure your workload's YAML file (Deployment, ReplicaSet, etc.) includes a `resources: requests` block:
+
+```yaml
+# Snippet from your Deployment YAML
+spec:
+  template:
+    spec:
+      containers:
+      - name: my-container
+        image: registry.k8s.io/hpa-example # A simple example image
+        resources:
+          requests:
+            cpu: "200m" # 200 millicores (0.2 CPU core)
+          limits:
+            cpu: "500m" # Optional, but recommended
+```
+
+-----
+
+### Step 3: Deploy the Horizontal Pod Autoscaler (HPA)
+
+You can deploy the HPA using either a simple command or a declarative YAML file.
+
+#### Option A: Using the `kubectl autoscale` Command (Quick Method)
+
+This is the fastest way to create an HPA resource:
+
+```bash
+kubectl autoscale deployment [DEPLOYMENT_NAME] \
+  --cpu-percent=50 \
+  --min=2 \
+  --max=10
+```
+
+* `[DEPLOYMENT_NAME]`: Replace this with the actual name of your Deployment.
+* `--cpu-percent=50`: The HPA will try to maintain an average CPU utilization of **50%** across all pods.
+* `--min=2`: The minimum number of replicas.
+* `--max=10`: The maximum number of replicas.
+
+-----
+
+#### Option B: Using a Declarative YAML Manifest (Recommended Method)
+
+For complex configurations (like scaling on memory or custom metrics), a YAML manifest is better. We recommend using the **`autoscaling/v2`** API version for the latest features.
+
+**`hpa-config.yaml`**
+
+```yaml
+apiVersion: autoscaling/v2
+kind: HorizontalPodAutoscaler
+metadata:
+  name: my-app-hpa
+spec:
+  scaleTargetRef:
+    # Target the resource that needs to be scaled
+    apiVersion: apps/v1
+    kind: Deployment
+    name: hpa-demo-deployment # <-- REPLACE with your Deployment name
+
+  minReplicas: 2
+  maxReplicas: 10
+
+  metrics:
+    # Metric 1: Scale based on CPU utilization
+    - type: Resource
+      resource:
+        name: cpu
+        target:
+          type: Utilization
+          averageUtilization: 50 # Target average 50% CPU utilization
+
+    # Metric 2: Scale based on Memory utilization (optional)
+    - type: Resource
+      resource:
+        name: memory
+        target:
+          type: AverageValue
+          averageValue: 300Mi # Target average of 300 MiB of memory usage
+```
+
+**Apply the HPA:**
+
+```bash
+kubectl apply -f hpa-config.yaml
+```
+
+-----
+
+### Step 4: Verify the HPA Status
+
+Check that the HPA has been created and is monitoring your application:
+
+```bash
+kubectl get hpa
+
+# Example Output:
+# NAME         REFERENCE                         TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
+# my-app-hpa   Deployment/hpa-demo-deployment    0%/50%    2         10        2          2m
+```
+
+The **`TARGETS`** column shows the current utilization versus the target. If it shows **`<unknown>`** or the HPA reports a missing resource metric, double-check that your **Metrics Server** is healthy and your Deployment has **resource requests** defined.
+
+For details on scaling decisions, check the events:
+
+```bash
+kubectl describe hpa my-app-hpa
+```
+
+This command will show the **Conditions** and **Events** sections, which explain when the HPA scaled up or down and why.
+
+## Setting Up HPA with DCGM Metrics
+
+Autoscaling GPU-accelerated workloads in Kubernetes involves dynamically adjusting the number of Pods based on real-time
+utilization of the GPU resources. This process is more complex than scaling based on standard CPU or memory, as it
+requires setting up a dedicated custom metrics pipeline to feed GPU-specific telemetry to the Horizontal Pod Autoscaler (HPA).
+
+---
+
+### Scaling AI/ML Workloads with GPU Metrics
+
+To enable autoscaling for AI/ML workloads based on GPU performance, you must establish a reliable source for those specialized metrics.
+In this document, we will use a custom GPU metrics pipeline that leverages the NVIDIA GPU Device Plugin and DCGM (Data Center GPU Manager) to collect GPU-specific performance metrics.
+
+| Component               | Role in the Pipeline                                                                                                                                                          |
+|:------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| **NVIDIA GPU Operator** | An umbrella package that automates the deployment of all necessary NVIDIA components for Kubernetes, including the NVIDIA DCGM Exporter (Data Center GPU Manager Exporter). |
+| **Prometheus Server**   | Collects and stores metrics from applications and system components running in the user cluster, including the GPU telemetry exposed by the DCGM Exporter.                  |
+| **Prometheus Adapter**  | Allows the Horizontal Pod Autoscaler (HPA) to scale workloads using custom metrics collected by Prometheus, by exposing them through the Kubernetes custom metrics API.      |
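+
+Once these components are in place (the following sections walk through installing each of them), you can sanity-check the
+pipeline end to end by querying the Kubernetes custom metrics API served by the Prometheus Adapter. This is only a quick
+verification sketch; the metric names you see depend entirely on your Prometheus Adapter rules, and `jq` is assumed to be available:
+
+```bash
+# List all custom metrics currently exposed by the Prometheus Adapter
+kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq -r '.resources[].name'
+```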
+
+---
+
+### Install NVIDIA GPU Operator
+
+KKP offers the possibility to install the NVIDIA GPU Operator in the user cluster either by using our applications catalog (for enterprise customers)
+or by installing it manually in the user cluster. To install the operator via our applications catalog, follow the instructions
+[here](https://docs.kubermatic.com/kubermatic/v2.29/architecture/concept/kkp-concepts/applications/default-applications-catalog/nvidia-gpu-operator/).
+
+To install the operator manually, follow the instructions [here](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/getting-started.html).
+
+### Install Prometheus
+
+We need Prometheus for the Prometheus Adapter because the adapter relies on Prometheus as its source of metrics data.
+The adapter itself does not collect metrics; its sole purpose is to translate and expose the metrics that Prometheus has already collected.
+
+The adapter should be installed alongside the Prometheus server that it will be configured to query. One option is to install
+the adapter in the seed cluster, where the user cluster Prometheus server is running.
+
+Another approach is to run a Prometheus server in the user cluster directly, either via a Kubermatic custom application definition or
+by installing it manually on the cluster via Helm:
+
+```console
+# Add the Prometheus Community Helm repository
+helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
+
+# Update your local Helm chart repository cache
+helm repo update
+
+helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
+  --create-namespace \
+  --namespace monitoring \
+  --set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false \
+  --set alertmanager.enabled=false # Optional: Disable Alertmanager if you don't need alerts immediately
+```
+
+### Install Prometheus Adapter
+
+The Prometheus Adapter is a crucial component in Kubernetes that allows the Horizontal Pod Autoscaler (HPA) to scale
+workloads using custom metrics collected by Prometheus.
+
+You can install the Prometheus Adapter in the user cluster via Helm by executing these commands:
+
+For Helm 2:
+
+```console
+$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
+$ helm repo update
+$ helm install --name my-release prometheus-community/prometheus-adapter
+```
+
+For Helm 3 (the release name is mandatory):
+
+```console
+$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
+$ helm repo update
+$ helm install my-release prometheus-community/prometheus-adapter
+```
+
+For more information on how to install the Prometheus Adapter, please refer to the [official documentation](https://github.com/kubernetes-sigs/prometheus-adapter).
+
+### Setting Up the HPA with DCGM Metrics
+
+The following example shows a Deployment and an HPA configuration that scales it based on GPU utilization. Creating a Kubernetes Deployment that
+uses an NVIDIA GPU requires two main things: ensuring your cluster has the NVIDIA Device Plugin running (a prerequisite, covered by the GPU Operator above)
+and specifying the GPU resource in the Pod's manifest.
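+
+Before applying the Deployment manifest below, it can be useful to confirm that the device plugin is actually advertising GPUs
+on your worker nodes. A minimal check (replace the placeholder with one of your GPU node names):
+
+```bash
+# The Allocatable section of a GPU node should list nvidia.com/gpu with a non-zero count
+kubectl describe node [NODE_NAME] | grep -i "nvidia.com/gpu"
+```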
+
+```yaml
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: gpu-worker-deployment
+  labels:
+    app: gpu-worker
+spec:
+  replicas: 1 # Start with 1 replica, HPA will scale this up
+  selector:
+    matchLabels:
+      app: gpu-worker
+  template:
+    metadata:
+      labels:
+        app: gpu-worker
+    spec:
+      # Node Selector (optional but recommended)
+      # This ensures the Pod is only scheduled on nodes labeled as having GPUs.
+      nodeSelector:
+        accelerator: nvidia
+
+      containers:
+      - name: cuda-container
+        image: nvcr.io/nvidia/cuda:12.4.1-runtime-ubuntu22.04 # A robust NVIDIA CUDA image
+        command: ["/bin/bash", "-c"]
+        args: ["nvidia-smi; sleep infinity"] # Example command to keep the container running
+
+        # --- GPU Resource Configuration (CRITICAL) ---
+        resources:
+          limits:
+            # This is the line that requests a GPU resource from the cluster.
+            # Replace "1" with the number of GPUs required. Only whole numbers are
+            # valid; GPU sharing is configured separately via MIG profiles or
+            # device-plugin time-slicing.
+            nvidia.com/gpu: "1"
+
+          # Requests should be identical to limits for non-sharable resources like GPUs
+          requests:
+            nvidia.com/gpu: "1"
+```
+
+Next we will configure the HPA to scale based on the GPU utilization:
+
+```yaml
+apiVersion: autoscaling/v2
+kind: HorizontalPodAutoscaler
+metadata:
+  name: gpu-util-autoscaler
+  namespace: default # Ensure this matches your deployment's namespace
+spec:
+  # 1. Target the Deployment created previously
+  scaleTargetRef:
+    apiVersion: apps/v1
+    kind: Deployment
+    name: gpu-worker-deployment
+
+  # 2. Define scaling limits
+  minReplicas: 1
+  maxReplicas: 5 # Define the maximum number of GPU workers
+
+  # 3. Define the custom metric
+  metrics:
+    - type: Pods # Metric applies to the pods managed by the Deployment
+      pods:
+        metric:
+          # This name MUST match the metric name (alias) defined in your
+          # Prometheus Adapter configuration (ConfigMap)
+          name: dcgm_gpu_utilization_percent
+        target:
+          type: AverageValue
+          # Scale up if the average GPU utilization across all pods exceeds 60%
+          averageValue: 60
+```
diff --git a/content/kubermatic/v2.29/tutorials-howtos/hpa-with-custom-gpu-metrics/_index.en.md b/content/kubermatic/v2.29/tutorials-howtos/hpa-with-custom-gpu-metrics/_index.en.md
new file mode 100644
index 000000000..c62c11b3c
--- /dev/null
+++ b/content/kubermatic/v2.29/tutorials-howtos/hpa-with-custom-gpu-metrics/_index.en.md
@@ -0,0 +1,303 @@
++++
+title = "HPA with Custom GPU Metrics"
+date = 2025-10-29T00:00:00+00:00
+weight = 20
++++
+
+## Overview
+
+The Kubernetes Horizontal Pod Autoscaler (HPA) is a fundamental component of Kubernetes that automatically adjusts the number
+of pod replicas in a Deployment, ReplicaSet, or StatefulSet based on observed resource utilization or other custom metrics.
+
+-----
+
+## HPA Installation Guide
+
+The Horizontal Pod Autoscaler is a built-in feature of Kubernetes, so there is no separate "installation" required for
+the controller itself. However, it relies on the **Metrics Server** to function correctly.
+
+### Step 1: Install the Metrics Server (Prerequisite)
+
+The Metrics Server is a crucial component that collects resource usage data (CPU, memory) from all nodes and pods, which
+the HPA then uses to make scaling decisions.
+
+**Note:** You can install the Metrics Server and the whole MLA stack in KKP by enabling the User Cluster Monitoring checkbox
+in the cluster settings. More information can be found [here](https://docs.kubermatic.com/kubermatic/v2.29/tutorials-howtos/monitoring-logging-alerting/user-cluster/user-guide/).
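+
+If you are not using the KKP MLA stack, the Metrics Server can also be installed directly from its upstream manifest. This is a
+minimal sketch of the standard installation; for production use, pin a specific release instead of `latest`:
+
+```bash
+kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
+```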
+
+Once it is running, you can test it by checking that node and pod metrics can be retrieved:
+
+```bash
+kubectl top nodes
+kubectl top pods
+```
+
+-----
+
+### Step 2: Configure Resource Requests
+
+The HPA scales based on a percentage of the defined **resource requests**. If your Deployment does not have CPU requests defined, the HPA will not be able to scale based on CPU utilization.
+
+Ensure your workload's YAML file (Deployment, ReplicaSet, etc.) includes a `resources: requests` block:
+
+```yaml
+# Snippet from your Deployment YAML
+spec:
+  template:
+    spec:
+      containers:
+      - name: my-container
+        image: registry.k8s.io/hpa-example # A simple example image
+        resources:
+          requests:
+            cpu: "200m" # 200 millicores (0.2 CPU core)
+          limits:
+            cpu: "500m" # Optional, but recommended
+```
+
+-----
+
+### Step 3: Deploy the Horizontal Pod Autoscaler (HPA)
+
+You can deploy the HPA using either a simple command or a declarative YAML file.
+
+#### Option A: Using the `kubectl autoscale` Command (Quick Method)
+
+This is the fastest way to create an HPA resource:
+
+```bash
+kubectl autoscale deployment [DEPLOYMENT_NAME] \
+  --cpu-percent=50 \
+  --min=2 \
+  --max=10
+```
+
+* `[DEPLOYMENT_NAME]`: Replace this with the actual name of your Deployment.
+* `--cpu-percent=50`: The HPA will try to maintain an average CPU utilization of **50%** across all pods.
+* `--min=2`: The minimum number of replicas.
+* `--max=10`: The maximum number of replicas.
+
+-----
+
+#### Option B: Using a Declarative YAML Manifest (Recommended Method)
+
+For complex configurations (like scaling on memory or custom metrics), a YAML manifest is better. We recommend using the **`autoscaling/v2`** API version for the latest features.
+
+**`hpa-config.yaml`**
+
+```yaml
+apiVersion: autoscaling/v2
+kind: HorizontalPodAutoscaler
+metadata:
+  name: my-app-hpa
+spec:
+  scaleTargetRef:
+    # Target the resource that needs to be scaled
+    apiVersion: apps/v1
+    kind: Deployment
+    name: hpa-demo-deployment # <-- REPLACE with your Deployment name
+
+  minReplicas: 2
+  maxReplicas: 10
+
+  metrics:
+    # Metric 1: Scale based on CPU utilization
+    - type: Resource
+      resource:
+        name: cpu
+        target:
+          type: Utilization
+          averageUtilization: 50 # Target average 50% CPU utilization
+
+    # Metric 2: Scale based on Memory utilization (optional)
+    - type: Resource
+      resource:
+        name: memory
+        target:
+          type: AverageValue
+          averageValue: 300Mi # Target average of 300 MiB of memory usage
+```
+
+**Apply the HPA:**
+
+```bash
+kubectl apply -f hpa-config.yaml
+```
+
+-----
+
+### Step 4: Verify the HPA Status
+
+Check that the HPA has been created and is monitoring your application:
+
+```bash
+kubectl get hpa
+
+# Example Output:
+# NAME         REFERENCE                         TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
+# my-app-hpa   Deployment/hpa-demo-deployment    0%/50%    2         10        2          2m
+```
+
+The **`TARGETS`** column shows the current utilization versus the target. If it shows **`<unknown>`** or the HPA reports a missing resource metric, double-check that your **Metrics Server** is healthy and your Deployment has **resource requests** defined.
+
+For details on scaling decisions, check the events:
+
+```bash
+kubectl describe hpa my-app-hpa
+```
+
+This command will show the **Conditions** and **Events** sections, which explain when the HPA scaled up or down and why.
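+
+To see the autoscaler react, you can generate some load against the example workload. The commands below are only a sketch:
+they assume the Deployment from Step 2 is exposed through a Service named `hpa-demo-service` on port 80, which is not created
+in this guide, so adjust the Service and HPA names to your own setup:
+
+```bash
+# Hypothetical load generator; replace 'hpa-demo-service' with your own Service name
+kubectl run load-generator --rm -it --image=busybox:1.36 --restart=Never -- \
+  /bin/sh -c "while true; do wget -q -O- http://hpa-demo-service; done"
+
+# In a second terminal, watch the HPA scale the Deployment up and down
+kubectl get hpa my-app-hpa --watch
+```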
+
+## Setting Up HPA with DCGM Metrics
+
+Autoscaling GPU-accelerated workloads in Kubernetes involves dynamically adjusting the number of Pods based on real-time
+utilization of the GPU resources. This process is more complex than scaling based on standard CPU or memory, as it
+requires setting up a dedicated custom metrics pipeline to feed GPU-specific telemetry to the Horizontal Pod Autoscaler (HPA).
+
+---
+
+### Scaling AI/ML Workloads with GPU Metrics
+
+To enable autoscaling for AI/ML workloads based on GPU performance, you must establish a reliable source for those specialized metrics.
+In this document, we will use a custom GPU metrics pipeline that leverages the NVIDIA GPU Device Plugin and DCGM (Data Center GPU Manager) to collect GPU-specific performance metrics.
+
+| Component               | Role in the Pipeline                                                                                                                                                          |
+|:------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| **NVIDIA GPU Operator** | An umbrella package that automates the deployment of all necessary NVIDIA components for Kubernetes, including the NVIDIA DCGM Exporter (Data Center GPU Manager Exporter). |
+| **Prometheus Server**   | Collects and stores metrics from applications and system components running in the user cluster, including the GPU telemetry exposed by the DCGM Exporter.                  |
+| **Prometheus Adapter**  | Allows the Horizontal Pod Autoscaler (HPA) to scale workloads using custom metrics collected by Prometheus, by exposing them through the Kubernetes custom metrics API.      |
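+
+Once these components are in place (the following sections walk through installing each of them), you can sanity-check the
+pipeline end to end by querying the Kubernetes custom metrics API served by the Prometheus Adapter. This is only a quick
+verification sketch; the metric names you see depend entirely on your Prometheus Adapter rules, and `jq` is assumed to be available:
+
+```bash
+# List all custom metrics currently exposed by the Prometheus Adapter
+kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq -r '.resources[].name'
+```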
+
+---
+
+### Install NVIDIA GPU Operator
+
+KKP offers the possibility to install the NVIDIA GPU Operator in the user cluster either by using our applications catalog (for enterprise customers)
+or by installing it manually in the user cluster. To install the operator via our applications catalog, follow the instructions
+[here](https://docs.kubermatic.com/kubermatic/v2.29/architecture/concept/kkp-concepts/applications/default-applications-catalog/nvidia-gpu-operator/).
+
+To install the operator manually, follow the instructions [here](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/getting-started.html).
+
+### Install Prometheus
+
+We need Prometheus for the Prometheus Adapter because the adapter relies on Prometheus as its source of metrics data.
+The adapter itself does not collect metrics; its sole purpose is to translate and expose the metrics that Prometheus has already collected.
+
+The adapter should be installed alongside the Prometheus server that it will be configured to query. One option is to install
+the adapter in the seed cluster, where the user cluster Prometheus server is running.
+
+Another approach is to run a Prometheus server in the user cluster directly, either via a Kubermatic custom application definition or
+by installing it manually on the cluster via Helm:
+
+```console
+# Add the Prometheus Community Helm repository
+helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
+
+# Update your local Helm chart repository cache
+helm repo update
+
+helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
+  --create-namespace \
+  --namespace monitoring \
+  --set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false \
+  --set alertmanager.enabled=false # Optional: Disable Alertmanager if you don't need alerts immediately
+```
+
+### Install Prometheus Adapter
+
+The Prometheus Adapter is a crucial component in Kubernetes that allows the Horizontal Pod Autoscaler (HPA) to scale
+workloads using custom metrics collected by Prometheus.
+
+You can install the Prometheus Adapter in the user cluster via Helm by executing these commands:
+
+For Helm 2:
+
+```console
+$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
+$ helm repo update
+$ helm install --name my-release prometheus-community/prometheus-adapter
+```
+
+For Helm 3 (the release name is mandatory):
+
+```console
+$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
+$ helm repo update
+$ helm install my-release prometheus-community/prometheus-adapter
+```
+
+For more information on how to install the Prometheus Adapter, please refer to the [official documentation](https://github.com/kubernetes-sigs/prometheus-adapter).
+
+### Setting Up the HPA with DCGM Metrics
+
+The following example shows a Deployment and an HPA configuration that scales it based on GPU utilization. Creating a Kubernetes Deployment that
+uses an NVIDIA GPU requires two main things: ensuring your cluster has the NVIDIA Device Plugin running (a prerequisite, covered by the GPU Operator above)
+and specifying the GPU resource in the Pod's manifest.
+
+```yaml
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: gpu-worker-deployment
+  labels:
+    app: gpu-worker
+spec:
+  replicas: 1 # Start with 1 replica, HPA will scale this up
+  selector:
+    matchLabels:
+      app: gpu-worker
+  template:
+    metadata:
+      labels:
+        app: gpu-worker
+    spec:
+      # Node Selector (optional but recommended)
+      # This ensures the Pod is only scheduled on nodes labeled as having GPUs.
+      nodeSelector:
+        accelerator: nvidia
+
+      containers:
+      - name: cuda-container
+        image: nvcr.io/nvidia/cuda:12.4.1-runtime-ubuntu22.04 # A robust NVIDIA CUDA image
+        command: ["/bin/bash", "-c"]
+        args: ["nvidia-smi; sleep infinity"] # Example command to keep the container running
+
+        # --- GPU Resource Configuration (CRITICAL) ---
+        resources:
+          limits:
+            # This is the line that requests a GPU resource from the cluster.
+            # Replace "1" with the number of GPUs required. Only whole numbers are
+            # valid; GPU sharing is configured separately via MIG profiles or
+            # device-plugin time-slicing.
+            nvidia.com/gpu: "1"
+
+          # Requests should be identical to limits for non-sharable resources like GPUs
+          requests:
+            nvidia.com/gpu: "1"
+```
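+
+Once the Deployment is running, a quick way to confirm that the container actually received a GPU is to run `nvidia-smi`
+inside it (the deployment name matches the manifest above):
+
+```bash
+# Executes nvidia-smi in one pod of the Deployment; it should list the allocated GPU
+kubectl exec deploy/gpu-worker-deployment -- nvidia-smi
+```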
+
+Next we will configure the HPA to scale based on the GPU utilization:
+
+```yaml
+apiVersion: autoscaling/v2
+kind: HorizontalPodAutoscaler
+metadata:
+  name: gpu-util-autoscaler
+  namespace: default # Ensure this matches your deployment's namespace
+spec:
+  # 1. Target the Deployment created previously
+  scaleTargetRef:
+    apiVersion: apps/v1
+    kind: Deployment
+    name: gpu-worker-deployment
+
+  # 2. Define scaling limits
+  minReplicas: 1
+  maxReplicas: 5 # Define the maximum number of GPU workers
+
+  # 3. Define the custom metric
+  metrics:
+    - type: Pods # Metric applies to the pods managed by the Deployment
+      pods:
+        metric:
+          # This name MUST match the metric name (alias) defined in your
+          # Prometheus Adapter configuration (ConfigMap)
+          name: dcgm_gpu_utilization_percent
+        target:
+          type: AverageValue
+          # Scale up if the average GPU utilization across all pods exceeds 60%
+          averageValue: 60
+```
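+
+The metric name `dcgm_gpu_utilization_percent` used above is not a built-in name: it only exists if a Prometheus Adapter rule
+exposes a DCGM metric under that alias through the custom metrics API. The following is a sketch of such a rule for the
+prometheus-adapter Helm chart (`rules.custom` in its values), assuming the DCGM Exporter exposes `DCGM_FI_DEV_GPU_UTIL` and that
+the scraped series carry `namespace` and `pod` labels pointing at the workload pods; the exact metric and label names depend on
+your DCGM Exporter version and scrape configuration:
+
+```yaml
+# Excerpt from prometheus-adapter Helm values; a sketch only, adjust metric and label names to your setup
+rules:
+  custom:
+    - seriesQuery: 'DCGM_FI_DEV_GPU_UTIL{namespace!="",pod!=""}'
+      resources:
+        overrides:
+          namespace: {resource: "namespace"}
+          pod: {resource: "pod"}
+      name:
+        matches: "DCGM_FI_DEV_GPU_UTIL"
+        as: "dcgm_gpu_utilization_percent"
+      metricsQuery: 'avg(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
+```
+
+After the adapter picks up the new rule, you can verify that the metric is served for the GPU worker pods:
+
+```bash
+kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/dcgm_gpu_utilization_percent"
+```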