I have deployed opentelemetry-collector-contrib 0.139.0 to a Kubernetes cluster as a DaemonSet. The cluster runs on AWS EKS with a node group of two EC2 nodes, currently on Kubernetes 1.34. Here is the DaemonSet YAML of the collector (extracted with kubectl get daemonset -o yaml):
apiVersion: apps/v1
kind: DaemonSet
metadata:
  annotations:
    deprecated.daemonset.template.generation: "1"
  creationTimestamp: "2025-11-07T08:19:48Z"
  generation: 1
  labels:
    app: otel-collector
  name: otel-collector
  namespace: observability-namespace
  resourceVersion: "22385414"
  uid: 28c2f297-7404-439c-90d3-eec673c92e0e
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
      - command:
        - /otelcol-contrib
        - --config
        - /conf/otel-collector-config.yaml
        env:
        - name: K8S_NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        image: otel/opentelemetry-collector-contrib:0.139.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 1
          httpGet:
            path: /
            port: 13133
            scheme: HTTP
          periodSeconds: 30
          successThreshold: 1
          timeoutSeconds: 5
        name: otel-collector
        ports:
        - containerPort: 4318
          protocol: TCP
        - containerPort: 13133
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /conf
          name: otel-collector-config-vol
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: otel-account
      serviceAccountName: otel-account
      terminationGracePeriodSeconds: 30
      volumes:
      - configMap:
          defaultMode: 420
          items:
          - key: otel-collector-config
            path: otel-collector-config.yaml
          name: otel-config
        name: otel-collector-config-vol
  updateStrategy:
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
    type: RollingUpdate
status:
  currentNumberScheduled: 2
  desiredNumberScheduled: 2
  numberAvailable: 2
  numberMisscheduled: 0
  numberReady: 2
  observedGeneration: 1
  updatedNumberScheduled: 2
The collector uses the following configuration, stored in the otel-config ConfigMap:
exporters:
  prometheusremotewrite:
    auth:
      authenticator: sigv4auth/metrics
    endpoint: https://aps-workspaces.eu-central-1.amazonaws.com/workspaces/xxxxxxxxxxxxxxxx/api/v1/remote_write
    resource_to_telemetry_conversion:
      enabled: true
extensions:
  health_check:
    endpoint: 0.0.0.0:13133
  sigv4auth/metrics:
    region: eu-central-1
    service: aps
receivers:
  kubeletstats:
    auth_type: serviceAccount
    collection_interval: 20s
    endpoint: https://${env:K8S_NODE_NAME}:10250
service:
  extensions:
  - sigv4auth/metrics
  - health_check
  pipelines:
    metrics:
      exporters:
      - prometheusremotewrite
      receivers:
      - kubeletstats
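For completeness, I have not set metric_groups on the kubeletstats receiver, so it emits its default metric groups, which as far as I understand include both pod-level and container-level CPU metrics. If I wanted to restrict the groups explicitly, my understanding of the receiver's documented options is that it would look roughly like this (a sketch, not my actual config):
receivers:
  kubeletstats:
    auth_type: serviceAccount
    collection_interval: 20s
    endpoint: https://${env:K8S_NODE_NAME}:10250
    # metric_groups is not set in my real config; the values below are the documented defaults
    metric_groups:
      - node
      - pod
      - container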
I have a Grafana deployment on AWS, with Prometheus provided by the AWS managed service (the Amazon Managed Service for Prometheus workspace referenced in the exporter endpoint above). When I query my cluster's metrics in Grafana, I see two CPU metrics: container.cpu.usage and k8s.pod.cpu.usage. On this same cluster I also deploy a Python bot, packaged with Docker.
I understand the conceptual difference between container.cpu.usage and k8s.pod.cpu.usage: a pod can contain multiple containers, and the same container name can appear in multiple pods (for example when new versions are deployed). For this particular Python bot deployment, however, the pod has only one container.
In Grafana I ran an average query over the last month for both metrics, grouped by the k8s_container_name and k8s_pod_name labels (k8s.pod.cpu.usage only by k8s_pod_name, since it doesn't carry a k8s_container_name label). Over that month, my Python bot shows only a single pod name in both metrics.
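For clarity, the two queries were roughly the following, written here as Grafana panel targets (a sketch of what I ran, not the exact dashboard JSON). The metric and label names are shown as they arrive in Prometheus after remote write, i.e. with dots replaced by underscores, and the dashboard time range was set to the last month:
# approximate panel targets; names assume the default remote-write renaming
- expr: avg by (k8s_container_name, k8s_pod_name) (container_cpu_usage)
  legendFormat: "{{k8s_pod_name}} / {{k8s_container_name}}"
- expr: avg by (k8s_pod_name) (k8s_pod_cpu_usage)
  legendFormat: "{{k8s_pod_name}}"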
Yet container.cpu.usage reported an average value of 0.8 at one point in that month, while k8s.pod.cpu.usage never went above 0.02 for any pod, including the Python bot's pod, over the whole month.
How can that be? How can container.cpu.usage be so much larger than k8s.pod.cpu.usage? Am I misunderstanding these two metrics?
Here is the Python bot's Kubernetes Deployment (obtained with kubectl get deploy -o yaml):
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "28"
  creationTimestamp: "2025-08-20T07:58:23Z"
  generation: 28
  labels:
    app: xxxxx
  name: xxxxx
  namespace: xxxx-namespace
  resourceVersion: "22191731"
  uid: d972b210-d072-4a0d-94a0-4bd26cf05566
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: xxxxxxxx
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: xxxxxx
    spec:
      containers:
      - image: xxx-imageUri
        imagePullPolicy: IfNotPresent
        name: xxxxxxx
        resources: {}
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          runAsGroup: 1000
          runAsNonRoot: true
          runAsUser: 1000
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: xxxx-service-account
      serviceAccountName: xxxx-service-account-name
      terminationGracePeriodSeconds: 30
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2025-11-06T09:04:12Z"
    lastUpdateTime: "2025-11-06T09:04:12Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2025-08-20T07:58:23Z"
    lastUpdateTime: "2025-11-06T15:44:37Z"
    message: ReplicaSet "xxxxxxxxxxx" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 28
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1
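Note that the bot's container has no CPU requests or limits (resources: {} above). In case it matters for interpreting the CPU metrics, if I were to add them they would look roughly like this (placeholder values, not something I currently set):
resources:
  requests:
    cpu: 100m       # placeholder value
    memory: 128Mi   # placeholder value
  limits:
    cpu: 500m       # placeholder value
    memory: 256Mi   # placeholder value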
This is the Dockerfile of the Python bot:
# Dockerfile
# --- Builder stage ---
FROM python:3.14.0-alpine3.22 AS builder
WORKDIR /install
COPY requirements.txt .
RUN pip install --prefix=/install/deps --no-cache-dir -r requirements.txt
# --- Runtime stage ---
FROM python:3.14.0-alpine3.22
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
WORKDIR /app
# Copy only the installed dependencies from builder
COPY --from=builder /install/deps /usr/local
# Copy each Python file
# our Python bot is small, composed of only 3 files
# the actual names are different and not important
COPY file1.py .
COPY file2.py .
COPY file3.py .
CMD ["python", "file1.py"]