I have deployed opentelemetry-collector-contrib 0.139.0 to a Kubernetes cluster as a DaemonSet. The cluster runs on AWS EKS with a node group of two EC2 nodes, currently on Kubernetes 1.34. Here is the DaemonSet YAML of the collector (extracted with kubectl get daemonset -o yaml):
apiVersion: apps/v1
kind: DaemonSet
metadata:
  annotations:
    deprecated.daemonset.template.generation: "1"
  creationTimestamp: "2025-11-07T08:19:48Z"
  generation: 1
  labels:
    app: otel-collector
  name: otel-collector
  namespace: observability-namespace
  resourceVersion: "22385414"
  uid: 28c2f297-7404-439c-90d3-eec673c92e0e
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
      - command:
        - /otelcol-contrib
        - --config
        - /conf/otel-collector-config.yaml
        env:
        - name: K8S_NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        image: otel/opentelemetry-collector-contrib:0.139.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 1
          httpGet:
            path: /
            port: 13133
            scheme: HTTP
          periodSeconds: 30
          successThreshold: 1
          timeoutSeconds: 5
        name: otel-collector
        ports:
        - containerPort: 4318
          protocol: TCP
        - containerPort: 13133
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /conf
          name: otel-collector-config-vol
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: otel-account
      serviceAccountName: otel-account
      terminationGracePeriodSeconds: 30
      volumes:
      - configMap:
          defaultMode: 420
          items:
          - key: otel-collector-config
            path: otel-collector-config.yaml
          name: otel-config
        name: otel-collector-config-vol
  updateStrategy:
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
    type: RollingUpdate
status:
  currentNumberScheduled: 2
  desiredNumberScheduled: 2
  numberAvailable: 2
  numberMisscheduled: 0
  numberReady: 2
  observedGeneration: 1
  updatedNumberScheduled: 2
The collector uses the following configuration, stored in the otel-config ConfigMap:
exporters:
  prometheusremotewrite:
    auth:
      authenticator: sigv4auth/metrics
    endpoint: https://aps-workspaces.eu-central-1.amazonaws.com/workspaces/xxxxxxxxxxxxxxxx/api/v1/remote_write
    resource_to_telemetry_conversion:
      enabled: true
extensions:
  health_check:
    endpoint: 0.0.0.0:13133
  sigv4auth/metrics:
    region: eu-central-1
    service: aps
receivers:
  kubeletstats:
    auth_type: serviceAccount
    collection_interval: 20s
    endpoint: https://${env:K8S_NODE_NAME}:10250
service:
  extensions:
  - sigv4auth/metrics
  - health_check
  pipelines:
    metrics:
      exporters:
      - prometheusremotewrite
      receivers:
      - kubeletstats
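For completeness, I have not set metric_groups on the kubeletstats receiver, so it emits its default metric groups, which as far as I understand include both pod-level and container-level CPU metrics. If I wanted to restrict the groups explicitly, my understanding of the receiver's documented options is that it would look roughly like this (a sketch, not my actual config):
receivers:
  kubeletstats:
    auth_type: serviceAccount
    collection_interval: 20s
    endpoint: https://${env:K8S_NODE_NAME}:10250
    # metric_groups is not set in my real config; the values below are the documented defaults
    metric_groups:
      - node
      - pod
      - container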
I have a Grafana deployment on AWS, with Prometheus provided by the AWS managed service (the Amazon Managed Service for Prometheus workspace referenced in the exporter endpoint above). When I query my cluster's metrics in Grafana, I see two CPU metrics: container.cpu.usage and k8s.pod.cpu.usage. On this same cluster I also deploy a Python bot, packaged with Docker.
I understand the conceptual difference between container.cpu.usage and k8s.pod.cpu.usage: a pod can contain multiple containers, and the same container name can appear in multiple pods (for example when new versions are deployed). For this particular Python bot deployment, however, the pod has only one container.
In Grafana I ran an average query over the last month for both metrics, grouped by the k8s_container_name and k8s_pod_name labels (k8s.pod.cpu.usage only by k8s_pod_name, since it doesn't carry a k8s_container_name label). Over that month, my Python bot shows only a single pod name in both metrics.
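For clarity, the two queries were roughly the following, written here as Grafana panel targets (a sketch of what I ran, not the exact dashboard JSON). The metric and label names are shown as they arrive in Prometheus after remote write, i.e. with dots replaced by underscores, and the dashboard time range was set to the last month:
# approximate panel targets; names assume the default remote-write renaming
- expr: avg by (k8s_container_name, k8s_pod_name) (container_cpu_usage)
  legendFormat: "{{k8s_pod_name}} / {{k8s_container_name}}"
- expr: avg by (k8s_pod_name) (k8s_pod_cpu_usage)
  legendFormat: "{{k8s_pod_name}}"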
Yet container.cpu.usage reported an average value of 0.8 at one point in that month, while k8s.pod.cpu.usage never went above 0.02 for any pod, including the Python bot's pod, over the whole month.
How can that be? How can container.cpu.usage be so much larger than k8s.pod.cpu.usage? Am I misunderstanding these two metrics?
Here is the Python bot's Kubernetes Deployment (obtained with kubectl get deploy -o yaml):
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "28"
  creationTimestamp: "2025-08-20T07:58:23Z"
  generation: 28
  labels:
    app: xxxxx
  name: xxxxx
  namespace: xxxx-namespace
  resourceVersion: "22191731"
  uid: d972b210-d072-4a0d-94a0-4bd26cf05566
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: xxxxxxxx
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: xxxxxx
    spec:
      containers:
      - image: xxx-imageUri
        imagePullPolicy: IfNotPresent
        name: xxxxxxx
        resources: {}
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          runAsGroup: 1000
          runAsNonRoot: true
          runAsUser: 1000
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: xxxx-service-account
      serviceAccountName: xxxx-service-account-name
      terminationGracePeriodSeconds: 30
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2025-11-06T09:04:12Z"
    lastUpdateTime: "2025-11-06T09:04:12Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2025-08-20T07:58:23Z"
    lastUpdateTime: "2025-11-06T15:44:37Z"
    message: ReplicaSet "xxxxxxxxxxx" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 28
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1
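Note that the bot's container has no CPU requests or limits (resources: {} above). In case it matters for interpreting the CPU metrics, if I were to add them they would look roughly like this (placeholder values, not something I currently set):
resources:
  requests:
    cpu: 100m       # placeholder value
    memory: 128Mi   # placeholder value
  limits:
    cpu: 500m       # placeholder value
    memory: 256Mi   # placeholder value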
This is the Dockerfile of the Python bot:
# Dockerfile
# --- Builder stage ---
FROM python:3.14.0-alpine3.22 AS builder
WORKDIR /install
COPY requirements.txt .
RUN pip install --prefix=/install/deps --no-cache-dir -r requirements.txt
# --- Runtime stage ---
FROM python:3.14.0-alpine3.22
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
WORKDIR /app
# Copy only the installed dependencies from builder
COPY --from=builder /install/deps /usr/local
# Copy each Python file
# our Python bot is small, composed of only 3 files
# the actual names are different and not important
COPY file1.py .
COPY file2.py .
COPY file3.py .
CMD ["python", "file1.py"]