Pod Chaos

Pod chaos actions target running pods in your cluster. The eight actions cover process termination (pod-kill, container-kill), resource exhaustion (CPU, memory, and I/O stress), DNS failures, and HTTP faults (aborts and delays).

pod-kill

Deletes one or more pods. If a controller such as a Deployment or ReplicaSet manages them, it creates replacement pods right away. This is the simplest resilience test: can your app survive losing a pod?

apiVersion: chaos.chaosplane.io/v1alpha1
kind: ChaosExperiment
metadata:
  name: kill-frontend-pods
  namespace: production
spec:
  target:
    kind: Pod
    namespace: production
    labelSelector:
      matchLabels:
        app: frontend
  action:
    type: pod-kill
    parameters:
      gracePeriodSeconds: "0"
  duration: 10s
  rollback:
    enabled: false

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| gracePeriodSeconds | string | No | "0" | Grace period before SIGKILL |

Rollback: None. The controller that manages the pod creates a replacement automatically.
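A nonzero grace period exercises the app's shutdown path rather than an abrupt kill. The sketch below is a variant of the manifest above. It assumes that gracePeriodSeconds accepts any non-negative integer encoded as a string (as the "0" default suggests) and that duration and rollback sit directly under spec. The experiment name is made up for illustration.

```yaml
# Sketch: graceful variant of the pod-kill experiment.
apiVersion: chaos.chaosplane.io/v1alpha1
kind: ChaosExperiment
metadata:
  name: kill-frontend-pods-graceful   # hypothetical name
  namespace: production
spec:
  target:
    kind: Pod
    namespace: production
    labelSelector:
      matchLabels:
        app: frontend
  action:
    type: pod-kill
    parameters:
      # Assumption: the pod gets up to 30 seconds to shut down before SIGKILL.
      gracePeriodSeconds: "30"
  duration: 10s
  rollback:
    enabled: false
```

If the value is passed through to the Kubernetes pod deletion, "0" kills the pod immediately, while "30" lets preStop hooks and SIGTERM handlers run first. Comparing the two runs shows whether in-flight requests are drained cleanly.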

container-kill

Kills a specific container within a pod without deleting the pod itself. The kubelet restarts the container according to the pod's restartPolicy. Useful for testing container-level restart behavior.

apiVersion: chaos.chaosplane.io/v1alpha1
kind: ChaosExperiment
metadata:
  name: kill-sidecar
  namespace: production
spec:
  target:
    kind: Pod
    namespace: production
    labelSelector:
      matchLabels:
        app: my-app
  action:
    type: container-kill
    parameters:
      containerName: "envoy"
  duration: 10s

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| containerName | string | Yes | | Name of the container to kill |

Rollback: None. The kubelet restarts the container according to the pod's restartPolicy.

Implementation: Uses the daemon's ExecStressChaos RPC with stressorType: container-kill. The daemon sends SIGKILL to the container process via the container runtime socket.
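What happens after the kill depends on the target pod's restartPolicy, which is a standard Kubernetes field rather than part of the experiment. The sketch below shows a plain Pod spec that the experiment above would match. The pod name and image references are placeholders.

```yaml
# Plain Kubernetes Pod spec (not a ChaosExperiment), shown for illustration.
apiVersion: v1
kind: Pod
metadata:
  name: my-app-example                # hypothetical pod name
  namespace: production
  labels:
    app: my-app                       # matched by the experiment's labelSelector
spec:
  restartPolicy: Always               # one of Always, OnFailure, Never
  containers:
    - name: app
      image: example.com/my-app:1.0   # placeholder image
    - name: envoy                     # the container named in containerName
      image: example.com/envoy:1.0    # placeholder image
```

With Always or OnFailure, the killed container comes back (a SIGKILL exit counts as a failure), with an increasing back-off delay if it is killed repeatedly. With Never, it stays terminated. Pods created by a Deployment always use Always.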

pod-cpu-stress

Runs CPU stress workers inside the pod's cgroup, consuming CPU cycles for the experiment duration. Tests how your app behaves under CPU contention.

apiVersion: chaos.chaosplane.io/v1alpha1
kind: ChaosExperiment
metadata:
  name: cpu-stress-api
  namespace: production
spec:
  target:
    kind: Pod
    namespace: production
    labelSelector:
      matchLabels:
        app: api-server
  action:
    type: pod-cpu-stress
    parameters:
      workers: "2"
      load: "80"
      duration: "60s"
  duration: 60s
  rollback:
    enabled: true

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| workers | string | No | "1" | Number of CPU stress workers |
| load | string | No | "100" | CPU load percentage (0-100) |
| duration | string | No | experiment duration | How long to stress |

Rollback: Sends a CancelChaos RPC to the daemon, which terminates the stress-ng process.
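The example above sets duration twice: once as an action parameter and once for the experiment as a whole. Since the parameter defaults to the experiment duration, it can presumably be left out. The sketch below shows that shorter form, assuming the experiment-level duration sits directly under spec. The name and values are illustrative.

```yaml
# Sketch: parameters.duration omitted, so the stress is expected to run
# for the experiment-level duration (per the parameter table's default).
apiVersion: chaos.chaosplane.io/v1alpha1
kind: ChaosExperiment
metadata:
  name: cpu-stress-api-default-duration   # hypothetical name
  namespace: production
spec:
  target:
    kind: Pod
    namespace: production
    labelSelector:
      matchLabels:
        app: api-server
  action:
    type: pod-cpu-stress
    parameters:
      workers: "4"
      load: "50"
      # no duration here: falls back to the experiment duration below
  duration: 120s
  rollback:
    enabled: true
```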

pod-memory-stress

Allocates memory inside the pod's cgroup, simulating memory pressure. Tests OOM behavior, memory limits, and application degradation under low-memory conditions.

apiVersion: chaos.chaosplane.io/v1alpha1
kind: ChaosExperiment
metadata:
  name: memory-stress-worker
  namespace: production
spec:
  target:
    kind: Pod
    namespace: production
    labelSelector:
      matchLabels:
        app: worker
  action:
    type: pod-memory-stress
    parameters:
      workers: "1"
      size: "256m"
      duration: "60s"
  duration: 60s
  rollback:
    enabled: true

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| workers | string | No | "1" | Number of memory stress workers |
| size | string | No | "256m" | Memory to allocate per worker (e.g. 256m, 1g) |
| duration | string | No | experiment duration | How long to stress |

Rollback: Sends CancelChaos to the daemon.
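Whether this action causes memory pressure or an OOM kill depends on how the total allocation compares with the container's memory limit. Because size is per worker, the total is workers × size. The sketch below pairs a hypothetical container limit with parameters that exceed it. The 512Mi limit is an illustrative assumption, as is the premise that the stress allocation is charged against that limit.

```yaml
# Fragment of the target workload's pod template (standard Kubernetes fields).
# The limit value is an illustrative assumption.
containers:
  - name: worker
    resources:
      limits:
        memory: 512Mi
---
# Fragment of the ChaosExperiment spec.
# Total allocation = workers x size = 2 x 512m, roughly 1 GiB,
# which is well above the 512Mi limit in the fragment above.
action:
  type: pod-memory-stress
  parameters:
    workers: "2"
    size: "512m"
    duration: "60s"
```

To test degradation under pressure without crossing the limit, pick values whose product stays below the limit minus the app's normal usage, for example one worker at "128m" against the same 512Mi limit.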

pod-io-stress

Generates disk I/O load inside the pod's cgroup. Tests how your app handles slow storage, I/O saturation, and disk-bound workloads.

apiVersion: chaos.chaosplane.io/v1alpha1
kind: ChaosExperiment
metadata:
  name: io-stress-db
  namespace: production
spec:
  target:
    kind: Pod
    namespace: production
    labelSelector:
      matchLabels:
        app: postgres
  action:
    type: pod-io-stress
    parameters:
      workers: "4"
      size: "1g"
      duration: "30s"
  duration: 30s
  rollback:
    enabled: true

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| workers | string | No | "1" | Number of I/O stress workers |
| size | string | No | "1g" | Total I/O size per worker |
| duration | string | No | experiment duration | How long to stress |

Rollback: Sends CancelChaos to the daemon.

pod-dns-error

Injects DNS resolution failures for specified domains inside the pod's network namespace. Tests how your app handles DNS outages, service discovery failures, and missing DNS entries.

apiVersion: chaos.chaosplane.io/v1alpha1
kind: ChaosExperiment
metadata:
  name: dns-error-api
  namespace: production
spec:
  target:
    kind: Pod
    namespace: production
    labelSelector:
      matchLabels:
        app: frontend
  action:
    type: pod-dns-error
    parameters:
      domains: "api.internal,db.internal"
      errorType: "NXDOMAIN"
  duration: 30s
  rollback:
    enabled: true

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| domains | string | No | "" (all) | Comma-separated domains to fail |
| errorType | string | No | "NXDOMAIN" | DNS error type (NXDOMAIN, SERVFAIL) |

Rollback: Sends CancelChaos to the daemon, which removes the DNS intercept rules.
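The two error types model different failures: NXDOMAIN says the name does not exist, while SERVFAIL says the resolver itself failed, which is closer to a full DNS outage. The sketch below simulates such an outage by omitting domains, relying on the documented default of "" (all). The experiment name is made up, and the field nesting is assumed to match the example above.

```yaml
# Sketch: fail every DNS lookup from the matching pods with SERVFAIL.
apiVersion: chaos.chaosplane.io/v1alpha1
kind: ChaosExperiment
metadata:
  name: dns-servfail-all        # hypothetical name
  namespace: production
spec:
  target:
    kind: Pod
    namespace: production
    labelSelector:
      matchLabels:
        app: frontend
  action:
    type: pod-dns-error
    parameters:
      # domains omitted: per the parameter table, the default "" means all domains
      errorType: "SERVFAIL"
  duration: 30s
  rollback:
    enabled: true
```

Clients often treat the two responses differently: many retry after SERVFAIL, while an NXDOMAIN answer may be cached as a negative result. Running both variants can therefore surface different bugs.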

pod-http-abort

Intercepts outbound HTTP traffic from the pod and returns error status codes. Tests how your app handles downstream HTTP failures.

apiVersion: chaos.chaosplane.io/v1alpha1
kind: ChaosExperiment
metadata:
  name: http-abort-payments
  namespace: production
spec:
  target:
    kind: Pod
    namespace: production
    labelSelector:
      matchLabels:
        app: checkout
  action:
    type: pod-http-abort
    parameters:
      port: "8080"
      statusCode: "503"
  duration: 30s
  rollback:
    enabled: true

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| port | string | Yes | | Port to intercept |
| statusCode | string | No | "503" | HTTP status code to return |

Rollback: Sends CancelChaos to the daemon, which removes the iptables/proxy rules.

pod-http-delay

Adds latency to outbound HTTP traffic from the pod. Tests timeout handling, retry logic, and circuit breaker behavior.

apiVersion: chaos.chaosplane.io/v1alpha1
kind: ChaosExperiment
metadata:
  name: http-delay-api
  namespace: production
spec:
  target:
    kind: Pod
    namespace: production
    labelSelector:
      matchLabels:
        app: api-gateway
  action:
    type: pod-http-delay
    parameters:
      port: "8080"
      delay: "2000ms"
  duration: 60s
  rollback:
    enabled: true

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| port | string | Yes | | Port to intercept |
| delay | string | Yes | | Delay to add (e.g. 500ms, 2s) |

Rollback: Sends CancelChaos to the daemon.

Targeting strategies

By label selector

target:
  kind: Pod
  namespace: production
  labelSelector:
    matchLabels:
      app: my-app
      tier: backend

By name

target:
  kind: Pod
  namespace: production
  names:
    - my-pod-abc123
    - my-pod-def456

With parallelism

By default, all matching pods are targeted. Use execution.parallelism to limit concurrent targets:

spec:
  execution:
    parallelism: 1
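To see the setting in context, the sketch below combines a label selector with parallelism: 1 in a complete pod-kill experiment, so that matching pods are affected one at a time rather than all at once. The experiment name is hypothetical, and the placement of duration and rollback directly under spec is an assumption. How the controller paces successive targets is not covered here.

```yaml
# Sketch: kill backend pods one at a time instead of all at once.
apiVersion: chaos.chaosplane.io/v1alpha1
kind: ChaosExperiment
metadata:
  name: rolling-kill-backend    # hypothetical name
  namespace: production
spec:
  target:
    kind: Pod
    namespace: production
    labelSelector:
      matchLabels:
        app: my-app
        tier: backend
  execution:
    parallelism: 1              # at most one pod targeted concurrently
  action:
    type: pod-kill
    parameters:
      gracePeriodSeconds: "0"
  duration: 10s
  rollback:
    enabled: false
```

Limiting parallelism is the safer starting point in production: killing every replica at once tests a full outage, not the loss of a single pod.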

Using steady-state probes

Always pair pod chaos with steady-state probes to verify recovery:

steadyState:
  before:
    - name: min-replicas
      type: k8s
      k8s:
        resource: pods
        namespace: production
        labelSelector: app=my-app
        condition:
          minReady: 2
  after:
    - name: min-replicas-recovered
      type: k8s
      k8s:
        resource: pods
        namespace: production
        labelSelector: app=my-app
        condition:
          minReady: 2
  recoveryTimeout: 3m
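The probe block above is shown on its own, so the sketch below places it in a complete experiment. Several things here are assumptions, not confirmed by this page: that steadyState sits directly under spec, that condition nests under k8s, and that recoveryTimeout belongs to steadyState. The experiment name is hypothetical. Check the nesting against the CRD schema before using it.

```yaml
# Sketch: container-kill guarded by before/after steady-state probes.
apiVersion: chaos.chaosplane.io/v1alpha1
kind: ChaosExperiment
metadata:
  name: kill-sidecar-with-probes   # hypothetical name
  namespace: production
spec:
  target:
    kind: Pod
    namespace: production
    labelSelector:
      matchLabels:
        app: my-app
  steadyState:                     # assumed to be a field of spec
    before:
      - name: min-replicas
        type: k8s
        k8s:
          resource: pods
          namespace: production
          labelSelector: app=my-app
          condition:
            minReady: 2            # require two ready pods before injecting
    after:
      - name: min-replicas-recovered
        type: k8s
        k8s:
          resource: pods
          namespace: production
          labelSelector: app=my-app
          condition:
            minReady: 2            # the same bar must be met again afterwards
    recoveryTimeout: 3m
  action:
    type: container-kill
    parameters:
      containerName: "envoy"
  duration: 10s
```

Using the same condition before and after makes the result easy to read: the experiment passes only if the workload returns to the state it started in.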