Pod Chaos
Pod chaos actions target running pods in your cluster. The 8 actions cover process termination, resource exhaustion, DNS failures, and HTTP faults.
pod-kill
Deletes one or more pods. If a controller such as a Deployment or ReplicaSet manages them, Kubernetes creates replacement pods right away. This is the simplest resilience test: can your app survive losing a pod?
apiVersion: chaos.chaosplane.io/v1alpha1
kind: ChaosExperiment
metadata:
  name: kill-frontend-pods
  namespace: production
spec:
  target:
    kind: Pod
    namespace: production
    labelSelector:
      matchLabels:
        app: frontend
  action:
    type: pod-kill
    parameters:
      gracePeriodSeconds: "0"
  duration: 10s
  rollback:
    enabled: false
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| gracePeriodSeconds | string | No | "0" | Grace period before SIGKILL |
Rollback: None. The controller that manages the pod creates a replacement automatically.
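Every parameter in the table above is typed as a string, so a numeric value such as the grace period has to be quoted in the manifest ("0", not 0). If you generate experiments from a script, a small helper can enforce that. The following is only a sketch: `stringify_params` is a hypothetical name that belongs to no tool, and the lowercase spelling of booleans is an assumption.

```python
def stringify_params(params):
    """Return a copy of params with every value rendered as a string."""
    out = {}
    for key, value in params.items():
        if isinstance(value, bool):
            # Assumption: booleans are spelled the YAML way ("true"/"false").
            out[key] = "true" if value else "false"
        else:
            out[key] = str(value)
    return out


# A numeric grace period becomes the quoted form the parameter table expects.
print(stringify_params({"gracePeriodSeconds": 0}))
```

Values that are already strings pass through unchanged, so it is safe to run the helper over a parameter set that mixes both.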
container-kill
Kills a specific container within a pod without deleting the pod itself. The kubelet restarts the container according to the pod's restartPolicy. Useful for testing container-level restart behavior.
apiVersion: chaos.chaosplane.io/v1alpha1
kind: ChaosExperiment
metadata:
  name: kill-sidecar
  namespace: production
spec:
  target:
    kind: Pod
    namespace: production
    labelSelector:
      matchLabels:
        app: my-app
  action:
    type: container-kill
    parameters:
      containerName: "envoy"
  duration: 10s
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| containerName | string | Yes | none | Name of the container to kill |
Rollback: None. The kubelet restarts the container according to the pod's restartPolicy.
Implementation: Uses the daemon's ExecStressChaos RPC with stressorType: container-kill. The daemon sends SIGKILL to the container process via the container runtime socket.
pod-cpu-stress
Runs CPU stress workers inside the pod's cgroup, consuming CPU cycles for the experiment duration. Tests how your app behaves under CPU contention.
apiVersion: chaos.chaosplane.io/v1alpha1
kind: ChaosExperiment
metadata:
  name: cpu-stress-api
  namespace: production
spec:
  target:
    kind: Pod
    namespace: production
    labelSelector:
      matchLabels:
        app: api-server
  action:
    type: pod-cpu-stress
    parameters:
      workers: "2"
      load: "80"
      duration: "60s"
  duration: 60s
  rollback:
    enabled: true
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| workers | string | No | "1" | Number of CPU stress workers |
| load | string | No | "100" | CPU load percentage (0-100) |
| duration | string | No | experiment duration | How long to stress |
Rollback: Sends a CancelChaos RPC to the daemon, which terminates the stress-ng process.
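Before running this action it helps to estimate how much CPU the stressor will ask for relative to the pod's CPU limit. The sketch below assumes each worker keeps one core busy at `load` percent, which is how stress-ng's per-worker load setting is usually described; the function names and the limit value are hypothetical.

```python
def cpu_demand_cores(workers: str, load: str) -> float:
    """Estimate the CPU demand in cores for the given string parameters."""
    n = int(workers)
    pct = int(load)
    if n < 1:
        raise ValueError("workers must be at least 1")
    if not 0 <= pct <= 100:
        raise ValueError("load must be between 0 and 100")
    # Assumption: each worker loads one core at `pct` percent.
    return n * pct / 100


def exceeds_limit(workers: str, load: str, cpu_limit_cores: float) -> bool:
    """True if the stressor alone would ask for more CPU than the limit."""
    return cpu_demand_cores(workers, load) > cpu_limit_cores


# The example above (2 workers at 80%) asks for roughly 1.6 cores.
print(cpu_demand_cores("2", "80"))
```

If the estimate exceeds the pod's CPU limit, expect the kernel to throttle the stressor and your app together, which is often exactly the condition you want to observe.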
pod-memory-stress
Allocates memory inside the pod's cgroup, simulating memory pressure. Tests OOM behavior, memory limits, and application degradation under low-memory conditions.
apiVersion: chaos.chaosplane.io/v1alpha1
kind: ChaosExperiment
metadata:
  name: memory-stress-worker
  namespace: production
spec:
  target:
    kind: Pod
    namespace: production
    labelSelector:
      matchLabels:
        app: worker
  action:
    type: pod-memory-stress
    parameters:
      workers: "1"
      size: "256m"
      duration: "60s"
  duration: 60s
  rollback:
    enabled: true
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| workers | string | No | "1" | Number of memory stress workers |
| size | string | No | "256m" | Memory to allocate per worker (e.g. 256m, 1g) |
| duration | string | No | experiment duration | How long to stress |
Rollback: Sends CancelChaos to the daemon.
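Because `size` applies per worker, the total allocation is `workers` times `size`, and that total competes with your app for the pod's memory limit. The sketch below works out the total in bytes. It assumes the `k`, `m` and `g` suffixes are binary multiples (1024-based); the function names are hypothetical.

```python
# Assumption: suffixes are binary multiples (k = 1024 bytes, and so on).
_UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(text: str) -> int:
    """Convert a size string such as "256m" or "1g" to bytes."""
    text = text.strip().lower()
    if not text:
        raise ValueError("empty size")
    if text[-1] in _UNITS:
        return int(text[:-1]) * _UNITS[text[-1]]
    return int(text)  # no suffix: plain bytes


def total_allocation(workers: str, size: str) -> int:
    """Total bytes the stressor allocates across all workers."""
    return int(workers) * parse_size(size)


# Two workers at 256m each allocate 512 MiB in total.
print(total_allocation("2", "256m"))
```

Compare the result with the container's memory limit minus your app's normal usage to predict whether the experiment will trigger an OOM kill.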
pod-io-stress
Generates disk I/O load inside the pod's cgroup. Tests how your app handles slow storage, I/O saturation, and disk-bound workloads.
apiVersion: chaos.chaosplane.io/v1alpha1
kind: ChaosExperiment
metadata:
  name: io-stress-db
  namespace: production
spec:
  target:
    kind: Pod
    namespace: production
    labelSelector:
      matchLabels:
        app: postgres
  action:
    type: pod-io-stress
    parameters:
      workers: "4"
      size: "1g"
      duration: "30s"
  duration: 30s
  rollback:
    enabled: true
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| workers | string | No | "1" | Number of I/O stress workers |
| size | string | No | "1g" | Total I/O size per worker |
| duration | string | No | experiment duration | How long to stress |
Rollback: Sends CancelChaos to the daemon.
pod-dns-error
Injects DNS resolution failures for specified domains inside the pod's network namespace. Tests how your app handles DNS outages, service discovery failures, and missing DNS entries.
apiVersion: chaos.chaosplane.io/v1alpha1
kind: ChaosExperiment
metadata:
  name: dns-error-api
  namespace: production
spec:
  target:
    kind: Pod
    namespace: production
    labelSelector:
      matchLabels:
        app: frontend
  action:
    type: pod-dns-error
    parameters:
      domains: "api.internal,db.internal"
      errorType: "NXDOMAIN"
  duration: 30s
  rollback:
    enabled: true
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| domains | string | No | "" (all) | Comma-separated domains to fail |
| errorType | string | No | "NXDOMAIN" | DNS error type (NXDOMAIN, SERVFAIL) |
Rollback: Sends CancelChaos to the daemon, which removes the DNS intercept rules.
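To reason about which lookups an experiment will break, it can help to model the `domains` parameter as a filter. The sketch below treats an empty string as "all domains", as the table states, and otherwise assumes exact, case-insensitive matching. Whether the daemon also matches subdomains is not specified here, so treat that part as an assumption; the function names are hypothetical.

```python
def parse_domains(domains: str) -> set:
    """Split the comma-separated parameter into a normalized set of names."""
    return {
        d.strip().lower().rstrip(".")
        for d in domains.split(",")
        if d.strip()
    }


def is_affected(query: str, domains: str) -> bool:
    """True if a lookup for `query` would fail under the given parameter."""
    targets = parse_domains(domains)
    if not targets:
        return True  # empty parameter means every domain fails
    # Assumption: exact match only, with no subdomain or wildcard matching.
    return query.strip().lower().rstrip(".") in targets


print(is_affected("api.internal", "api.internal,db.internal"))
print(is_affected("cache.internal", "api.internal,db.internal"))
```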
pod-http-abort
Intercepts outbound HTTP traffic from the pod on the given port and returns the configured error status code. Tests how your app handles downstream HTTP failures.
apiVersion: chaos.chaosplane.io/v1alpha1
kind: ChaosExperiment
metadata:
  name: http-abort-payments
  namespace: production
spec:
  target:
    kind: Pod
    namespace: production
    labelSelector:
      matchLabels:
        app: checkout
  action:
    type: pod-http-abort
    parameters:
      port: "8080"
      statusCode: "503"
  duration: 30s
  rollback:
    enabled: true
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| port | string | Yes | none | Port to intercept |
| statusCode | string | No | "503" | HTTP status code to return |
Rollback: Sends CancelChaos to the daemon, which removes the iptables/proxy rules.
pod-http-delay
Adds latency to outbound HTTP traffic from the pod. Tests timeout handling, retry logic, and circuit breaker behavior.
apiVersion: chaos.chaosplane.io/v1alpha1
kind: ChaosExperiment
metadata:
  name: http-delay-api
  namespace: production
spec:
  target:
    kind: Pod
    namespace: production
    labelSelector:
      matchLabels:
        app: api-gateway
  action:
    type: pod-http-delay
    parameters:
      port: "8080"
      delay: "2000ms"
  duration: 60s
  rollback:
    enabled: true
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| port | string | Yes | none | Port to intercept |
| delay | string | Yes | none | Delay to add (e.g. 500ms, 2s) |
Rollback: Sends CancelChaos to the daemon.
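Whether an injected delay actually exercises your timeout handling depends on how it compares with the client's timeout. The sketch below parses the two delay formats shown in the table (`ms` and `s`) and checks whether a request would exceed a given timeout. The function names, the baseline latency and the timeout values are hypothetical, and other unit suffixes are deliberately rejected.

```python
import re

# Accepts values such as "500ms", "2000ms", "2s" or "1.5s".
_DELAY = re.compile(r"^(\d+(?:\.\d+)?)(ms|s)$")


def parse_delay(text: str) -> float:
    """Convert a delay string to seconds."""
    match = _DELAY.match(text.strip())
    if not match:
        raise ValueError(f"unsupported delay: {text!r}")
    value, unit = float(match.group(1)), match.group(2)
    return value / 1000 if unit == "ms" else value


def request_times_out(delay: str, base_latency_s: float, client_timeout_s: float) -> bool:
    """True if baseline latency plus the injected delay exceeds the timeout."""
    return base_latency_s + parse_delay(delay) > client_timeout_s


# A 2000ms delay on a 100ms call trips a 1.5s client timeout.
print(request_times_out("2000ms", 0.1, 1.5))
```

Pick a delay just above the timeout to test timeout and retry paths, and one just below it to test how your app copes with slow but successful responses.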
Targeting strategies
By label selector
target:
  kind: Pod
  namespace: production
  labelSelector:
    matchLabels:
      app: my-app
      tier: backend
By name
target:
  kind: Pod
  namespace: production
  names:
    - my-pod-abc123
    - my-pod-def456
With parallelism
By default, all matching pods are targeted. Use execution.parallelism to limit concurrent targets:
spec:
  execution:
    parallelism: 1
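One way to picture the effect of `parallelism` is as a cap on how many matching pods are disrupted at the same time. The sketch below splits a list of pod names into successive groups of at most that size. This is an illustration only: whether the controller really works in fixed batches or keeps a rolling window of active targets is an assumption, and `batches` is a hypothetical helper.

```python
def batches(targets, parallelism):
    """Split targets into consecutive groups of at most `parallelism` items."""
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    return [
        targets[i:i + parallelism]
        for i in range(0, len(targets), parallelism)
    ]


pods = ["my-pod-abc123", "my-pod-def456", "my-pod-ghi789"]

# With parallelism 1, only one pod is disrupted at any moment.
print(batches(pods, 1))
# With parallelism 2, up to two pods are disrupted together.
print(batches(pods, 2))
```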
Using steady-state probes
Always pair pod chaos with steady-state probes to verify recovery:
steadyState:
  before:
    - name: min-replicas
      type: k8s
      k8s:
        resource: pods
        namespace: production
        labelSelector: app=my-app
        condition:
          minReady: 2
  after:
    - name: min-replicas-recovered
      type: k8s
      k8s:
        resource: pods
        namespace: production
        labelSelector: app=my-app
        condition:
          minReady: 2
  recoveryTimeout: 3m
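The after probe combined with `recoveryTimeout` amounts to a bounded wait: keep checking the ready count until it reaches `minReady` or the timeout expires. The sketch below shows that logic under the assumption that the probe is retried at a fixed interval. The function name, the polling interval and the injected `count_ready` callable are all hypothetical, and in practice the count would come from the Kubernetes API.

```python
import time


def wait_for_min_ready(count_ready, min_ready, timeout_s,
                       interval_s=5.0, clock=time.monotonic, sleep=time.sleep):
    """Poll count_ready() until it reaches min_ready or timeout_s elapses.

    Returns True on recovery and False if the timeout expires first.
    """
    deadline = clock() + timeout_s
    while True:
        if count_ready() >= min_ready:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Simulated recovery: the ready count climbs from 0 to 2 over three polls.
readings = iter([0, 1, 2])
recovered = wait_for_min_ready(lambda: next(readings), min_ready=2,
                               timeout_s=180, interval_s=0)
print(recovered)
```

The clock and sleep functions are parameters so the loop can be tested without real waiting; a `recoveryTimeout` of 3m corresponds to `timeout_s=180`.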