Pod Chaos

Pod chaos actions target running pods in your cluster. There are 8 actions covering process termination, resource exhaustion, DNS failures, and HTTP-level faults.

pod-kill

Deletes one or more pods. If a controller such as a Deployment or ReplicaSet manages them, Kubernetes immediately creates replacement pods. This is the simplest resilience test: can your app survive losing a pod?

apiVersion: chaos.chaosplane.io/v1alpha1
kind: ChaosExperiment
metadata:
  name: kill-frontend-pods
  namespace: production
spec:
  target:
    kind: Pod
    namespace: production
    labelSelector:
      matchLabels:
        app: frontend
  action:
    type: pod-kill
    parameters:
      gracePeriodSeconds: "0"
  duration: 10s
  rollback:
    enabled: false

Parameters:

Parameter	Type	Required	Default	Description
`gracePeriodSeconds`	string	No	`"0"`	Seconds to wait after SIGTERM before sending SIGKILL; `"0"` deletes the pod immediately

Rollback: None. The owning controller replaces the deleted pod automatically.
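For intuition, the following is roughly what a pod-kill amounts to when expressed with the official Kubernetes Python client. It is a sketch, not ChaosPlane's controller code. The `api` argument is assumed to behave like `kubernetes.client.CoreV1Api`, and only its `list_namespaced_pod` and `delete_namespaced_pod` methods are used. The function name `kill_pods` is hypothetical.

```python
from typing import List


def kill_pods(api, namespace: str, label_selector: str,
              grace_period_seconds: str = "0") -> List[str]:
    """Delete every pod matching label_selector and return the deleted names.

    Sketch only (hypothetical helper): `api` is expected to behave like
    kubernetes.client.CoreV1Api.
    """
    grace = int(grace_period_seconds)  # experiment parameters arrive as strings
    if grace < 0:
        raise ValueError("gracePeriodSeconds must be >= 0")
    pods = api.list_namespaced_pod(namespace, label_selector=label_selector)
    deleted = []
    for pod in pods.items:
        name = pod.metadata.name
        # 0 means "delete immediately"; N > 0 allows N seconds between SIGTERM and SIGKILL
        api.delete_namespaced_pod(name, namespace, grace_period_seconds=grace)
        deleted.append(name)
    return deleted
```

Against a real cluster, call `kubernetes.config.load_kube_config()` first and pass `kubernetes.client.CoreV1Api()` as `api`. In a unit test, any object with the same two methods will do.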

container-kill

Kills a specific container within a pod without deleting the pod itself. The kubelet restarts the container according to the pod's restartPolicy. Useful for testing container-level restart behavior.

apiVersion: chaos.chaosplane.io/v1alpha1
kind: ChaosExperiment
metadata:
  name: kill-sidecar
  namespace: production
spec:
  target:
    kind: Pod
    namespace: production
    labelSelector:
      matchLabels:
        app: my-app
  action:
    type: container-kill
    parameters:
      containerName: "envoy"
  duration: 10s

Parameters:

Parameter	Type	Required	Default	Description
`containerName`	string	Yes	—	Name of the container to kill

Rollback: None. The kubelet restarts the container according to the pod's restartPolicy.

Implementation: Uses the daemon's ExecStressChaos RPC with stressorType: container-kill. The daemon sends SIGKILL to the container process via the container runtime socket.
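SIGKILL cannot be caught or ignored, so the killed process gets no chance to run shutdown hooks. The sketch below uses a local child process rather than a container to show how such a death is recorded. Python reports it as return code -9. By the usual 128 + N convention, that corresponds to exit code 137, which is what Kubernetes typically shows as the container's last exit code. The helper name is hypothetical, and the sketch is POSIX only.

```python
import signal
import subprocess
import sys


def sigkill_exit_codes():
    """Start a long-running child, SIGKILL it, and report how its death is recorded.

    Sketch only: a plain local process stands in for the container's main process.
    """
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    child.send_signal(signal.SIGKILL)  # cannot be trapped: no cleanup code runs
    child.wait()
    # Python reports "killed by signal N" as returncode -N (POSIX only)
    signum = -child.returncode
    # Container runtimes conventionally report the same death as exit code 128 + N
    return child.returncode, 128 + signum
```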

pod-cpu-stress

Runs CPU stress workers inside the pod's cgroup, consuming CPU cycles for the experiment duration. Tests how your app behaves under CPU contention.

apiVersion: chaos.chaosplane.io/v1alpha1
kind: ChaosExperiment
metadata:
  name: cpu-stress-api
  namespace: production
spec:
  target:
    kind: Pod
    namespace: production
    labelSelector:
      matchLabels:
        app: api-server
  action:
    type: pod-cpu-stress
    parameters:
      workers: "2"
      load: "80"
      duration: "60s"
  duration: 60s
  rollback:
    enabled: true

Parameters:

Parameter	Type	Required	Default	Description
`workers`	string	No	`"1"`	Number of CPU stress workers
`load`	string	No	`"100"`	CPU load percentage (0-100)
`duration`	string	No	experiment duration	How long to stress

Rollback: Sends a CancelChaos RPC to the daemon, which terminates the stress-ng process.
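Because the rollback terminates a stress-ng process, the `load` parameter plausibly behaves like stress-ng's `--cpu-load`. That option splits each loop into compute time (load %) and sleep time (100 % minus load). The duty-cycle sketch below shows that technique for a single worker. The period length and the function names are assumptions made for illustration.

```python
import time


def duty_cycle(load: int, period_s: float = 0.1):
    """Split one period into (busy, idle) seconds for a target load percentage."""
    if not 0 <= load <= 100:
        raise ValueError("load must be between 0 and 100")
    busy = period_s * load / 100
    return busy, period_s - busy


def cpu_worker(load: int, duration_s: float, period_s: float = 0.1) -> None:
    """Hypothetical worker: spin at roughly `load` percent of one core for duration_s."""
    busy, idle = duty_cycle(load, period_s)
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        spin_until = time.monotonic() + busy
        while time.monotonic() < spin_until:
            pass  # burn CPU for the busy part of the period
        if idle > 0:
            time.sleep(idle)  # yield the CPU for the rest of the period
```

Each worker is a separate process, so `workers: "2"` with `load: "80"` asks for roughly 1.6 CPUs. Assuming the workers share the pod's CPU quota, a lower CPU limit means the kernel throttles the pod's processes, your app included.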

pod-memory-stress

Allocates memory inside the pod's cgroup, simulating memory pressure. Tests OOM behavior, memory limits, and application degradation under low-memory conditions.

apiVersion: chaos.chaosplane.io/v1alpha1
kind: ChaosExperiment
metadata:
  name: memory-stress-worker
  namespace: production
spec:
  target:
    kind: Pod
    namespace: production
    labelSelector:
      matchLabels:
        app: worker
  action:
    type: pod-memory-stress
    parameters:
      workers: "1"
      size: "256m"
      duration: "60s"
  duration: 60s
  rollback:
    enabled: true

Parameters:

Parameter	Type	Required	Default	Description
`workers`	string	No	`"1"`	Number of memory stress workers
`size`	string	No	`"256m"`	Memory to allocate per worker (e.g. `256m`, `1g`)
`duration`	string	No	experiment duration	How long to stress

Rollback: Sends CancelChaos to the daemon.
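Whether a memory stress experiment ends in an OOM kill depends on simple arithmetic: workers × size plus what the app already uses, compared with the memory limit. The helper below does that check. It assumes `size` uses binary multiples (k = 1024) and that you know the app's current usage. Both are assumptions, not documented ChaosPlane behavior, and the helper names are hypothetical.

```python
# Assumption: binary multiples for the size suffixes
UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(size: str) -> int:
    """Convert '256m' or '1g' to bytes; a bare number is taken as bytes."""
    s = size.strip().lower()
    if s and s[-1] in UNITS:
        return int(s[:-1]) * UNITS[s[-1]]
    return int(s)


def exceeds_limit(workers: str, size: str, limit_bytes: int,
                  app_usage_bytes: int = 0) -> bool:
    """True if the stress allocation plus the app's own usage passes the limit."""
    return int(workers) * parse_size(size) + app_usage_bytes > limit_bytes
```

With the example above (`workers: "1"`, `size: "256m"`) and a 512Mi limit, the stressor alone fits. If the worker already uses 300Mi, the cgroup crosses the limit and the kernel OOM killer steps in.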

pod-io-stress

Generates disk I/O load inside the pod's cgroup. Tests how your app handles slow storage, I/O saturation, and disk-bound workloads.

apiVersion: chaos.chaosplane.io/v1alpha1
kind: ChaosExperiment
metadata:
  name: io-stress-db
  namespace: production
spec:
  target:
    kind: Pod
    namespace: production
    labelSelector:
      matchLabels:
        app: postgres
  action:
    type: pod-io-stress
    parameters:
      workers: "4"
      size: "1g"
      duration: "30s"
  duration: 30s
  rollback:
    enabled: true

Parameters:

Parameter	Type	Required	Default	Description
`workers`	string	No	`"1"`	Number of I/O stress workers
`size`	string	No	`"1g"`	Total I/O size per worker
`duration`	string	No	experiment duration	How long to stress

Rollback: Sends CancelChaos to the daemon.
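The rollback notes name stress-ng, and the parameters of the three stress actions map naturally onto stress-ng options. The function below builds one plausible command line from an experiment's parameters. Treat it as a sketch only. The flag choices are assumptions, in particular `--hdd` for disk I/O rather than `--io`, which only calls sync(). ChaosPlane's daemon may invoke stress-ng differently, and this code only builds the argv without running anything.

```python
from typing import Dict, List

# action type -> (worker-count flag, size flag); an assumed mapping, not ChaosPlane's
STRESSORS = {
    "pod-cpu-stress": ("--cpu", None),
    "pod-memory-stress": ("--vm", "--vm-bytes"),
    "pod-io-stress": ("--hdd", "--hdd-bytes"),
}
# Defaults taken from the parameter tables above
DEFAULT_SIZE = {"pod-memory-stress": "256m", "pod-io-stress": "1g"}


def stress_ng_argv(action: str, params: Dict[str, str],
                   experiment_duration: str) -> List[str]:
    """Build a stress-ng argv for one of the three stress actions (hypothetical helper)."""
    worker_flag, size_flag = STRESSORS[action]
    argv = ["stress-ng", worker_flag, params.get("workers", "1")]
    if action == "pod-cpu-stress":
        argv += ["--cpu-load", params.get("load", "100")]
    if size_flag:
        argv += [size_flag, params.get("size", DEFAULT_SIZE[action])]
    # `duration` falls back to the experiment duration, as the tables describe
    argv += ["--timeout", params.get("duration", experiment_duration)]
    return argv
```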

pod-dns-error

Injects DNS resolution failures for specified domains inside the pod's network namespace. Tests how your app handles DNS outages, service discovery failures, and missing DNS entries.

apiVersion: chaos.chaosplane.io/v1alpha1
kind: ChaosExperiment
metadata:
  name: dns-error-api
  namespace: production
spec:
  target:
    kind: Pod
    namespace: production
    labelSelector:
      matchLabels:
        app: frontend
  action:
    type: pod-dns-error
    parameters:
      domains: "api.internal,db.internal"
      errorType: "NXDOMAIN"
  duration: 30s
  rollback:
    enabled: true

Parameters:

Parameter	Type	Required	Default	Description
`domains`	string	No	`""` (all)	Comma-separated domains to fail; an empty value fails every lookup
`errorType`	string	No	`"NXDOMAIN"`	DNS error type (`NXDOMAIN`, `SERVFAIL`)

Rollback: Sends CancelChaos to the daemon, which removes the DNS intercept rules.
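The two parameters combine into a simple per-query decision: fail the lookup with the configured error, or let it through. The sketch below encodes that decision using the RFC 1035 response codes (SERVFAIL = 2, NXDOMAIN = 3). The assumption that a listed domain also covers its subdomains is ours, not documented behavior. The function name is hypothetical.

```python
from typing import Optional

# DNS RCODE values from RFC 1035
RCODES = {"SERVFAIL": 2, "NXDOMAIN": 3}


def injected_rcode(qname: str, domains: str = "",
                   error_type: str = "NXDOMAIN") -> Optional[int]:
    """Return the RCODE to inject for qname, or None to let the lookup through."""
    rcode = RCODES[error_type.upper()]
    name = qname.rstrip(".").lower()
    targets = [d.strip().rstrip(".").lower() for d in domains.split(",") if d.strip()]
    if not targets:  # empty `domains` fails every lookup
        return rcode
    for d in targets:
        # Assumption: a listed domain also matches its subdomains
        if name == d or name.endswith("." + d):
            return rcode
    return None
```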

pod-http-abort

Intercepts outbound HTTP traffic from the pod and returns error status codes. Tests how your app handles downstream HTTP failures.

apiVersion: chaos.chaosplane.io/v1alpha1
kind: ChaosExperiment
metadata:
  name: http-abort-payments
  namespace: production
spec:
  target:
    kind: Pod
    namespace: production
    labelSelector:
      matchLabels:
        app: checkout
  action:
    type: pod-http-abort
    parameters:
      port: "8080"
      statusCode: "503"
  duration: 30s
  rollback:
    enabled: true

Parameters:

Parameter	Type	Required	Default	Description
`port`	string	Yes	—	Port to intercept
`statusCode`	string	No	`"503"`	HTTP status code to return

Rollback: Sends CancelChaos to the daemon, which removes the iptables/proxy rules.
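ChaosPlane injects the abort through iptables/proxy rules. From the client's point of view, the result is a server that answers every request with the configured status. To exercise the same failure path in a local test, you can stand up a tiny stand-in server like the one below. It is built on the standard library's `http.server`. It is not how ChaosPlane intercepts traffic, and the helper names are hypothetical.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def make_abort_handler(status_code: int):
    """Build a handler class that answers every request with status_code."""
    class AbortHandler(BaseHTTPRequestHandler):
        def _abort(self):
            # Request bodies are not read; acceptable for a local sketch
            body = f"injected fault: {status_code}\n".encode()
            self.send_response(status_code)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        do_GET = do_POST = do_PUT = do_DELETE = _abort

        def log_message(self, format, *args):
            pass  # keep the demo quiet

    return AbortHandler


def start_abort_server(status_code: str = "503", port: int = 0):
    """Serve on 127.0.0.1 in a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), make_abort_handler(int(status_code)))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Point the client under test at `server.server_address` and assert that it degrades gracefully. Call `server.shutdown()` when done.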

pod-http-delay

Adds latency to outbound HTTP traffic from the pod. Tests timeout handling, retry logic, and circuit breaker behavior.

apiVersion: chaos.chaosplane.io/v1alpha1
kind: ChaosExperiment
metadata:
  name: http-delay-api
  namespace: production
spec:
  target:
    kind: Pod
    namespace: production
    labelSelector:
      matchLabels:
        app: api-gateway
  action:
    type: pod-http-delay
    parameters:
      port: "8080"
      delay: "2000ms"
  duration: 60s
  rollback:
    enabled: true

Parameters:

Parameter	Type	Required	Default	Description
`port`	string	Yes	—	Port to intercept
`delay`	string	Yes	—	Delay to add (e.g. `500ms`, `2s`)

Rollback: Sends CancelChaos to the daemon.
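The injected delay is added on top of normal latency, so its effect depends on the client's timeout and retry budget. The sketch below parses the two delay forms shown in the table (`500ms`, `2s`; other units are assumed unsupported) and predicts the outcome for a client that retries on timeout without backoff. Both helper names are hypothetical.

```python
import re

_DELAY = re.compile(r"^(\d+(?:\.\d+)?)(ms|s)$")


def parse_delay(delay: str) -> float:
    """Convert '500ms' or '2s' to seconds (only ms and s suffixes assumed)."""
    m = _DELAY.match(delay.strip())
    if not m:
        raise ValueError(f"unsupported delay: {delay!r}")
    value, unit = float(m.group(1)), m.group(2)
    return value / 1000 if unit == "ms" else value


def retry_outcome(delay: str, timeout_s: float, attempts: int,
                  base_latency_s: float = 0.0):
    """Return (succeeds, seconds spent) for a client that retries on timeout."""
    per_call = base_latency_s + parse_delay(delay)
    if per_call <= timeout_s:
        return True, per_call  # first attempt completes, just slower
    return False, attempts * timeout_s  # every attempt times out
```

With the example above (`delay: "2000ms"`), a client using a 1 s timeout and 3 attempts fails after about 3 s. This is exactly the situation a circuit breaker should cut short.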

Targeting strategies

By label selector

target:
  kind: Pod
  namespace: production
  labelSelector:
    matchLabels:
      app: my-app
      tier: backend
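Kubernetes ANDs `matchLabels` entries: a pod is targeted only if it carries every listed key with the same value, and extra labels on the pod do not matter. The same selector can also be written as the comma-separated string that `kubectl -l` and the steady-state probes' `labelSelector` field use. The sketch below shows both forms. The helper names are hypothetical.

```python
from typing import Dict


def selector_string(match_labels: Dict[str, str]) -> str:
    """Render matchLabels as a selector string such as 'app=my-app,tier=backend'."""
    return ",".join(f"{k}={v}" for k, v in sorted(match_labels.items()))


def matches(pod_labels: Dict[str, str], match_labels: Dict[str, str]) -> bool:
    """A pod matches only if every listed label is present with the same value."""
    return all(pod_labels.get(k) == v for k, v in match_labels.items())
```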

By name

target:
  kind: Pod
  namespace: production
  names:
    - my-pod-abc123
    - my-pod-def456

With parallelism

By default, all matching pods are targeted at once. Set execution.parallelism to limit how many pods are affected concurrently:

spec:
  execution:
    parallelism: 1
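One way to picture `parallelism` is as a batch size: matching pods are faulted in groups of at most that many, and omitting the field puts every pod in a single group. The fixed-batch behavior is an assumption. A controller could just as well use a sliding window, so treat the sketch below as illustrative only. The function name is hypothetical.

```python
from typing import Iterator, List, Optional


def batches(targets: List[str], parallelism: Optional[int] = None) -> Iterator[List[str]]:
    """Yield groups of targets to fault at the same time (assumed fixed batches)."""
    if parallelism is None:  # default: every matching pod at once
        yield list(targets)
        return
    if parallelism < 1:
        raise ValueError("parallelism must be >= 1")
    for i in range(0, len(targets), parallelism):
        yield targets[i:i + parallelism]
```

With `parallelism: 1`, at most one replica is disrupted at any moment, which keeps a multi-replica service serving throughout the experiment.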

Using steady-state probes

Always pair pod chaos with steady-state probes to verify recovery:

steadyState:
  before:
    - name: min-replicas
      type: k8s
      k8s:
        resource: pods
        namespace: production
        labelSelector: app=my-app
        condition:
          minReady: 2
  after:
    - name: min-replicas-recovered
      type: k8s
      k8s:
        resource: pods
        namespace: production
        labelSelector: app=my-app
        condition:
          minReady: 2
  recoveryTimeout: 3m
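The `after` probe amounts to polling until enough pods report Ready, giving up when `recoveryTimeout` (3m, or 180 s, above) expires. The sketch below approximates that check. It assumes pods are shaped like the Kubernetes API's JSON, where readiness is the `Ready` condition with status `"True"`, and that `list_pods` is a caller-supplied function. It is not ChaosPlane's probe implementation, and the names are hypothetical.

```python
import time
from typing import Callable, Dict, List


def is_ready(pod: Dict) -> bool:
    """A pod counts as ready when its Ready condition has status 'True'."""
    conditions = pod.get("status", {}).get("conditions") or []
    return any(c.get("type") == "Ready" and c.get("status") == "True" for c in conditions)


def wait_for_min_ready(list_pods: Callable[[], List[Dict]], min_ready: int,
                       timeout_s: float, interval_s: float = 5.0) -> bool:
    """Poll until at least min_ready pods are ready, or give up after timeout_s."""
    deadline = time.monotonic() + timeout_s
    while True:
        ready = sum(1 for p in list_pods() if is_ready(p))
        if ready >= min_ready:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For the configuration above, the equivalent call would be `wait_for_min_ready(fetch_pods, 2, timeout_s=180)`, where `fetch_pods` is whatever lists the pods selected by `app=my-app`.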
