ChaosExperiment

ChaosExperiment is the core resource. It defines a single chaos injection: what to target, what action to run, how long to run it, and how to verify the system's health before and after.

Group/Version: chaos.chaosplane.io/v1alpha1
Kind: ChaosExperiment
Scope: Namespaced

Example

apiVersion: chaos.chaosplane.io/v1alpha1
kind: ChaosExperiment
metadata:
  name: pod-kill-test
  namespace: production
spec:
  target:
    kind: Pod
    namespace: production
    labelSelector:
      matchLabels:
        app: my-app
  action:
    type: pod-kill
    parameters:
      gracePeriodSeconds: "0"
  duration: 30s
  rollback:
    enabled: false
  execution:
    parallelism: 1
  steadyState:
    before:
      - name: pods-ready
        type: k8s
        k8s:
          resource: pods
          namespace: production
          labelSelector: app=my-app
          condition:
            minReady: 3
    after:
      - name: pods-recovered
        type: k8s
        k8s:
          resource: pods
          namespace: production
          labelSelector: app=my-app
          condition:
            minReady: 3
    recoveryTimeout: 5m
  abortConditions:
    - name: error-rate-spike
      type: prometheus
      prometheus:
        url: http://prometheus:9090
        query: 'rate(http_requests_total{status=~"5.."}[1m])'
        condition:
          operator: ">"
          threshold: 0.1
      action: abort

Spec fields

spec.target

Defines which resources to target.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| kind | string | Yes | Pod or Node |
| namespace | string | Yes (for pods) | Target namespace |
| labelSelector | LabelSelector | No* | Kubernetes label selector |
| names | []string | No* | Explicit resource names |

*Either labelSelector or names must be specified.
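
As a sketch of the names-based alternative, the fragment below targets two nodes by explicit name instead of by label. The node names are hypothetical, and leaving out namespace for Node targets is an assumption based on the table marking namespace as required only for pods.

```yaml
spec:
  target:
    kind: Node
    # Hypothetical node names; replace with names from your cluster.
    names:
      - worker-node-1
      - worker-node-2
```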

spec.action

Defines the chaos action to execute.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| type | string | Yes | Action type (e.g. pod-kill, network-delay) |
| parameters | RawExtension | No | Action-specific parameters as key-value pairs |
| duration | Duration | No | Override for action duration |
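
A minimal sketch of an action block that sets its own duration. The pod-kill type and the gracePeriodSeconds parameter are taken from the example above. That action.duration takes precedence over spec.duration when both are set is an assumption drawn from the "Override" wording, not confirmed behavior.

```yaml
spec:
  duration: 30s            # experiment-level duration
  action:
    type: pod-kill
    parameters:
      gracePeriodSeconds: "0"
    duration: 10s          # assumed to override spec.duration for this action
```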

spec.duration

How long the chaos action runs. Uses Go duration format: 30s, 5m, 1h.

spec.rollback

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| enabled | bool | Yes | | Whether to roll back after the experiment |
| timeout | Duration | No | 5m | Max time to wait for rollback to complete |
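
For illustration, a rollback block that enables rollback and shortens the wait from the 5m default. The 2m value is arbitrary.

```yaml
spec:
  rollback:
    enabled: true
    timeout: 2m   # replaces the 5m default
```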

spec.execution

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| parallelism | int32 | No | unlimited | Max number of targets to affect concurrently |

spec.steadyState

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| before | []ProbeSpec | No | Probes run before chaos injection |
| after | []ProbeSpec | No | Probes run after chaos injection |
| recoveryTimeout | Duration | No | How long to wait for after probes to pass |

spec.abortConditions

List of AbortConditionSpec. Each condition is evaluated continuously during the Running phase.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | Yes | Condition name |
| type | ProbeType | Yes | prometheus, http, or k8s |
| prometheus | PrometheusProbe | No | Prometheus probe config |
| http | HTTPProbe | No | HTTP probe config |
| k8s | K8sProbe | No | Kubernetes probe config |
| action | AbortAction | Yes | abort, pause, or rollback |
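
The fragment below sketches an abort condition that uses the pause action rather than abort. The metric name http_request_duration_seconds_bucket and the 0.5 threshold are hypothetical. It assumes the condition fires when the comparison holds, as in the error-rate example above.

```yaml
spec:
  abortConditions:
    - name: p99-latency-high
      type: prometheus
      prometheus:
        url: http://prometheus:9090
        # Hypothetical histogram metric; substitute one your services export.
        query: 'histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[1m])) by (le))'
        condition:
          operator: ">"
          threshold: 0.5
      action: pause
```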

ProbeSpec

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | Yes | Probe name |
| type | ProbeType | Yes | prometheus, http, or k8s |
| prometheus | PrometheusProbe | No | Prometheus probe config |
| http | HTTPProbe | No | HTTP probe config |
| k8s | K8sProbe | No | Kubernetes probe config |

PrometheusProbe

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| url | string | Yes | Prometheus base URL |
| query | string | Yes | PromQL query |
| condition.operator | string | Yes | <, >, <=, >=, ==, != |
| condition.threshold | float64 | Yes | Numeric threshold |
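
A sketch of a Prometheus probe used as a steady-state check rather than an abort condition. It assumes a steady-state probe passes when the comparison holds, here when the 5xx rate stays below 0.01. The threshold is illustrative, and the metric name is borrowed from the example above.

```yaml
spec:
  steadyState:
    before:
      - name: low-error-rate
        type: prometheus
        prometheus:
          url: http://prometheus:9090
          query: 'sum(rate(http_requests_total{status=~"5.."}[5m]))'
          condition:
            operator: "<"
            threshold: 0.01   # illustrative value
```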

HTTPProbe

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| url | string | Yes | URL to request |
| method | string | No | HTTP method (default: GET) |
| expectedStatus | int | No | Expected status code (default: 200) |
| expectedBody | string | No | String that must appear in response body |
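
As an illustration, an HTTP probe checking a health endpoint before injection. The service URL and the "ok" body are hypothetical. method and expectedStatus are written out even though they match the documented defaults.

```yaml
spec:
  steadyState:
    before:
      - name: health-endpoint
        type: http
        http:
          # Hypothetical in-cluster service URL.
          url: http://my-app.production.svc:8080/healthz
          method: GET
          expectedStatus: 200
          expectedBody: "ok"
```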

K8sProbe

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| resource | string | Yes | Resource type: pods, deployments, nodes |
| namespace | string | No | Namespace to query |
| labelSelector | string | No | Label selector string |
| fieldSelector | string | No | Field selector string |
| condition.minReady | int | Yes | Minimum number of ready resources |
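
A sketch of a Kubernetes probe that combines both selector fields. status.phase=Running is a standard Kubernetes field selector for pods. How the controller combines it with labelSelector is assumed to follow the usual AND semantics.

```yaml
spec:
  steadyState:
    after:
      - name: running-pods-ready
        type: k8s
        k8s:
          resource: pods
          namespace: production
          labelSelector: app=my-app
          fieldSelector: status.phase=Running
          condition:
            minReady: 3
    recoveryTimeout: 5m
```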

Status fields

| Field | Type | Description |
| --- | --- | --- |
| phase | ExperimentPhase | Current phase |
| startTime | Time | When the experiment started |
| endTime | Time | When the experiment ended |
| recoveryStartTime | Time | When recovery phase started |
| conditions | []Condition | Standard Kubernetes conditions |
| observedGeneration | int64 | Last reconciled generation |
| affectedResources | []string | Names of targeted resources |
| message | string | Human-readable status message |
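
For orientation, a status block for a finished experiment might look roughly like the sketch below. All values are hypothetical, including the timestamps, pod name, and message wording. conditions is omitted because the condition types are not listed here.

```yaml
status:
  phase: Completed
  startTime: "2024-01-15T10:00:00Z"
  recoveryStartTime: "2024-01-15T10:00:30Z"
  endTime: "2024-01-15T10:02:10Z"
  observedGeneration: 1
  affectedResources:
    - my-app-7d9c5b6f4-abcde   # hypothetical pod name
  message: "All probes passed"  # hypothetical wording
```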

Phases

| Phase | Description |
| --- | --- |
| Pending | Created, waiting to start |
| SteadyStateChecking | Running the before probes |
| Running | Chaos is active |
| Completing | Duration elapsed, wrapping up |
| Recovering | Running the after probes |
| Completed | All probes passed |
| Failed | A probe failed or an error occurred |
| Aborted | Manually aborted or abort condition triggered |

Printer columns

NAME            PHASE       TARGET   ACTION     AGE
pod-kill-test   Completed   Pod      pod-kill   5m