ChaosExperiment
ChaosExperiment is the core resource. It defines a single chaos injection: what to target, what action to run, how long to run it, and how to verify the system's health before and after.
API version: chaos.chaosplane.io/v1alpha1 (group chaos.chaosplane.io, version v1alpha1)
Kind: ChaosExperiment
Scope: Namespaced
Example
apiVersion: chaos.chaosplane.io/v1alpha1
kind: ChaosExperiment
metadata:
  name: pod-kill-test
  namespace: production
spec:
  target:
    kind: Pod
    namespace: production
    labelSelector:
      matchLabels:
        app: my-app
  action:
    type: pod-kill
    parameters:
      gracePeriodSeconds: "0"
  duration: 30s
  rollback:
    enabled: false
  execution:
    parallelism: 1
  steadyState:
    before:
      - name: pods-ready
        type: k8s
        k8s:
          resource: pods
          namespace: production
          labelSelector: app=my-app
          condition:
            minReady: 3
    after:
      - name: pods-recovered
        type: k8s
        k8s:
          resource: pods
          namespace: production
          labelSelector: app=my-app
          condition:
            minReady: 3
    recoveryTimeout: 5m
  abortConditions:
    - name: error-rate-spike
      type: prometheus
      prometheus:
        url: http://prometheus:9090
        query: 'rate(http_requests_total{status=~"5.."}[1m])'
        condition:
          operator: ">"
          threshold: 0.1
      action: abort
Spec fields
spec.target
Defines which resources to target.
| Field | Type | Required | Description |
|---|---|---|---|
| kind | string | Yes | Pod or Node |
| namespace | string | Yes (for pods) | Target namespace |
| labelSelector | LabelSelector | No* | Kubernetes label selector |
| names | []string | No* | Explicit resource names |
*Either labelSelector or names must be specified.
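
As a sketch of the two selection styles described above, the fragments below target pods by explicit name and nodes by label. The pod names and the `pool` label are hypothetical placeholders, and omitting `namespace` for a Node target is an assumption based on the "Yes (for pods)" note in the table.

```yaml
# Select specific pods by name instead of by label (names are placeholders).
spec:
  target:
    kind: Pod
    namespace: production
    names:
      - my-app-0
      - my-app-1
---
# Select nodes by label; namespace is left out because nodes are cluster-scoped.
spec:
  target:
    kind: Node
    labelSelector:
      matchLabels:
        pool: chaos-candidates   # hypothetical label
```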
spec.action
Defines the chaos action to execute.
| Field | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | Action type (e.g. pod-kill, network-delay) |
| parameters | RawExtension | No | Action-specific parameters as key-value pairs |
| duration | Duration | No | Override for action duration |
spec.duration
How long the chaos action runs. Uses Go duration format: 30s, 5m, 1h.
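
A minimal sketch of how the two duration fields might be combined. Go duration strings can chain units, so `1m30s` is valid. That `spec.action.duration` takes precedence over `spec.duration` for this action is an assumption drawn from the "Override" wording in the spec.action table, not confirmed behavior.

```yaml
spec:
  duration: 1m30s          # experiment-level duration, Go duration format
  action:
    type: pod-kill
    parameters:
      gracePeriodSeconds: "0"
    duration: 30s          # assumed to override spec.duration for this action
```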
spec.rollback
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| enabled | bool | Yes | - | Whether to roll back after the experiment |
| timeout | Duration | No | 5m | Max time to wait for rollback to complete |
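
For illustration, a fragment that enables rollback and shortens the wait from the 5m default. The 2m value is an arbitrary choice for the example.

```yaml
spec:
  rollback:
    enabled: true
    timeout: 2m   # wait at most two minutes for rollback to complete
```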
spec.execution
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| parallelism | int32 | No | unlimited | Max number of targets to affect concurrently |
spec.steadyState
| Field | Type | Required | Description |
|---|---|---|---|
| before | []ProbeSpec | No | Probes run before chaos injection |
| after | []ProbeSpec | No | Probes run after chaos injection |
| recoveryTimeout | Duration | No | How long to wait for after probes to pass |
spec.abortConditions
List of AbortConditionSpec. Each condition is evaluated continuously during the Running phase.
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Condition name |
| type | ProbeType | Yes | prometheus, http, or k8s |
| prometheus | PrometheusProbe | No | Prometheus probe config |
| http | HTTPProbe | No | HTTP probe config |
| k8s | K8sProbe | No | Kubernetes probe config |
| action | AbortAction | Yes | abort, pause, or rollback |
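
As a hedged sketch, the fragment below pauses the experiment when p99 latency climbs. It follows the pattern of the error-rate example at the top of this page, where the condition describes the unhealthy state that triggers the action. The metric name `http_request_duration_seconds_bucket` and the 0.5 threshold are assumptions about the monitored service, not part of this API.

```yaml
spec:
  abortConditions:
    - name: latency-spike
      type: prometheus
      prometheus:
        url: http://prometheus:9090
        # p99 request latency in seconds over the last minute (hypothetical metric)
        query: 'histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[1m])) by (le))'
        condition:
          operator: ">"
          threshold: 0.5
      action: pause
```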
ProbeSpec
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Probe name |
| type | ProbeType | Yes | prometheus, http, or k8s |
| prometheus | PrometheusProbe | No | Prometheus probe config |
| http | HTTPProbe | No | HTTP probe config |
| k8s | K8sProbe | No | Kubernetes probe config |
PrometheusProbe
| Field | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | Prometheus base URL |
| query | string | Yes | PromQL query |
| condition.operator | string | Yes | <, >, <=, >=, ==, != |
| condition.threshold | float64 | Yes | Numeric threshold |
HTTPProbe
| Field | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | URL to request |
| method | string | No | HTTP method (default: GET) |
| expectedStatus | int | No | Expected status code (default: 200) |
| expectedBody | string | No | String that must appear in response body |
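
A minimal sketch of an HTTP probe used as a steady-state check before injection. The service URL, the `/healthz` path, and the `ok` body are hypothetical. `method` and `expectedStatus` are written out only to show where they go, since both match the documented defaults.

```yaml
spec:
  steadyState:
    before:
      - name: health-endpoint
        type: http
        http:
          url: http://my-app.production.svc.cluster.local/healthz   # placeholder URL
          method: GET
          expectedStatus: 200
          expectedBody: ok   # substring expected in the response body
```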
K8sProbe
| Field | Type | Required | Description |
|---|---|---|---|
| resource | string | Yes | Resource type: pods, deployments, nodes |
| namespace | string | No | Namespace to query |
| labelSelector | string | No | Label selector string |
| fieldSelector | string | No | Field selector string |
| condition.minReady | int | Yes | Minimum number of ready resources |
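
The fragment below sketches a K8sProbe that combines a label selector with a field selector. `status.phase=Running` is a standard Kubernetes field selector for pods. The probe name and the `minReady` value are illustrative.

```yaml
spec:
  steadyState:
    after:
      - name: running-pods-ready
        type: k8s
        k8s:
          resource: pods
          namespace: production
          labelSelector: app=my-app
          fieldSelector: status.phase=Running   # only consider pods in the Running phase
          condition:
            minReady: 3
```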
Status fields
| Field | Type | Description |
|---|---|---|
| phase | ExperimentPhase | Current phase |
| startTime | Time | When the experiment started |
| endTime | Time | When the experiment ended |
| recoveryStartTime | Time | When recovery phase started |
| conditions | []Condition | Standard Kubernetes conditions |
| observedGeneration | int64 | Last reconciled generation |
| affectedResources | []string | Names of targeted resources |
| message | string | Human-readable status message |
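
For orientation, a status block for a finished experiment might look roughly like the sketch below. Every value is invented for illustration (timestamps, pod name, message text), and `conditions` is omitted because the condition types are not documented here.

```yaml
status:
  phase: Completed
  startTime: "2024-01-01T12:00:00Z"          # illustrative timestamps only
  recoveryStartTime: "2024-01-01T12:00:30Z"
  endTime: "2024-01-01T12:01:10Z"
  observedGeneration: 1
  affectedResources:
    - my-app-7d4b9c6f5-abcde                 # hypothetical pod name
  message: experiment completed              # hypothetical message text
```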
Phases
| Phase | Description |
|---|---|
| Pending | Created, waiting to start |
| SteadyStateChecking | Running the steadyState.before probes |
| Running | Chaos is active |
| Completing | Duration elapsed, wrapping up |
| Recovering | Running the steadyState.after probes |
| Completed | All probes passed |
| Failed | A probe failed or an error occurred |
| Aborted | Manually aborted or abort condition triggered |
Printer columns
NAME            PHASE       TARGET   ACTION     AGE
pod-kill-test   Completed   Pod      pod-kill   5m