Blast Radius Policies

A BlastRadiusPolicy is a cluster-scoped guardrail that limits what chaos experiments can do. Each policy is evaluated as a 7-step chain before an experiment runs. If any step rejects the experiment, the experiment is blocked (in Enforce mode) or the violation is logged and the experiment proceeds (in Audit mode).

Why policies matter

Without guardrails, a misconfigured experiment could target production namespaces, kill too many pods at once, or run at 3am. Policies let platform teams define safe boundaries while giving developers freedom to experiment within them.

The 7-step evaluation chain

  1. Namespace scope - Is the target namespace in the allowed list?
  2. Label scope - Does the target match the label selector?
  3. Action type - Is the action type in the allowed list?
  4. Max targets - Does the experiment exceed the absolute target count?
  5. Max percentage - Does the experiment exceed the percentage limit?
  6. Time windows - Is the current time inside an allowed window and outside every blocked window?
  7. Audit mode - If the policy is in Audit mode, log the violation and allow the experiment; in Enforce mode, block it
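
The chain above can be sketched in Python as follows. This is a minimal illustration, not the controller's implementation: the Experiment and Policy classes, their field names, and the 'allow' / 'audit' / 'block' decision values are hypothetical. It also assumes, following the Scope section, that an experiment outside a policy's scope is simply not evaluated by that policy.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Experiment:
    # Hypothetical, simplified view of an experiment (not the real CRD).
    namespace: str
    labels: dict
    action: str
    target_count: int            # resources the experiment would affect
    matching_count: int          # resources matching the experiment's selector
    in_allowed_window: bool = True
    in_blocked_window: bool = False


@dataclass
class Policy:
    # Hypothetical, simplified view of a BlastRadiusPolicy spec.
    enforcement: str = "Enforce"                          # "Enforce" or "Audit"
    namespaces: list = field(default_factory=list)        # empty = all namespaces
    match_labels: dict = field(default_factory=dict)      # empty = all labels
    allowed_actions: list = field(default_factory=list)   # empty = all actions
    max_targets: Optional[int] = None
    max_percentage: Optional[int] = None


def evaluate(policy, exp):
    """Return (decision, reason); decision is 'allow', 'audit', or 'block'."""
    # Steps 1-2: scope. Assumption: out of scope means the policy does not apply.
    if policy.namespaces and exp.namespace not in policy.namespaces:
        return ("allow", "namespace out of scope")
    if any(exp.labels.get(k) != v for k, v in policy.match_labels.items()):
        return ("allow", "labels out of scope")

    violation = None
    if policy.allowed_actions and exp.action not in policy.allowed_actions:
        violation = "action type"                          # step 3
    elif policy.max_targets is not None and exp.target_count > policy.max_targets:
        violation = "max targets"                          # step 4
    elif (policy.max_percentage is not None and exp.matching_count > 0
          and exp.target_count * 100 > policy.max_percentage * exp.matching_count):
        violation = "max percentage"                       # step 5
    elif exp.in_blocked_window or not exp.in_allowed_window:
        violation = "time windows"                         # step 6

    if violation is None:
        return ("allow", "all steps passed")
    # Step 7: Audit logs and allows, Enforce blocks.
    if policy.enforcement == "Audit":
        return ("audit", violation)
    return ("block", violation)
```

With max_targets=2 and max_percentage=20, an experiment that targets 2 of 10 matching pods passes, while one that targets 2 of 5 fails at the percentage step.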

Basic policy

apiVersion: chaos.chaosplane.io/v1alpha1
kind: BlastRadiusPolicy
metadata:
  name: production-guardrails
spec:
  enforcement: Enforce

  scope:
    namespaces:
      - production
      - staging

  targetLimits:
    maxTargets: 2
    maxPercentage: 20

  protectedResources:
    namespaces:
      - kube-system
      - chaosplane
    labels:
      chaosplane.io/protected: "true"
    names:
      - kind: Deployment
        name: payment-service
        namespace: production

  actionLimits:
    allowedActions:
      - pod-kill
      - network-delay
      - pod-cpu-stress
    maxDuration: 10m

  timeWindows:
    allowed:
      - name: business-hours
        schedule: "0 9 * * 1-5"
        duration: 8h
        timezone: UTC
    blocked:
      - name: peak-traffic
        schedule: "0 18 * * 1-5"
        duration: 2h
        timezone: America/New_York

Enforcement modes

Enforce

Experiments that violate the policy are rejected. The experiment moves to the Failed phase with a message explaining which policy step blocked it.

spec:
  enforcement: Enforce

Audit

Violations are logged but experiments are allowed to proceed. Use this when rolling out a new policy to understand its impact before enforcing it.

spec:
  enforcement: Audit

Scope

The scope field defines which experiments this policy applies to. A policy only evaluates experiments targeting resources within its scope.

spec:
  scope:
    namespaces:
      - production
    labelSelector:
      matchLabels:
        environment: production

If namespaces is empty, the policy applies to all namespaces. If labelSelector is empty, the policy applies to resources regardless of their labels.

Target limits

Limit how many resources a single experiment can affect:

spec:
  targetLimits:
    maxTargets: 3       # absolute maximum
    maxPercentage: 25   # percentage of matching resources

Both limits are evaluated. The experiment is blocked if it would exceed either one.
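
As a worked example of how the two limits interact, the sketch below computes the largest target count that satisfies both. The function name is hypothetical, and rounding the percentage limit down is an assumption; the page does not say how fractional results are rounded.

```python
def effective_target_limit(matching, max_targets, max_percentage):
    """Largest target count that satisfies both limits.

    Assumption: the percentage limit rounds down to a whole resource.
    """
    by_percentage = matching * max_percentage // 100
    return min(max_targets, by_percentage)


# maxTargets: 3, maxPercentage: 25, as in the example above
print(effective_target_limit(10, 3, 25))   # 25% of 10 is 2.5, rounded down to 2
print(effective_target_limit(40, 3, 25))   # 25% of 40 is 10, so maxTargets (3) wins
print(effective_target_limit(3, 3, 25))    # 25% of 3 is 0.75, rounded down to 0
```

Under this rounding assumption, a small workload can end up with an effective limit of zero, which is worth checking before switching a policy to Enforce.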

Protected resources

Resources that can never be targeted, regardless of the experiment spec:

spec:
  protectedResources:
    # Entire namespaces
    namespaces:
      - kube-system
      - monitoring

    # Resources with specific labels
    labels:
      chaosplane.io/protected: "true"
      tier: database

    # Specific named resources
    names:
      - kind: Pod
        name: critical-singleton
        namespace: production
      - kind: Node
        name: control-plane-1
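
One way to read the three matchers is sketched below. The is_protected helper and the dictionary shapes are hypothetical. The sketch assumes that a resource is protected if it matches any one listed label (rather than all of them), and that a names entry without a namespace refers to a cluster-scoped resource such as a Node; neither point is confirmed by this page.

```python
def is_protected(resource, protected):
    """Return True if the resource matches any protectedResources matcher."""
    # Matcher 1: the whole namespace is protected.
    if resource.get("namespace") in protected.get("namespaces", []):
        return True

    # Matcher 2: assumption, any single matching label protects the resource.
    labels = resource.get("labels", {})
    if any(labels.get(k) == v for k, v in protected.get("labels", {}).items()):
        return True

    # Matcher 3: kind, name, and namespace must all match.
    # Assumption: a missing namespace means a cluster-scoped resource.
    for entry in protected.get("names", []):
        if (entry["kind"] == resource["kind"]
                and entry["name"] == resource["name"]
                and entry.get("namespace") == resource.get("namespace")):
            return True
    return False


protected = {
    "namespaces": ["kube-system", "monitoring"],
    "labels": {"chaosplane.io/protected": "true", "tier": "database"},
    "names": [
        {"kind": "Pod", "name": "critical-singleton", "namespace": "production"},
        {"kind": "Node", "name": "control-plane-1"},
    ],
}
```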

Action limits

Restrict which action types are allowed and cap experiment duration:

spec:
  actionLimits:
    allowedActions:
      - pod-kill
      - network-delay
      - pod-cpu-stress
    maxDuration: 5m

If allowedActions is empty, all actions are permitted. maxDuration applies to the experiment's spec.duration.
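
The duration check can be sketched as a comparison of two parsed durations. The helper names are hypothetical, the parser handles only the single-unit forms used on this page (such as 5m or 8h), and treating a duration equal to maxDuration as allowed is an assumption.

```python
import re
from datetime import timedelta

_UNITS = {"s": "seconds", "m": "minutes", "h": "hours"}


def parse_duration(text):
    """Parse a single-unit duration such as '90s', '5m', or '8h'."""
    match = re.fullmatch(r"(\d+)([smh])", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    value, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(value)})


def within_max_duration(experiment_duration, max_duration):
    # Assumption: a duration exactly equal to maxDuration is allowed.
    return parse_duration(experiment_duration) <= parse_duration(max_duration)


print(within_max_duration("300s", "5m"))   # True: 300 seconds equals 5 minutes
print(within_max_duration("10m", "5m"))    # False: exceeds the cap
```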

Time windows

Control when experiments can run using cron expressions.

Allowed windows

Experiments can only run during these windows:

spec:
  timeWindows:
    allowed:
      - name: business-hours-utc
        schedule: "0 9 * * 1-5"   # 9am Monday-Friday
        duration: 8h
        timezone: UTC
      - name: weekend-testing
        schedule: "0 10 * * 6"    # 10am Saturday
        duration: 4h
        timezone: America/Los_Angeles

Blocked windows

Experiments are blocked during these windows. Blocked windows take precedence over allowed windows:

spec:
  timeWindows:
    blocked:
      - name: deployment-freeze
        schedule: "0 17 * * 5"    # 5pm Friday
        duration: 64h             # until 9am Monday
        timezone: UTC
      - name: peak-hours
        schedule: "0 12 * * 1-5"  # noon weekdays
        duration: 2h
        timezone: America/New_York

The schedule field uses standard 5-field cron syntax: minute hour day-of-month month day-of-week.
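
To make the window semantics concrete, the sketch below checks whether a moment falls inside a window that opens at the cron time and stays open for the given duration. It is an illustration only: the function names are hypothetical, it supports just the schedule shapes used on this page (fixed minute and hour, * for day-of-month and month), and it assumes that an empty allowed list means any time is permitted.

```python
from datetime import datetime, timedelta, timezone


def parse_dow(field):
    """Expand a cron day-of-week field such as '1-5', '6', or '*' (0 = Sunday)."""
    if field == "*":
        return set(range(7))
    days = set()
    for part in field.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            days.update(d % 7 for d in range(int(lo), int(hi) + 1))
        else:
            days.add(int(part) % 7)   # cron also accepts 7 for Sunday
    return days


def in_window(now, schedule, duration, tz):
    """True if the timezone-aware datetime `now` is inside an open window."""
    minute, hour, dom, month, dow = schedule.split()
    if dom != "*" or month != "*":
        raise ValueError("sketch supports only '*' for day-of-month and month")
    days = parse_dow(dow)
    today = now.astimezone(tz).date()
    # A window that opened on an earlier day may still be open now.
    for offset in range(duration.days + 2):
        day = today - timedelta(days=offset)
        start = datetime(day.year, day.month, day.day, int(hour), int(minute), tzinfo=tz)
        if start.isoweekday() % 7 not in days:
            continue
        start_utc = start.astimezone(timezone.utc)   # compare in UTC to sidestep DST
        if start_utc <= now < start_utc + duration:
            return True
    return False


def may_run(now, allowed, blocked):
    """Blocked windows win; assumption: no allowed windows means always allowed."""
    if any(in_window(now, *w) for w in blocked):
        return False
    return not allowed or any(in_window(now, *w) for w in allowed)
```

Each window is passed as a (schedule, duration, tz) tuple, for example ("0 17 * * 5", timedelta(hours=64), timezone.utc). For named zones such as America/New_York, a zoneinfo.ZoneInfo object can be passed as tz.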

Multiple policies

Multiple policies can apply to the same experiment. All policies are evaluated, and the experiment is blocked if any one of them rejects it.
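
The combination rule can be sketched as follows. The combine function and its input shape are hypothetical. The sketch assumes that a violation reported by an Audit-mode policy is logged but never blocks, even when other policies in Enforce mode are evaluated for the same experiment.

```python
def combine(results):
    """Combine per-policy results into one decision.

    `results` is a list of (policy_name, enforcement, violation) tuples,
    where violation is None when the policy passed.
    """
    blocked_by = [name for name, mode, violation in results
                  if violation is not None and mode == "Enforce"]
    audit_only = [name for name, mode, violation in results
                  if violation is not None and mode == "Audit"]
    return {
        "allowed": not blocked_by,   # a single enforcing rejection blocks the run
        "blocked_by": blocked_by,
        "audit_only": audit_only,    # logged, but does not block
    }


decision = combine([
    ("dev-policy", "Audit", "max targets"),
    ("prod-policy", "Enforce", None),
])
print(decision["allowed"])   # True: only an Audit-mode policy reported a violation
```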

Audit-first workflow

The recommended approach for new policies:

  1. Deploy in Audit mode
  2. Run experiments and observe which ones would be blocked
  3. Adjust the policy as needed
  4. Switch to Enforce mode

# Check policy evaluation results in audit mode
chaosctl events -n production | grep BlastRadiusPolicy

Example: development vs production

---
# Permissive policy for development and staging
apiVersion: chaos.chaosplane.io/v1alpha1
kind: BlastRadiusPolicy
metadata:
  name: dev-policy
spec:
  enforcement: Audit
  scope:
    namespaces: [development, staging]
  targetLimits:
    maxTargets: 10
    maxPercentage: 50

---
# Strict policy for production
apiVersion: chaos.chaosplane.io/v1alpha1
kind: BlastRadiusPolicy
metadata:
  name: prod-policy
spec:
  enforcement: Enforce
  scope:
    namespaces: [production]
  targetLimits:
    maxTargets: 1
    maxPercentage: 10
  protectedResources:
    namespaces: [kube-system]
    labels:
      tier: database
  actionLimits:
    allowedActions: [pod-kill, network-delay]
    maxDuration: 5m
  timeWindows:
    allowed:
      - name: chaos-hours
        schedule: "0 10 * * 2-4"
        duration: 4h
        timezone: UTC