Blast Radius Policies

A BlastRadiusPolicy is a cluster-scoped guardrail that limits what chaos experiments can do. Before an experiment runs, each applicable policy evaluates it in a 7-step chain. If any step rejects the experiment, it is blocked (in Enforce mode) or the violation is logged and the experiment proceeds (in Audit mode).

Why policies matter

Without guardrails, a misconfigured experiment could target production namespaces, kill too many pods at once, or run at 3am. Policies let platform teams define safe boundaries while giving developers freedom to experiment within them.

The 7-step evaluation chain

1. Namespace scope - Is the target namespace in the allowed list?
2. Label scope - Does the target match the label selector?
3. Action type - Is the action type in the allowed list?
4. Max targets - Does the experiment exceed the absolute target count?
5. Max percentage - Does the experiment exceed the percentage limit?
6. Time windows - Is the current time inside an allowed window and outside every blocked window?
7. Audit mode - If the policy is in Audit mode, log the violation and allow the experiment; in Enforce mode, block it.
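The chain can be sketched as an ordered list of checks that stops at the first rejection and then applies the enforcement mode. This is a minimal Python illustration, not ChaosPlane's implementation: the `Experiment` fields, the `evaluate` function, and the two inline checks are hypothetical stand-ins for the real steps, and stopping at the first violation in Audit mode is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Experiment:
    # Hypothetical, simplified view of an experiment.
    namespace: str
    action: str
    targets: int


# A check returns None when it passes, or a reason string when it rejects.
Check = Callable[[Experiment], Optional[str]]


def evaluate(exp: Experiment, checks: list[tuple[str, Check]],
             enforcement: str) -> tuple[bool, list[str]]:
    """Run checks in order and stop at the first rejection.

    Returns (allowed, log). Audit mode logs the rejection and allows the
    experiment; Enforce mode blocks it.
    """
    for step, check in checks:
        reason = check(exp)
        if reason is not None:
            if enforcement == "Audit":
                return True, [f"audit (would block) {step}: {reason}"]
            return False, [f"blocked {step}: {reason}"]
    return True, []


# Two illustrative steps, using values from the basic policy below.
checks = [
    ("action type",
     lambda e: None if e.action in {"pod-kill", "network-delay", "pod-cpu-stress"}
     else f"action {e.action} not allowed"),
    ("max targets",
     lambda e: None if e.targets <= 2 else f"{e.targets} targets exceeds maxTargets 2"),
]

exp = Experiment(namespace="production", action="pod-kill", targets=3)
print(evaluate(exp, checks, "Enforce"))
print(evaluate(exp, checks, "Audit"))
```

The same 3-target experiment is blocked under Enforce and only logged under Audit, which is the difference the last step of the chain captures.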

Basic policy

apiVersion: chaos.chaosplane.io/v1alpha1
kind: BlastRadiusPolicy
metadata:
  name: production-guardrails
spec:
  enforcement: Enforce

  scope:
    namespaces:
      - production
      - staging

  targetLimits:
    maxTargets: 2
    maxPercentage: 20

  protectedResources:
    namespaces:
      - kube-system
      - chaosplane
    labels:
      chaosplane.io/protected: "true"
    names:
      - kind: Deployment
        name: payment-service
        namespace: production

  actionLimits:
    allowedActions:
      - pod-kill
      - network-delay
      - pod-cpu-stress
    maxDuration: 10m

  timeWindows:
    allowed:
      - name: business-hours
        schedule: "0 9 * * 1-5"
        duration: 8h
        timezone: UTC
    blocked:
      - name: peak-traffic
        schedule: "0 18 * * 1-5"
        duration: 2h
        timezone: America/New_York

Enforcement modes

Enforce

Experiments that violate the policy are rejected. The experiment moves to the Failed phase with a message explaining which policy step blocked it.

spec:
  enforcement: Enforce

Audit

Violations are logged but experiments are allowed to proceed. Use this when rolling out a new policy to understand its impact before enforcing it.

spec:
  enforcement: Audit

Scope

The scope field defines which experiments this policy applies to. A policy only evaluates experiments targeting resources within its scope.

spec:
  scope:
    namespaces:
      - production
    labelSelector:
      matchLabels:
        environment: production

If namespaces is empty, the policy applies to all namespaces. If labelSelector is empty, the policy applies to resources regardless of their labels.
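A sketch of the scope test, assuming scope is checked against the target's namespace and labels and that only `matchLabels` is involved (the `matchExpressions` form of a Kubernetes label selector is not modeled). `in_scope` is a hypothetical helper, not a ChaosPlane API.

```python
def in_scope(namespace: str, labels: dict[str, str],
             scope_namespaces: list[str], match_labels: dict[str, str]) -> bool:
    """Hypothetical scope test: does a policy apply to this target?"""
    # An empty namespace list means the policy covers every namespace.
    namespace_ok = not scope_namespaces or namespace in scope_namespaces
    # Every matchLabels pair must be present; an empty selector matches everything.
    labels_ok = all(labels.get(key) == value for key, value in match_labels.items())
    return namespace_ok and labels_ok


scope_ns = ["production"]
selector = {"environment": "production"}
print(in_scope("production", {"environment": "production", "app": "web"}, scope_ns, selector))  # True
print(in_scope("production", {"app": "web"}, scope_ns, selector))                               # False
print(in_scope("staging", {"environment": "production"}, scope_ns, selector))                   # False
print(in_scope("anything", {}, [], {}))                                                          # True
```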

Target limits

Limit how many resources a single experiment can affect:

spec:
  targetLimits:
    maxTargets: 3        # absolute maximum
    maxPercentage: 25    # percentage of matching resources

Both limits are evaluated. The experiment is blocked if it would exceed either one.
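To see how the two limits interact, take 10 matching pods with `maxTargets: 3` and `maxPercentage: 25`: 2 targets (20%) pass, 3 targets pass `maxTargets` but are 30% of the pool and fail `maxPercentage`, and 4 targets fail both. The hypothetical `check_target_limits` below encodes that. It compares percentages with integer arithmetic to sidestep rounding, and skipping the percentage check when nothing matches is an assumption, since rounding and empty-pool behavior are not specified here.

```python
def check_target_limits(targets: int, matching: int,
                        max_targets: int, max_percentage: int) -> list[str]:
    """Hypothetical check; returns one message per exceeded limit (empty means OK)."""
    violations = []
    if targets > max_targets:
        violations.append(f"{targets} targets exceeds maxTargets {max_targets}")
    # targets / matching * 100 > max_percentage, kept in integers to avoid rounding.
    # Assumption: with no matching resources the percentage check is skipped.
    if matching > 0 and targets * 100 > max_percentage * matching:
        pct = targets * 100 / matching
        violations.append(
            f"{pct:g}% of {matching} matching resources exceeds maxPercentage {max_percentage}")
    return violations


for n in (2, 3, 4):
    print(n, check_target_limits(n, matching=10, max_targets=3, max_percentage=25))
```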

Protected resources

Resources that can never be targeted, regardless of the experiment spec:

spec:
  protectedResources:
    # Entire namespaces
    namespaces:
      - kube-system
      - monitoring

    # Resources with specific labels
    labels:
      chaosplane.io/protected: "true"
      tier: database

    # Specific named resources
    names:
      - kind: Pod
        name: critical-singleton
        namespace: production
      - kind: Node
        name: control-plane-1
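A hypothetical sketch of the protected-resource test. It assumes each entry under `labels` protects a resource on its own (any one matching key/value pair is enough) and that a `names` entry without `namespace` refers to a cluster-scoped object such as a Node. The `Resource` type and `protection_reason` function are illustrative, not part of ChaosPlane.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Resource:
    kind: str
    name: str
    namespace: str = ""  # empty for cluster-scoped kinds such as Node
    labels: dict[str, str] = field(default_factory=dict)


def protection_reason(res: Resource, namespaces: list[str],
                      labels: dict[str, str], names: list[dict]) -> Optional[str]:
    """Return why res is protected, or None if it may be targeted."""
    if res.namespace and res.namespace in namespaces:
        return f"namespace {res.namespace} is protected"
    # Assumption: any single matching label pair is enough to protect a resource.
    for key, value in labels.items():
        if res.labels.get(key) == value:
            return f"label {key}={value} is protected"
    for entry in names:
        if (entry["kind"] == res.kind and entry["name"] == res.name
                and entry.get("namespace", "") == res.namespace):
            return f"{res.kind} {res.name} is protected by name"
    return None


# Values from the protectedResources example above.
protected_namespaces = ["kube-system", "monitoring"]
protected_labels = {"chaosplane.io/protected": "true", "tier": "database"}
protected_names = [
    {"kind": "Pod", "name": "critical-singleton", "namespace": "production"},
    {"kind": "Node", "name": "control-plane-1"},
]

candidates = [
    Resource("Pod", "web-1", "production", {"tier": "frontend"}),
    Resource("Pod", "pg-0", "production", {"tier": "database"}),
    Resource("Node", "control-plane-1"),
]
eligible = [r.name for r in candidates
            if protection_reason(r, protected_namespaces, protected_labels, protected_names) is None]
print(eligible)
```

Only `web-1` remains eligible: `pg-0` carries `tier: database`, and `control-plane-1` is protected by name.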

Action limits

Restrict which action types are allowed and cap experiment duration:

spec:
  actionLimits:
    allowedActions:
      - pod-kill
      - network-delay
      - pod-cpu-stress
    maxDuration: 5m

If allowedActions is empty, all actions are permitted. maxDuration is compared against the experiment's spec.duration; a longer duration violates the policy.
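A sketch of the action-limit check, assuming `maxDuration` and `spec.duration` use Go-style duration strings such as `5m` or `1h30m` (only the `h`, `m`, and `s` units are handled here). `parse_duration`, `check_action_limits`, and the `node-drain` action name are hypothetical.

```python
import re
from datetime import timedelta

_SECONDS = {"h": 3600, "m": 60, "s": 1}


def parse_duration(text: str) -> timedelta:
    """Parse a Go-style duration subset such as "30s", "5m", or "1h30m"."""
    parts = re.findall(r"(\d+)([hms])", text)
    # Reject anything the pattern did not consume completely (e.g. "10ms").
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(seconds=sum(int(n) * _SECONDS[u] for n, u in parts))


def check_action_limits(action: str, duration: str,
                        allowed_actions: list[str], max_duration: str) -> list[str]:
    violations = []
    # An empty allowedActions list permits every action.
    if allowed_actions and action not in allowed_actions:
        violations.append(f"action {action} is not in allowedActions")
    if max_duration and parse_duration(duration) > parse_duration(max_duration):
        violations.append(f"duration {duration} exceeds maxDuration {max_duration}")
    return violations


allowed = ["pod-kill", "network-delay", "pod-cpu-stress"]
print(check_action_limits("pod-kill", "3m", allowed, "5m"))
print(check_action_limits("node-drain", "10m", allowed, "5m"))  # "node-drain" is a made-up action
```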

Time windows

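Control when experiments can run using cron expressions. Each window opens at every time its schedule matches, evaluated in the window's timezone, and stays open for duration.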

Allowed windows

Experiments can only run during these windows:

spec:
  timeWindows:
    allowed:
      - name: business-hours-utc
        schedule: "0 9 * * 1-5"   # 9am Monday-Friday
        duration: 8h
        timezone: UTC
      - name: weekend-testing
        schedule: "0 10 * * 6"    # 10am Saturday
        duration: 4h
        timezone: America/Los_Angeles

Blocked windows

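Experiments are blocked during these windows. Blocked windows take precedence over allowed windows, so an experiment is rejected inside a blocked window even if an allowed window is also open: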

spec:
  timeWindows:
    blocked:
      - name: deployment-freeze
        schedule: "0 17 * * 5"    # 5pm Friday
        duration: 64h              # until 9am Monday
        timezone: UTC
      - name: peak-hours
        schedule: "0 12 * * 1-5"  # noon weekdays
        duration: 2h
        timezone: America/New_York

The schedule field uses standard 5-field cron syntax: minute hour day-of-month month day-of-week.

Multiple policies

Multiple policies can apply to the same experiment. All policies are evaluated, and the experiment is blocked if any one of them rejects it.
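The any-rejects rule can be sketched by collecting each applicable policy's violations and blocking only when a policy in Enforce mode reports one; Audit-mode violations are logged without blocking, as described under Enforcement modes. The `PolicyResult` type and `combine` function are hypothetical. In the usage, a 2-target experiment in production passes `production-guardrails` (`maxTargets: 2`) but fails a stricter policy with `maxTargets: 1`, so the strictest applicable policy decides.

```python
from dataclasses import dataclass, field


@dataclass
class PolicyResult:
    policy: str
    enforcement: str                      # "Enforce" or "Audit"
    violations: list[str] = field(default_factory=list)


def combine(results: list[PolicyResult]) -> tuple[bool, list[str]]:
    """Block if any Enforce-mode policy reports a violation; log Audit-mode ones."""
    allowed = True
    log = []
    for r in results:
        for v in r.violations:
            if r.enforcement == "Enforce":
                allowed = False
                log.append(f"{r.policy}: blocked: {v}")
            else:
                log.append(f"{r.policy}: audit: {v}")
    return allowed, log


# A 2-target experiment in production, checked by two applicable policies.
results = [
    PolicyResult("production-guardrails", "Enforce"),  # maxTargets: 2, passes
    PolicyResult("prod-policy", "Enforce", ["2 targets exceeds maxTargets 1"]),
]
print(combine(results))
```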

Audit-first workflow

The recommended approach for new policies:

1. Deploy the policy in Audit mode.
2. Run experiments and observe which ones would be blocked.
3. Adjust the policy as needed.
4. Switch to Enforce mode.

# Check policy evaluation results in audit mode
chaosctl events -n production | grep BlastRadiusPolicy

Example: development vs production

# Permissive policy for development
---
apiVersion: chaos.chaosplane.io/v1alpha1
kind: BlastRadiusPolicy
metadata:
  name: dev-policy
spec:
  enforcement: Audit
  scope:
    namespaces: [development, staging]
  targetLimits:
    maxTargets: 10
    maxPercentage: 50

---
# Strict policy for production
apiVersion: chaos.chaosplane.io/v1alpha1
kind: BlastRadiusPolicy
metadata:
  name: prod-policy
spec:
  enforcement: Enforce
  scope:
    namespaces: [production]
  targetLimits:
    maxTargets: 1
    maxPercentage: 10
  protectedResources:
    namespaces: [kube-system]
    labels:
      tier: database
  actionLimits:
    allowedActions: [pod-kill, network-delay]
    maxDuration: 5m
  timeWindows:
    allowed:
      - name: chaos-hours
        schedule: "0 10 * * 2-4"
        duration: 4h
        timezone: UTC
