Deploy Chaos Toolkit as a Kubernetes Operator¶
Kubernetes operators are a popular approach to create bespoke controllers of any application on top of the Kubernetes API.
The Chaos Toolkit operator listens for experiment declarations and triggers a new Kubernetes pod, running the Chaos Toolkit with the specified experiment.
Deploy the operator¶
The operator can be found on the Chaos Toolkit incubator.
It is deployed via typical Kubernetes manifests which need to be applied via Kustomize, the native configuration manager.
First, download the Kustomize binary:
curl -s "https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh" | bash
For macOS, you can also install it via the Homebrew package manager:
brew install kustomize
Next, simply run the following:
kustomize build manifests/overlays/generic-rbac | kubectl apply -f -
This will build the manifests and apply them on your current default cluster. Notice how we use the RBAC variant of the deployment. If you have other requirements (no-RBAC, pod security or network policies), then check the operator’s documentation to deploy the appropriate variant.
You can install another variant as follows:
kustomize build manifests/overlays/generic[-rbac[-podsec[-netsec]]] | kubectl apply -f -
By now, you should have the operator running in the chaostoolkit-crd
.
kubectl -n chaostoolkit-crd get pods
NAME READY STATUS RESTARTS AGE
chaostoolkit-crd-7ddb9b78d9-dgxx7 1/1 Running 0 35s
What the operator creates & deletes¶
The operator deployment creates two namespaces, by default: - the chaostoolkit-crd
namespace contains the operator pod and Chaos Toolkit experiment definitions - the chaostoolkit-run
namespace contains pods running the Chaos Toolkit experiments
When you apply an experiment object, the following other objects are created in the chaostoolkit-run
namespace:
- a Service Account specific to the pod
- a Pod running the Chaos Toolkit based on the image you indicated
- a role and a binding with enough permissions to handle pods inside the
chaostoolkit-run
namespace itself - a config map specific to that experiment with the experiment payload you gave
- a config mapo with environment variables specific to that experiment
On top of that, if you setup a schedule, a cron job object is created too.
In all cases, when you delete the experiment, all these objects are also deleted.
Options¶
All the options are hanging under the spec
element of the ChaosToolkitExperiment
kind object.
apiVersion: chaostoolkit.org/v1
kind: ChaosToolkitExperiment
metadata:
name: my-chaos-exp
namespace: chaostoolkit-crd
spec:
...
None of these options are required.
Option Path | Description | Value Type | Default Value |
---|---|---|---|
namespace: | Namespace where to create the experiment objects | string | chaostoolkit-run |
verbose: | Set the highest verbosity of the Chaos Toolkit experiment run | boolean | false |
serviceaccount: | Name of the service account to attach to the experiment pod | string | chaostoolkit |
role: | Name of the role to attach to the experiment pod | string | chaostoolkit-experiment |
role: | Name of the rolebinding to attach to the experiment pod | string | chaostoolkit-experiment |
role: | List of namespaces to add the role and its binding to | list[string] | [] |
pod: | Name of the config map to attach to the experiment pod | string | chaostoolkit-env |
pod: | Name of if the image to use for the pod | string | chaostoolkit/chaostoolkit:latest |
pod: | Do we mount environment variables from the config map into the pod? | boolean | true |
pod: | Mount the secrets values from this secret as environment variables | string | "" |
pod: | Should we mount settings as a file to the pod | boolean | false |
pod: | Mount the given secret holding Chaos Toolkit settings as a file to the pod | string | chaostoolkit-settings |
pod: | Mount the experiment’s payload as file (if true ) or from a URL | boolean | true |
pod: | Name of the config map holding the experiment’s payload | string | chaostoolkit-experiment |
pod: | Name of experiment file mounted into the container | string | experiment.json |
pod: | Replace the default pod’s command entrypoint with this one | string | "" |
pod: | Replace the default pod’s arguments with these ones | list[string] | [] |
schedule: | Cron kind (only CronJob supported) | string | cronjob |
schedule: | Cron-like schedule syntax | string | "" |
Run an experiment¶
Now that your controller is listening, you can ask it to schedule a Chaos Toolkit experiment by applying a resource with the following API:
apiVersion: chaostoolkit.org/v1
kind: ChaosToolkitExperiment
Below is a basic example, assuming a file named basic.yaml
:
---
apiVersion: v1
kind: ConfigMap
metadata:
name: chaostoolkit-experiment
namespace: chaostoolkit-run
data:
experiment.json: |
{
"title": "Hello world!",
"description": "Say hello world.",
"method": [
{
"type": "action",
"name": "say-hello",
"provider": {
"type": "process",
"path": "echo",
"arguments": "hello"
}
}
]
}
---
apiVersion: chaostoolkit.org/v1
kind: ChaosToolkitExperiment
metadata:
name: my-chaos-exp
namespace: chaostoolkit-crd
First, we will use the default namespace in which the Chaos Toolkit will run.
Then, we need a config map to pass the experiment to execute.
Finally, we simply create a ChaosToolkitExperiment
object that the controller picks up and understand as a new experiment to run in its own pod.
Apply it as follows:
kubectl apply -f basic.yaml
Then, you can check the Chaos Toolkit experiment has been registered, and will be scheduled to run as soon as possible:
kubectl -n chaostoolkit-crd get ctks
Look at the Chaos Toolkit running:
kubectl -n chaostoolkit-run get pods
The status of the experiment’s run, if it deviated, defines the status if the pod. So, when the experiment does deviate, the pod should have a status set to Error
. Otherwise, the status will be Completed
.
Manage the Chaos Toolkit Experiments¶
List and inspect experiments¶
You can list your experiments as follows:
kubectl -n chaostoolkit-crd get chaosexperiments
You can describe one experiment as follows:
kubectl -n chaostoolkit-crd describe chaosexperiment my-chaos-exp
You can also use the short names for the custom resource ctks
and ctk
.
Delete the experiment run’s resources¶
You can delete an experiment and its related resources as follows:
kubectl -n chaostoolkit-crd delete ctk my-chaos-exp
However, the custom resources (ConfigMap, Secrets, etc.) won’t be deleted. This command only deletes the resources that the operator creates for the experiment to be able to run.
To delete all the run’s resources, simply delete the objects as follows:
kubectl delete -f basic.yaml
Various configurations¶
You may decide to change various aspects of the final pod (such as passing settings as secrets, changing the roles allowed to the pod, even override the entire pod template).
Make the operator more verbose¶
By default, the operator logs at INFO level. To enable the DEBUG level, you need to change the operator’s deployment command:
In the file manifests/base/common/deployment.yaml
:
Change:
- name: crd
image: chaostoolkit/k8scrd:latest
imagePullPolicy: Always
to:
- name: crd
image: chaostoolkit/k8scrd:latest
imagePullPolicy: Always
command:
- kopf
args:
- run
- --verbose
- --namespace
- chaostoolkit-crd
- controller.py
Then re-deploy using Kustomize.
Configure the toolkit with environment variables¶
Chaos Toolkit experiments often expect data to be passed as environment variables of the chaos
’s command shell.
The operator allows you to specify those values through the config map:
---
apiVersion: v1
kind: ConfigMap
metadata:
name: chaostoolkit-env
namespace: chaostoolkit-run
data:
NAME: "Jane Doe"
They will be injected into the Chaos Toolkit’s pod as environment variables.
You might need several environment config maps for various experiments. You can tell the operator where to find the config map to be loaded as environment variables.
We’ll assume you defined another config map named my-chaos-env-vars
. You can use it by setting the configMapName
in the env
block of the pod spec:
---
apiVersion: chaostoolkit.org/v1
kind: ChaosToolkitExperiment
metadata:
name: my-chaos-exp
namespace: chaostoolkit-crd
spec:
pod:
env:
configMapName: my-chaos-env-vars
You can disable loading environment variables into the pod by using the enabled
property:
---
apiVersion: chaostoolkit.org/v1
kind: ChaosToolkitExperiment
metadata:
name: my-chaos-exp
namespace: chaostoolkit-crd
spec:
pod:
env:
enabled: false
Plain text environment variables might not be secure enough in some use cases, such as database user name & passord, API keys, tokens, etc. You can define multiple encrypted key-value pairs in a Kubernetes secret and load them as environment variables. To to so, you shall indicate the name of the secret with the secretName
property.
Assuming you created a generic secret named chaostoolkit-secrets
, you can load the values as shown below:
---
apiVersion: chaostoolkit.org/v1
kind: ChaosToolkitExperiment
metadata:
name: my-chaos-exp
namespace: chaostoolkit-crd
spec:
pod:
env:
secretName: chaostoolkit-secrets
All the key-value pairs from the secret will be injected into the Chaos Toolkit’s pod as environment variables.
Handle multiple experiment files¶
In the basic example, the name of the config map holding the experiment is the default value chaostoolkit-experiment
. Usually, you’ll want a more unique name since you’ll probably run multiple experiments from the chaostoolkit-run
namespace.
In that case, do it as follows:
---
apiVersion: v1
kind: ConfigMap
metadata:
name: chaostoolkit-experiment-1234
namespace: chaostoolkit-run
data:
experiment.json: |
{
"title": "...",
}
---
apiVersion: chaostoolkit.org/v1
kind: ChaosToolkitExperiment
metadata:
name: my-chaos-exp
namespace: chaostoolkit-crd
spec:
pod:
experiment:
configMapName: chaostoolkit-experiment-1234
You need to define the configMapName
in the experiment
block of the pod spec.
Use the experiment in YAML format¶
If your experiments are encoded using YAML, you can set it as follows:
---
apiVersion: v1
kind: ConfigMap
metadata:
name: chaostoolkit-experiment-1234
namespace: chaostoolkit-run
data:
experiment.yaml: |
---
title: "..."
---
apiVersion: chaostoolkit.org/v1
kind: ChaosToolkitExperiment
metadata:
name: my-chaos-exp
namespace: chaostoolkit-crd
spec:
pod:
experiment:
configMapName: chaostoolkit-experiment-1234
configMapExperimentFileName: experiment.yaml
Load the experiment from a URL¶
By default, the experiment is read from a file. But you may store it remotely e.g. GitHub and have it available over HTTP. You might want to load it from its remote URL instead.
You can tell the Chaos Toolkit to load it from a remote URL rather than from a local file, as follows:
---
apiVersion: v1
kind: ConfigMap
metadata:
name: chaostoolkit-env
namespace: chaostoolkit-run
data:
EXPERIMENT_URL: "https://example.com/experiment.json"
---
apiVersion: chaostoolkit.org/v1
kind: ChaosToolkitExperiment
metadata:
name: my-chaos-exp
namespace: chaostoolkit-crd
spec:
pod:
experiment:
asFile: false
First, you need to pass the EXPERIMENT_URL
environment variable.
Then, tell the operator not to mount the default experiment volume. To do so, you need to set asFile
to false
in the experiment
block of the pod spec.
Run experiments in another namespace¶
You may create the namespace in which the resources will be deployed:
---
apiVersion: chaostoolkit.org/v1
kind: ChaosToolkitExperiment
metadata:
name: my-chaos-exp
namespace: chaostoolkit-crd
spec:
namespace: my-other-namespace
You need to defined the namespace
value at the spec level.
If the namespace already exists, a message will be logged but this will not abort the operation.
However, this namespace will be entirely under your responsibility. No network nor pod securities will be managed in your namespace, if the operator was installed with those variants. You’ll need to manage them yourself.
Pass Chaos Toolkit settings as a Kubernetes secret¶
Chaos Toolkit reads its settings from a file and you can pass yours by creating a Kubernetes secret named, by default, chaostoolkit-settings
.
For instance, assuming you have a Chaos Toolkit settings file, you can create a secret from it as follows:
kubectl -n chaostoolkit-run \
create secret generic chaostoolkit-settings \
--from-file=settings.yaml=./settings.yaml
Note, the settings file must be named as settings.yaml
within the secret.
Reading settings is disabled by default, so you need to let the operator know it should allow it for that run:
---
apiVersion: chaostoolkit.org/v1
kind: ChaosToolkitExperiment
metadata:
name: my-chaos-exp
namespace: chaostoolkit-crd
spec:
namespace: chaostoolkit-run
pod:
settings:
enabled: true
You need to set the variable enabled
to true
in the settings
block of the pod spec.
The default name for that secret is chaostoolkit-settings
but you can change it with the secretName
variable, as follows:
---
apiVersion: chaostoolkit.org/v1
kind: ChaosToolkitExperiment
metadata:
name: my-chaos-exp
namespace: chaostoolkit-crd
spec:
namespace: chaostoolkit-run
pod:
settings:
enabled: true
secretName: my-super-secret
Keep generated resources even when the CRO is deleted¶
When you delete the ChaosToolkitExperiment
resource, all the allocated resources are deleted too (pod, service account, …). To prevent this, you may set the keep_resources_on_delete
property to true
at the spec level.
---
apiVersion: chaostoolkit.org/v1
kind: ChaosToolkitExperiment
metadata:
name: my-chaos-exp
namespace: chaostoolkit-crd
spec:
namespace: chaostoolkit-run
keep_resources_on_delete: true
In that case, you are responsible to cleanup all resources.
Pass your own role to bind to the service account¶
If your cluster has enabled RBAC, then the operator automatically binds a basic role to the service account associated with the chaostoolkit pod. That role allows your experiment to create/get/list/delete other pods in the same namespace.
You probably have more specific requirements, here is how to do it:
---
apiVersion: chaostoolkit.org/v1
kind: ChaosToolkitExperiment
metadata:
name: my-chaos-exp
namespace: chaostoolkit-crd
spec:
namespace: chaostoolkit-run
role:
name: my-role
The property name
should be set to the name of the role you have created in the namespace which the experiment is executed in. The service account associated with the pod will be bound to that role.
Here is a more complexe sample:
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: my-sa
namespace: chaostoolkit-run
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: my-role
rules:
- apiGroups:
- ""
resources:
- pods
verbs:
- "get"
- "delete"
- "list"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: my-role
namespace: default
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: my-role
subjects:
- kind: ServiceAccount
name: my-sa
namespace: chaostoolkit-run
---
apiVersion: chaostoolkit.org/v1
kind: ChaosToolkitExperiment
metadata:
name: my-chaos-exp
namespace: chaostoolkit-crd
spec:
namespace: chaostoolkit-run
serviceaccount:
name: my-sa
role:
name: my-role
bind: my-role
pod:
image: chaostoolkit/chaostoolkit:full
---
apiVersion: v1
kind: ConfigMap
metadata:
name: chaostoolkit-experiment
namespace: chaostoolkit-run
data:
experiment.json: |
{
"version": "1.0.0",
"title": "Terminate producer pod",
"description": "Simulate the restart of a pod",
"method": [
{
"name": "terminate-pods",
"type": "action",
"provider": {
"type": "python",
"module": "chaosk8s.pod.actions",
"func": "terminate_pods",
"arguments": {
"label_selector": "app=producer"
}
}
}
]
}
This creates a dedicated service account and binds a cluster role allowing the experiment to terminate a pod from the chaostoolkit-run
namespace to the default
namespace where the target pods reside. Note, we are using the chaostoolkit/chaostoolkit:full
which embeds the chaostoolkit-kubernetes
extension.
Override the default chaos path¶
The pod template executes the chaos run
command by default. If you run with your own base image, you may have a different location for the chaos
command.
---
apiVersion: chaostoolkit.org/v1
kind: ChaosToolkitExperiment
metadata:
name: my-chaos-exp
namespace: chaostoolkit-crd
spec:
namespace: chaostoolkit-run
pod:
chaosCommandPath: /some/path/chaos
Override the default chaos command arguments¶
The pod template executes the chaos run
command by default. You may want to extends or change the sub-command to execute when running the pod. You can define the chaos
arguments as follow:
---
apiVersion: chaostoolkit.org/v1
kind: ChaosToolkitExperiment
metadata:
name: my-chaos-exp
namespace: chaostoolkit-crd
spec:
namespace: chaostoolkit-run
pod:
chaosArgs:
- --verbose
- run
- --dry
- $(EXPERIMENT_PATH)
You need to set the list of arguments in the chaosArgs
variable at pod spec level.
Label your Chaos Toolkit experiment¶
Experiment labels can be defined in the ChaosToolkitExperiment
’s metadata. All labels will be forwarded, if not already defined, in the pod running the experiment.
You can define labels as follow:
---
apiVersion: chaostoolkit.org/v1
kind: ChaosToolkitExperiment
metadata:
name: my-chaos-exp
namespace: chaostoolkit-crd
labels:
environment: staging
tier: backend
target: database
These labels can then be used as selectors.
Allow network traffic for Chaos Toolkit experiments¶
When the operator is installed with the network security variant, the chaostoolkit
pod has limited network access. The pod is, by default, isolated for ingress connectivity and is limited to only DNS lookup & HTTPS for external traffic.
To allow the pod for other access, you may create another network policy within the chaostoolkit-run
namespace for pods matching the app: chaostoolkit
label:
---
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
name: my-custom-network-policy
namespace: chaostoolkit-run
spec:
podSelector:
matchLabels:
app: chaostoolkit
Run periodic and recurring experiments¶
The operator supports crontab
schedule for running Chaos Toolkit experiments periodically on a given schedule.
To do so, you can define a .spec.schedule
section, as follow:
---
apiVersion: chaostoolkit.org/v1
kind: ChaosToolkitExperiment
metadata:
name: my-chaos-exp
namespace: chaostoolkit-crd
spec:
namespace: chaostoolkit-run
schedule:
kind: cronJob
value: "*/1 * * * *"
This example runs a Chaos Toolkit experiment every minute.
You can list your scheduled experiments with the kubernetes’ cronjob
resource:
kubectl -n chaostoolkit-run get cronjobs
Run an experiment with specific extensions¶
The default container image used by the operator is the official Chaos Toolkit image which embeds no Chaos Toolkit extensions.
This means that you will likely need to create your bespoke container image. For instance, to install the Chaos Toolkit Kubernetes extension, create a Dockerfile like this:
FROM chaostoolkit/chaostoolkit
USER root
RUN apk update && \
apk add --virtual build-deps libffi-dev openssl-dev gcc python3-dev \
musl-dev && \
pip install --no-cache-dir chaostoolkit-addons chaostoolkit-reliably && \
apk del build-deps
USER 1001
Then create the image with docker:
docker build --tag my/chaostoolkit -f ./Dockerfile .
or, something such as Podman:
podman build --tag my/chaostoolkit -f ./Dockerfile
You can check your image contains the installed extensions as follows:
docker run --rm -it my/chaostoolkit info extensions
Once this image is pushed to any registry you can access, you need to let the operator know it must use it.
---
apiVersion: chaostoolkit.org/v1
kind: ChaosToolkitExperiment
metadata:
name: my-chaos-exp
namespace: chaostoolkit-crd
spec:
namespace: chaostoolkit-run
pod:
image: my/chaostoolkit
Tip
Note that the first time the job will create a pod will be at the end of the first period.
Uninstall the operator¶
To uninstall the operator and its own resources, simply run the following command for the overlay that is deployed.
kustomize build manifests/overlays/generic[-rbac[-podsec[-netsec]]] | kubectl delete -f -