Running Chaos Toolkit from an EC2 instance

When you host your infrastructure on AWS, it is common to have strict security policies in place. Among other things, these policies usually only allow internal traffic within AWS. A question we’re asked a lot is: can I run Chaos Toolkit from AWS, against AWS? The answer is simply yes, you can.

Why EC2?

The reasons for providing a guide on running Chaos Toolkit from an EC2 instance are simple enough:

  • Most AWS users are comfortable with EC2
  • It is the service most analogous to running something on your own workstation

The Steps

There are a few prerequisites for following this guide:

  • You’ll need access to the AWS Console (We’re assuming you’re comfortable here)
    • Or you’ll need AWS CLI installed and configured
  • You’ll need to be able to create EC2 instances (Or have someone do this for you)
  • You’ll need to be able to create IAM Roles and Policies (Or have someone do this for you)
  • You’ll need to be able to use Systems Manager - Session Manager

1. Create your instance

  • Navigate to the EC2 console and select Launch Instance
  • For this guide, we’ll select the Amazon Linux 2 AMI at the top of the list
  • For this guide, we’ll select a t2.micro (But you can choose a larger one)
  • Go onto Configure Instance Details
  • Select the VPC to deploy into via the Network dropdown
  • Select the Subnet to deploy into via the Subnet dropdown
  • To the right of IAM role, select Create a new IAM role
    • Create an instance profile as per the “Creating an instance profile with minimal Session Manager permissions (console)” section of the Session Manager documentation
    • Go back to the EC2 wizard and select the newly created role in the dropdown (You may have to click the refresh button)
  • Go onto Add Storage - For now, the defaults will be fine
  • Go onto Add Tags - We recommend at minimum, adding a tag {"OWNER": "your-name"}
  • Go onto Configure Security Group
  • Click the X to the right of the SSH rule; you won’t need it
  • Go onto Review and Launch - Select Launch
  • Select Proceed without a key pair, check the tickbox, and click Launch Instances

Alternatively, if you are using the AWS CLI rather than the console, you’ll first need to create a few IAM components so that you can connect to your instance via Session Manager.

  • Create a file named assume-role.json with the following contents:

    {
        "Version": "2012-10-17",
        "Statement": [
            {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "ec2.amazonaws.com"
                ]
            },
            "Action": "sts:AssumeRole"
            }
        ]
    }
    

  • Create your instance’s IAM instance profile:

    aws iam create-instance-profile \
        --instance-profile-name CTK-EC2-INSTANCE-PROFILE \
        --no-cli-pager
    

  • Create your instance’s IAM role, replacing YOUR_NAME with your name:

    aws iam create-role \
        --role-name CTK-EC2-ROLE \
        --assume-role-policy-document file://assume-role.json \
        --tags Key=OWNER,Value=YOUR_NAME \
        --no-cli-pager
    

  • Create a file named role-policy.json with the following contents:

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "ssm:UpdateInstanceInformation",
                    "ssmmessages:CreateControlChannel",
                    "ssmmessages:CreateDataChannel",
                    "ssmmessages:OpenControlChannel",
                    "ssmmessages:OpenDataChannel"
                ],
                "Resource": "*"
            },
            {
                "Effect": "Allow",
                "Action": [
                    "s3:GetEncryptionConfiguration"
                ],
                "Resource": "*"
            }
        ]
    }
    

  • Create the policy, replacing YOUR_NAME with your name:

    aws iam create-policy \
        --policy-name CTK-EC2-SESSION-MANAGER-POLICY \
        --policy-document file://role-policy.json \
        --tags Key=OWNER,Value=YOUR_NAME \
        --no-cli-pager
    

  • Attach your policy to your role, replacing AWS_ACCOUNT_ID with your AWS account ID:

    aws iam attach-role-policy \
        --role-name CTK-EC2-ROLE \
        --policy-arn arn:aws:iam::AWS_ACCOUNT_ID:policy/CTK-EC2-SESSION-MANAGER-POLICY \
        --no-cli-pager
    

  • Attach your role to your instance profile:

    aws iam add-role-to-instance-profile \
        --instance-profile-name CTK-EC2-INSTANCE-PROFILE \
        --role-name CTK-EC2-ROLE \
        --no-cli-pager
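
The policy ARN passed to attach-role-policy above follows a fixed pattern, so it can be assembled from the account ID. The sketch below is illustrative only: the helper name policy_arn is hypothetical, and it assumes the standard "aws" partition and a policy created without a path, as in the create-policy command above. The account ID itself can be looked up with aws sts get-caller-identity.

```python
import re


def policy_arn(account_id: str, policy_name: str) -> str:
    """Build the ARN of a customer managed IAM policy (hypothetical helper)."""
    # AWS account IDs are 12 digits; keep them as strings to preserve leading zeros.
    if not re.fullmatch(r"[0-9]{12}", account_id):
        raise ValueError(f"expected a 12-digit account ID, got {account_id!r}")
    # Assumes the default "aws" partition and no policy path.
    return f"arn:aws:iam::{account_id}:policy/{policy_name}"


# 123456789012 is a placeholder account ID, not a real one.
print(policy_arn("123456789012", "CTK-EC2-SESSION-MANAGER-POLICY"))
```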
    

Now that your instance’s IAM entities are sorted, you can create your instance.

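  • Create your instance, using the instance profile created earlier and replacing YOUR_NAME with your name (AMI IDs are region-specific, so you may need to replace the image ID with that of an Amazon Linux 2 AMI in your region):

    aws ec2 run-instances \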
        --image-id ami-0d26eb3972b7f8c96 \
        --instance-type t2.micro \
        --iam-instance-profile Name=CTK-EC2-INSTANCE-PROFILE \
        --count 1 \
        --tag-specifications 'ResourceType=instance,Tags=[{Key=OWNER,Value=YOUR_NAME}]' \
        --no-cli-pager
    

2. Connect to your instance

  • Navigate to the Systems Manager console and select Session Manager
  • Select Start Session
  • Select your instance from the Target instances table
  • Select Start session
  • A new tab will open with a terminal session open in the browser

3. Set up Chaos Toolkit

You’ll see a prompt like:

sh-4.2$

Change to the home directory with:

cd ~

Create a new directory for your experimentation and navigate inside:

mkdir my-experiments && cd my-experiments

Create a virtual environment for the Chaos Toolkit dependencies:

python3 -m venv .venv && source .venv/bin/activate && python3 -m pip install --upgrade pip

Install chaostoolkit and its AWS extension chaostoolkit-aws:

pip3 install chaostoolkit chaostoolkit-aws
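
Before writing an experiment, it can be worth confirming that both packages landed in the active virtual environment. The following is a minimal sketch, not part of Chaos Toolkit itself: the helper name missing_modules is hypothetical, and it assumes the packages are importable as chaostoolkit and chaosaws (the latter being the module name used by the experiment in the next step).

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "chaostoolkit" and "chaosaws" are the import names assumed for the two packages.
missing = missing_modules(["chaostoolkit", "chaosaws"])
print("missing:", missing if missing else "none")
```

If either name is reported as missing, check that the virtual environment is still activated in the current session.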

4. Create an experiment

For the purpose of this guide, we will create an experiment with a Steady State Hypothesis that queries EC2 and checks that our current instance is in the running state. The experiment has no method; we’re merely showing that we can talk to AWS from within AWS.

Create a file named experiment.json with the following contents:

{
    "title": "Running Chaos Toolkit from an EC2 instance",
    "description": "N/A",
    "tags": [],
    "steady-state-hypothesis": {
        "title": "Current EC2 is RUNNING",
        "probes": [
            {
                "type": "probe",
                "name": "instance_state",
                "provider": {
                    "type": "python",
                    "module": "chaosaws.ec2.probes",
                    "func": "instance_state",
                    "arguments": {
                        "state": "running",
                        "instance_ids": [
                            "<INSTANCE_ID>"
                        ],
                        "filters": []
                    }
                },
                "tolerance": true
            }
        ]
    },
    "method": [],
    "configuration": {
        "aws_region": "<REGION>"
    }
}

Replace <INSTANCE_ID> with the ID of the current instance, and <REGION> with the name of the region the instance is deployed in.
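
If you would rather not copy these values by hand, the two placeholders can be filled in programmatically. The sketch below is one possible approach under stated assumptions: the helper names imds_get and fill_placeholders are hypothetical, the experiment is assumed to have the exact shape shown above, and the instance is assumed to allow access to the EC2 instance metadata service (IMDSv2), which exposes the instance-id and placement/region paths.

```python
import copy
import urllib.request

IMDS = "http://169.254.169.254/latest"


def imds_get(path: str, timeout: float = 2.0) -> str:
    """Read one value from the EC2 instance metadata service using IMDSv2."""
    # IMDSv2 requires a short-lived session token, obtained with a PUT request.
    token_request = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_request, timeout=timeout) as response:
        token = response.read().decode()
    request = urllib.request.Request(
        f"{IMDS}/meta-data/{path}",
        headers={"X-aws-ec2-metadata-token": token},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode()


def fill_placeholders(experiment: dict, instance_id: str, region: str) -> dict:
    """Return a copy of the experiment with the instance ID and region set."""
    filled = copy.deepcopy(experiment)
    # Assumes a single probe in the hypothesis, as in the experiment above.
    probe = filled["steady-state-hypothesis"]["probes"][0]
    probe["provider"]["arguments"]["instance_ids"] = [instance_id]
    filled["configuration"]["aws_region"] = region
    return filled
```

On the instance, this could be called as fill_placeholders(experiment, imds_get("instance-id"), imds_get("placement/region")) after loading experiment.json with the json module, then writing the result back to the file.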

You can then run the experiment with:

chaos run ./experiment.json
[2021-08-18 10:12:29 INFO] Validating the experiment's syntax
[2021-08-18 10:12:29 INFO] Experiment looks valid
[2021-08-18 10:12:29 INFO] Running experiment: Running Chaos Toolkit from an EC2 instance
[2021-08-18 10:12:29 INFO] Steady-state strategy: default
[2021-08-18 10:12:29 INFO] Rollbacks strategy: default
[2021-08-18 10:12:29 INFO] Steady state hypothesis: Current EC2 is RUNNING
[2021-08-18 10:12:29 INFO] Probe: instance_state
[2021-08-18 10:12:29 ERROR]   => failed: botocore.exceptions.ClientError: An error occurred
(UnauthorizedOperation) when calling the DescribeInstances operation: You are not authorized to
perform this operation.
[2021-08-18 10:12:29 WARNING] Probe terminated unexpectedly, so its tolerance could not be validated
[2021-08-18 10:12:29 CRITICAL] Steady state probe 'instance_state' is not in the given tolerance so
failing this experiment
[2021-08-18 10:12:29 INFO] Experiment ended with status: failed

You’ll notice the error you just received:

failed: botocore.exceptions.ClientError: An error occurred
(UnauthorizedOperation) when calling the DescribeInstances operation: You are not
authorized to perform this operation.

This is because the instance profile role you created earlier doesn’t have a policy statement allowing it to describe EC2 instances.

Navigate to the IAM console, find the policy you created earlier, and add the following statement to it:

{
    "Effect": "Allow",
    "Action": [
        "ec2:DescribeInstance*"
    ],
    "Resource": "*"
}
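
If you created the policy from a local role-policy.json file, the same change can be made to that file rather than in the console. The following is a minimal sketch under that assumption; the helper names add_statement and update_policy_file are hypothetical, and the file name is the one used earlier in this guide.

```python
import json
from pathlib import Path

DESCRIBE_STATEMENT = {
    "Effect": "Allow",
    "Action": ["ec2:DescribeInstance*"],
    "Resource": "*",
}


def add_statement(policy: dict, statement: dict) -> dict:
    """Return a copy of the policy with the statement appended, unless already present."""
    statements = list(policy.get("Statement", []))
    if statement not in statements:
        statements.append(statement)
    return {**policy, "Statement": statements}


def update_policy_file(path: str = "role-policy.json") -> None:
    """Rewrite the local policy file so that it includes the extra statement."""
    policy_file = Path(path)
    policy = json.loads(policy_file.read_text())
    updated = add_statement(policy, DESCRIBE_STATEMENT)
    policy_file.write_text(json.dumps(updated, indent=4) + "\n")
```

The updated file could then be published as the default policy version with aws iam create-policy-version, passing --policy-arn, --policy-document file://role-policy.json and --set-as-default, as an alternative to editing the policy in the IAM console.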

Run your experiment again:

chaos run ./experiment.json
[2021-08-18 10:24:56 INFO] Validating the experiment's syntax
[2021-08-18 10:24:56 INFO] Experiment looks valid
[2021-08-18 10:24:56 INFO] Running experiment: Running Chaos Toolkit from an EC2
instance
[2021-08-18 10:24:56 INFO] Steady-state strategy: default
[2021-08-18 10:24:56 INFO] Rollbacks strategy: default
[2021-08-18 10:24:56 INFO] Steady state hypothesis: Current EC2 is RUNNING
[2021-08-18 10:24:56 INFO] Probe: instance_state
[2021-08-18 10:24:56 INFO] Steady state hypothesis is met!
[2021-08-18 10:24:56 INFO] Playing your experiment's method now...
[2021-08-18 10:24:56 INFO] No declared activities, let's move on.
[2021-08-18 10:24:56 INFO] Steady state hypothesis: Current EC2 is RUNNING
[2021-08-18 10:24:56 INFO] Probe: instance_state
[2021-08-18 10:24:56 INFO] Steady state hypothesis is met!
[2021-08-18 10:24:56 INFO] Let's rollback...
[2021-08-18 10:24:56 INFO] No declared rollbacks, let's move on.
[2021-08-18 10:24:56 INFO] Experiment ended with status: completed

As you can see, your instance profile now has suitable permissions. This should give you a good sense of how IAM lets you grant specific permissions to the instances running your Chaos Toolkit experiments.

Summary

Whilst the experiment within this guide was simple, the guide was not meant to teach you how to write experiments. The purpose of the guide was to show you how you might run Chaos Toolkit from AWS to interact with your AWS infrastructure.

You should now know how to:

  • Create an EC2 instance within your AWS network
  • Securely connect to that instance via Session Manager
    • Negating the need for Security Group policies or SSH access
  • Set up and run Chaos Toolkit from an EC2 instance
  • Modify the IAM policies for your instance to increase or decrease the experiment’s ability to interact with your systems

Notes

Several things in this guide could be done differently to suit your own setup, for example:

  • Using a containerised setup like prescribed in this guide within your instance
  • Using a Git repository (whether cloned or created) on the instance to keep your experiments in version control
  • Storing experiments/experiment journals/experiment logs in S3 so they’re accessible to others in your organisation
  • Connecting via SSH (if your organisation is less concerned about allowing traffic from your local IP)