
Extension chaospixie

Version 0.1.1
Repository https://github.com/chaostoolkit-incubator/chaostoolkit-pixie


This extension allows you to run Pixie scripts during your experiments.

Install

This package requires Python 3.8+, as the Pixie client library it depends on requires it.

To be used from your experiment, this package must be installed in the Python environment where chaostoolkit already lives.

$ pip install chaostoolkit-pixie
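
Once installed, you can check that Chaos Toolkit sees the extension by running its standard discovery command (this assumes the chaos CLI is available in the same environment):

$ chaos discover chaostoolkit-pixie

This writes a discovery.json file listing the probes and tolerances documented below.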

Usage

This extension provides two probes to run Pixie scripts, either directly embedded into the experiment or in a file local to the experiment.

For instance, here is a complete experiment:

{
    "version": "1.0.0",
    "title": "Consumer service remains fast under higher traffic load",
    "description": "Showcase for how we remain responsive under a certain load. This should help us figure how many replicas we should run",
    "secrets": {
        "pixie": {
            "api_key": {
                "type": "env",
                "key": "PIXIE_API_KEY"
            }
        }
    },
    "configuration": {
        "pixie_cluster_id": {
            "type": "env",
            "key": "PIXIE_CLUSTER_ID"
        }
    },
    "steady-state-hypothesis": {
        "title": "Run a Pixie script and evaluate it",
        "probes": [
            {
                "type": "probe",
                "name": "p99-latency-of-consumer-service-for-past-2m-remained-under-300ms",
                "tolerance": {
                    "type": "probe",
                    "name": "compute-median",
                    "provider": {
                        "type": "python",
                        "module": "chaospixie.tolerances",
                        "func": "percentile_should_be_below",
                        "secrets": ["pixie"],
                        "arguments": {
                            "column": "latency_p99",
                            "percentile": 99,
                            "convert_from_nanoseconds": "milliseconds",
                            "treshold": 300.0
                        }
                    }
                },
                "provider": {
                    "type": "python",
                    "module": "chaospixie.probes",
                    "func": "run_script_from_local_file",
                    "secrets": ["pixie"],
                    "arguments": {
                        "script_path": "./pixiescript.py"
                    }
                }
            }
        ]
    },
    "method": [
        {
            "type": "action",
            "name": "send-10-requests-per-second-for-60s",
            "provider": {
                "type": "process",
                "path": "ddosify",
                "arguments": "-d 60 -n 600 -o stdout-json -t http://mydomain.com/consumer"
            }
        }
    ]
}

This assumes you have a service named consumer. Pixie monitors its latency and produces percentiles for it. We then use a probe tolerance to evaluate the latency returned for the past 2 minutes and check whether the 99th-percentile latency stayed under 300ms.

In this example, we use ddosify to induce the load, but you can use your favourite tooling of course.

The Pixie script we run is as follows:

import px

ns_per_ms = 1000 * 1000
ns_per_s = 1000 * ns_per_ms
window_ns = px.DurationNanos(10 * ns_per_s)
filter_unresolved_inbound = True
filter_health_checks = True
filter_ready_checks = True


def inbound_let_timeseries(start_time: str, service: px.Service):
    ''' Compute the let as a timeseries for requests received by `service`.

    Args:
    @start_time: The timestamp of data to start at.
    @service: The name of the service to filter on.

    '''
    df = let_helper(start_time)
    df = df[px.has_service_name(df.service, service)]

    df = df.groupby(['timestamp']).agg(
        latency_quantiles=('latency', px.quantiles),
        error_rate_per_window=('failure', px.mean),
        throughput_total=('latency', px.count),
        bytes_total=('resp_body_size', px.sum)
    )

    # Format the result of LET aggregates into proper scalar formats and
    # time series.
    df.latency_p50 = px.DurationNanos(px.floor(px.pluck_float64(df.latency_quantiles, 'p50')))
    df.latency_p90 = px.DurationNanos(px.floor(px.pluck_float64(df.latency_quantiles, 'p90')))
    df.latency_p99 = px.DurationNanos(px.floor(px.pluck_float64(df.latency_quantiles, 'p99')))
    df.request_throughput = df.throughput_total / window_ns
    df.errors_per_ns = df.error_rate_per_window * df.request_throughput / px.DurationNanos(1)
    df.error_rate = px.Percent(df.error_rate_per_window)
    df.bytes_per_ns = df.bytes_total / window_ns
    df.time_ = df.timestamp

    return df[['time_', 'latency_p50', 'latency_p90', 'latency_p99',
               'request_throughput', 'errors_per_ns', 'error_rate', 'bytes_per_ns']]


def let_helper(start_time: str):
    ''' Compute the initial part of the let for requests.
        Filtering to inbound/outbound traffic by service is done by the calling function.

    Args:
    @start_time: The timestamp of data to start at.

    '''
    df = px.DataFrame(table='http_events', start_time=start_time)
    # Filter only to inbound service traffic (server-side).
    # Don't include traffic initiated by this service to an external location.
    df = df[df.trace_role == 2]
    df.service = df.ctx['service']
    df.pod = df.ctx['pod']
    df.latency = df.latency

    df.timestamp = px.bin(df.time_, window_ns)

    df.failure = df.resp_status >= 400
    filter_out_conds = ((df.req_path != '/healthz' or not filter_health_checks) and (
        df.req_path != '/readyz' or not filter_ready_checks)) and (
        df['remote_addr'] != '-' or not filter_unresolved_inbound)

    df = df[filter_out_conds]
    return df


df = inbound_let_timeseries("-2m", "default/consumer")
px.display(df)

This is an abridged script from Pixie itself.

That’s it!

Configuration

Test

To run the tests for the project, execute the following:

$ pytest

Formatting and Linting

We use a combination of black, flake8, and isort to both lint and format this repository's code.

Before raising a Pull Request, we recommend you run formatting against your code with:

$ make format

This will automatically format any code that doesn’t adhere to the formatting standards.

As some things are not picked up by the formatting, we also recommend you run:

$ make lint

This ensures that issues the formatter does not catch, such as unused import statements and overly long lines, are also picked up.

Contribute

If you wish to contribute more functions to this package, you are more than welcome to do so. Please fork this project, make your changes following the usual PEP 8 code style, sprinkle them with tests, and submit a PR for review.

Exported Activities

probes


run_script

Type probe
Module chaospixie.probes
Name run_script
Return string

Run a Pixie script.

Make sure to provide the name of the table you want to fetch data for. Usually it’s the name given to the px.display() function in your script.
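
For example, if your Pixie script ends with a named px.display() call, as in the illustrative line below, pass that same name as the table_name argument:

# Last line of your Pixie script: name the displayed table explicitly so the
# probe's `table_name` argument can reference it.
px.display(df, 'output')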

Signature:

def run_script(script: str,
               table_name: str = 'output',
               configuration: Dict[str, Dict[str, str]] = None,
               secrets: Dict[str, Dict[str, str]] = None) -> str:
    pass

Arguments:

Name Type Default Required
script string Yes
table_name string “output” No

Usage:

{
  "name": "run-script",
  "type": "probe",
  "provider": {
    "type": "python",
    "module": "chaospixie.probes",
    "func": "run_script",
    "arguments": {
      "script": ""
    }
  }
}
name: run-script
provider:
  arguments:
    script: ''
  func: run_script
  module: chaospixie.probes
  type: python
type: probe

run_script_from_local_file

Type probe
Module chaospixie.probes
Name run_script_from_local_file
Return list

Run a Pixie script loaded from a local file.

Make sure to provide the name of the table you want to fetch data for. Usually it’s the name given to the px.display() function in your script.

Signature:

def run_script_from_local_file(
        script_path: str,
        table_name: str = 'output',
        configuration: Dict[str, Dict[str, str]] = None,
        secrets: Dict[str, Dict[str, str]] = None) -> List[Dict[str, Any]]:
    pass

Arguments:

Name Type Default Required
script_path string Yes
table_name string “output” No

Usage:

{
  "name": "run-script-from-local-file",
  "type": "probe",
  "provider": {
    "type": "python",
    "module": "chaospixie.probes",
    "func": "run_script_from_local_file",
    "arguments": {
      "script_path": ""
    }
  }
}
name: run-script-from-local-file
provider:
  arguments:
    script_path: ''
  func: run_script_from_local_file
  module: chaospixie.probes
  type: python
type: probe
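
Outside of an experiment, you can also call the probe directly from Python, for instance to verify that your script and credentials work before wiring them into an experiment. This is a minimal sketch: the configuration and secrets keys mirror the experiment shown earlier, and the placeholder values are yours to fill in.

from chaospixie.probes import run_script_from_local_file

# Illustrative direct call. Within an experiment, Chaos Toolkit resolves and
# injects `configuration` and `secrets` for you.
rows = run_script_from_local_file(
    script_path="./pixiescript.py",
    table_name="output",
    configuration={"pixie_cluster_id": "your-cluster-id"},
    secrets={"pixie": {"api_key": "your-api-key"}},
)

# Each row is a dictionary keyed by the columns of the displayed table,
# e.g. latency_p99 in the example script above.
for row in rows:
    print(row["latency_p99"])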

tolerances


median_should_be_above

Type tolerance
Module chaospixie.tolerances
Name median_should_be_above
Return boolean

Compute the median of the given column across the list of results. If you need to limit the computation to a specific dataset within the results, you can provide the target as a tuple such as (key, value). The value can be a fixed value or a regular expression to match many.

Sometimes the column's values are expressed in nanoseconds, which isn't always easy to make sense of. You can set the convert_from_nanoseconds flag to "seconds", "milliseconds" or "microseconds" so we automatically convert the value to that unit. In that case, the threshold must also be expressed in that unit.

Return true if the median is above (or equal to) the threshold you provide.

Signature:

def median_should_be_above(column: str,
                           treshold: float,
                           target: Tuple[str, str] = None,
                           convert_from_nanoseconds: Literal[
                               'seconds', 'milliseconds',
                               'microseconds'] = None,
                           value: List[Dict[str, Any]] = None) -> bool:
    pass

Arguments:

Name Type Default Required
column string Yes
treshold number Yes
target object null No
convert_from_nanoseconds object null No
value list null No

Tolerances declare the value argument which is automatically injected by Chaos Toolkit as the output of the probe they are evaluating.
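
To make the documented behaviour concrete, here is a simplified, illustrative sketch of what such a check amounts to. This is not the extension's actual implementation; the below/percentile variants differ only in the comparison direction and the statistic used.

import re
import statistics
from typing import Any, Dict, List, Optional, Tuple


def median_above_sketch(column: str,
                        treshold: float,  # the extension spells this argument "treshold"
                        target: Optional[Tuple[str, str]] = None,
                        convert_from_nanoseconds: Optional[str] = None,
                        value: Optional[List[Dict[str, Any]]] = None) -> bool:
    # `value` is the probe output injected by Chaos Toolkit: a list of rows.
    rows = value or []
    if target:
        # Keep only rows whose `key` column matches a fixed value or regex.
        key, pattern = target
        rows = [r for r in rows if re.match(str(pattern), str(r.get(key, "")))]
    samples = [float(r[column]) for r in rows]
    if convert_from_nanoseconds:
        divisor = {"seconds": 1e9,
                   "milliseconds": 1e6,
                   "microseconds": 1e3}[convert_from_nanoseconds]
        samples = [s / divisor for s in samples]
    if not samples:
        return False
    return statistics.median(samples) >= treshold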

Usage:

{
  "steady-state-hypothesis": {
    "title": "...",
    "probes": [
      {
        "type": "probe",
        "tolerance": {
          "name": "median-should-be-above",
          "type": "tolerance",
          "provider": {
            "type": "python",
            "module": "chaospixie.tolerances",
            "func": "median_should_be_above",
            "arguments": {
              "column": "",
              "treshold": null
            }
          }
        },
        "...": "..."
      }
    ]
  }
}
steady-state-hypothesis:
  probes:
  - '...': '...'
    tolerance:
      name: median-should-be-above
      provider:
        arguments:
          column: ''
          treshold: null
        func: median_should_be_above
        module: chaospixie.tolerances
        type: python
      type: tolerance
    type: probe
  title: '...'

median_should_be_below

Type tolerance
Module chaospixie.tolerances
Name median_should_be_below
Return boolean

Compute the median of the given column across the list of results. If you need to limit the computation to a specific dataset within the results, you can provide the target as a tuple such as (key, value). The value can be a fixed value or a regular expression to match many.

Sometimes the column's values are expressed in nanoseconds, which isn't always easy to make sense of. You can set the convert_from_nanoseconds flag to "seconds", "milliseconds" or "microseconds" so we automatically convert the value to that unit. In that case, the threshold must also be expressed in that unit.

Return true if the median is below (or equal to) the threshold you provide.

Signature:

def median_should_be_below(column: str,
                           treshold: float,
                           convert_from_nanoseconds: Literal[
                               'seconds', 'milliseconds',
                               'microseconds'] = None,
                           target: Tuple[str, str] = None,
                           value: List[Dict[str, Any]] = None) -> bool:
    pass

Arguments:

Name Type Default Required
column string Yes
treshold number Yes
convert_from_nanoseconds object null No
target object null No
value list null No

Tolerances declare the value argument which is automatically injected by Chaos Toolkit as the output of the probe they are evaluating.

Usage:

{
  "steady-state-hypothesis": {
    "title": "...",
    "probes": [
      {
        "type": "probe",
        "tolerance": {
          "name": "median-should-be-below",
          "type": "tolerance",
          "provider": {
            "type": "python",
            "module": "chaospixie.tolerances",
            "func": "median_should_be_below",
            "arguments": {
              "column": "",
              "treshold": null
            }
          }
        },
        "...": "..."
      }
    ]
  }
}
steady-state-hypothesis:
  probes:
  - '...': '...'
    tolerance:
      name: median-should-be-below
      provider:
        arguments:
          column: ''
          treshold: null
        func: median_should_be_below
        module: chaospixie.tolerances
        type: python
      type: tolerance
    type: probe
  title: '...'

percentile_should_be_above

Type tolerance
Module chaospixie.tolerances
Name percentile_should_be_above
Return boolean

Compute the given percentile of the given column across the list of results. The default is the 99th percentile. If you need to limit the computation to a specific dataset within the results, you can provide the target as a tuple such as (key, value). The value can be a fixed value or a regular expression to match many.

Sometimes the column's values are expressed in nanoseconds, which isn't always easy to make sense of. You can set the convert_from_nanoseconds flag to "seconds", "milliseconds" or "microseconds" so we automatically convert the value to that unit. In that case, the threshold must also be expressed in that unit.

Return true if the percentile is above (or equal to) the threshold you provide.

Signature:

def percentile_should_be_above(column: str,
                               treshold: float,
                               percentile: int = 99,
                               target: Tuple[str, str] = None,
                               convert_from_nanoseconds: Literal[
                                   'seconds', 'milliseconds',
                                   'microseconds'] = None,
                               value: List[Dict[str, Any]] = None) -> bool:
    pass

Arguments:

Name Type Default Required
column string Yes
treshold number Yes
percentile integer 99 No
target object null No
convert_from_nanoseconds object null No
value list null No

Tolerances declare the value argument which is automatically injected by Chaos Toolkit as the output of the probe they are evaluating.

Usage:

{
  "steady-state-hypothesis": {
    "title": "...",
    "probes": [
      {
        "type": "probe",
        "tolerance": {
          "name": "percentile-should-be-above",
          "type": "tolerance",
          "provider": {
            "type": "python",
            "module": "chaospixie.tolerances",
            "func": "percentile_should_be_above",
            "arguments": {
              "column": "",
              "treshold": null
            }
          }
        },
        "...": "..."
      }
    ]
  }
}
steady-state-hypothesis:
  probes:
  - '...': '...'
    tolerance:
      name: percentile-should-be-above
      provider:
        arguments:
          column: ''
          treshold: null
        func: percentile_should_be_above
        module: chaospixie.tolerances
        type: python
      type: tolerance
    type: probe
  title: '...'

percentile_should_be_below

Type tolerance
Module chaospixie.tolerances
Name percentile_should_be_below
Return boolean

Compute the given percentile of the given column across the list of results. The default is the 99th percentile. If you need to limit the computation to a specific dataset within the results, you can provide the target as a tuple such as (key, value). The value can be a fixed value or a regular expression to match many.

Sometimes the column's values are expressed in nanoseconds, which isn't always easy to make sense of. You can set the convert_from_nanoseconds flag to "seconds", "milliseconds" or "microseconds" so we automatically convert the value to that unit. In that case, the threshold must also be expressed in that unit.

Return true if the percentile is below (or equal to) the threshold you provide.

Signature:

def percentile_should_be_below(column: str,
                               treshold: float,
                               percentile: int = 99,
                               target: Tuple[str, str] = None,
                               convert_from_nanoseconds: Literal[
                                   'seconds', 'milliseconds',
                                   'microseconds'] = None,
                               value: List[Dict[str, Any]] = None) -> bool:
    pass

Arguments:

Name Type Default Required
column string Yes
treshold number Yes
percentile integer 99 No
target object null No
convert_from_nanoseconds object null No
value list null No

Tolerances declare the value argument which is automatically injected by Chaos Toolkit as the output of the probe they are evaluating.

Usage:

{
  "steady-state-hypothesis": {
    "title": "...",
    "probes": [
      {
        "type": "probe",
        "tolerance": {
          "name": "percentile-should-be-below",
          "type": "tolerance",
          "provider": {
            "type": "python",
            "module": "chaospixie.tolerances",
            "func": "percentile_should_be_below",
            "arguments": {
              "column": "",
              "treshold": null
            }
          }
        },
        "...": "..."
      }
    ]
  }
}
steady-state-hypothesis:
  probes:
  - '...': '...'
    tolerance:
      name: percentile-should-be-below
      provider:
        arguments:
          column: ''
          treshold: null
        func: percentile_should_be_below
        module: chaospixie.tolerances
        type: python
      type: tolerance
    type: probe
  title: '...'