Extension chaospixie¶
Version | 0.1.1 |
Repository | https://github.com/chaostoolkit-incubator/chaostoolkit-pixie |
This extension allows you to run Pixie scripts during your experiments.
Install¶
This package requires Python 3.8+ because its Pixie dependency requires it.
To be used from your experiment, this package must be installed in the Python environment where chaostoolkit already lives.
$ pip install chaostoolkit-pixie
Usage¶
This extension provides two probes to run Pixie scripts, either directly embedded into the experiment or in a file local to the experiment.
For instance, a complete experiment:
{
    "version": "1.0.0",
    "title": "Consumer service remains fast under higher traffic load",
    "description": "Showcase for how we remain responsive under a certain load. This should help us figure how many replicas we should run",
    "secrets": {
        "pixie": {
            "api_key": {
                "type": "env",
                "key": "PIXIE_API_KEY"
            }
        }
    },
    "configuration": {
        "pixie_cluster_id": {
            "type": "env",
            "key": "PIXIE_CLUSTER_ID"
        }
    },
    "steady-state-hypothesis": {
        "title": "Run a Pixie script and evaluate it",
        "probes": [
            {
                "type": "probe",
                "name": "p99-latency-of-consumer-service-for-past-2m-remained-under-300ms",
                "tolerance": {
                    "type": "probe",
                    "name": "compute-median",
                    "provider": {
                        "type": "python",
                        "module": "chaospixie.tolerances",
                        "func": "percentile_should_be_below",
                        "secrets": ["pixie"],
                        "arguments": {
                            "column": "latency_p99",
                            "percentile": 99,
                            "convert_from_nanoseconds": "milliseconds",
                            "treshold": 300.0
                        }
                    }
                },
                "provider": {
                    "type": "python",
                    "module": "chaospixie.probes",
                    "func": "run_script_from_local_file",
                    "secrets": ["pixie"],
                    "arguments": {
                        "script_path": "./pixiescript.py"
                    }
                }
            }
        ]
    },
    "method": [
        {
            "type": "action",
            "name": "send-10-requests-per-second-for-60s",
            "provider": {
                "type": "process",
                "path": "ddosify",
                "arguments": "-d 60 -n 600 -o stdout-json -t http://mydomain.com/consumer"
            }
        }
    ]
}
This assumes you have a service named consumer. Pixie monitors its latency and produces percentiles for it. We then use a probe tolerance to evaluate the latency returned for the past 2 minutes and check that its 99th percentile stayed under 300ms.
In this example, we use ddosify to induce the load, but you can use your favourite tooling of course.
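To make the tolerance check concrete, here is a minimal Python sketch of the idea: convert a nanosecond column to milliseconds, take its 99th percentile, and compare it with the threshold. It is not the extension's implementation. The function names, the nearest-rank percentile method, and the sample rows are assumptions made for illustration only.

```python
import math
from typing import Any, Dict, List

NS_PER_MS = 1_000_000


def nearest_rank_percentile(values: List[float], percentile: int) -> float:
    """Return the nearest-rank percentile of a non-empty list of values."""
    ordered = sorted(values)
    # Nearest-rank: the smallest value with at least `percentile` percent
    # of the data at or below it.
    rank = max(1, math.ceil(percentile / 100 * len(ordered)))
    return ordered[rank - 1]


def p99_is_below_ms(rows: List[Dict[str, Any]], column: str,
                    threshold_ms: float) -> bool:
    """Check that the p99 of `column` (in nanoseconds) is at most `threshold_ms`."""
    values_ms = [row[column] / NS_PER_MS for row in rows]
    return nearest_rank_percentile(values_ms, 99) <= threshold_ms


# Hypothetical rows, shaped like a list of dicts keyed by column name.
rows = [{"latency_p99": ms * NS_PER_MS} for ms in (120, 180, 250, 310)]
print(p99_is_below_ms(rows, "latency_p99", 300.0))  # False: the p99 is 310 ms
```

With these sample rows the check fails, which is the situation that would mark the steady-state hypothesis as not met.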
The Pixie script we run is as follows:
import px

ns_per_ms = 1000 * 1000
ns_per_s = 1000 * ns_per_ms
window_ns = px.DurationNanos(10 * ns_per_s)
filter_unresolved_inbound = True
filter_health_checks = True
filter_ready_checks = True

def inbound_let_timeseries(start_time: str, service: px.Service):
    ''' Compute the let as a timeseries for requests received by `service`.
    Args:
    @start_time: The timestamp of data to start at.
    @service: The name of the service to filter on.
    '''
    df = let_helper(start_time)
    df = df[px.has_service_name(df.service, service)]
    df = df.groupby(['timestamp']).agg(
        latency_quantiles=('latency', px.quantiles),
        error_rate_per_window=('failure', px.mean),
        throughput_total=('latency', px.count),
        bytes_total=('resp_body_size', px.sum)
    )
    # Format the result of LET aggregates into proper scalar formats and
    # time series.
    df.latency_p50 = px.DurationNanos(px.floor(px.pluck_float64(df.latency_quantiles, 'p50')))
    df.latency_p90 = px.DurationNanos(px.floor(px.pluck_float64(df.latency_quantiles, 'p90')))
    df.latency_p99 = px.DurationNanos(px.floor(px.pluck_float64(df.latency_quantiles, 'p99')))
    df.request_throughput = df.throughput_total / window_ns
    df.errors_per_ns = df.error_rate_per_window * df.request_throughput / px.DurationNanos(1)
    df.error_rate = px.Percent(df.error_rate_per_window)
    df.bytes_per_ns = df.bytes_total / window_ns
    df.time_ = df.timestamp
    return df[['time_', 'latency_p50', 'latency_p90', 'latency_p99',
               'request_throughput', 'errors_per_ns', 'error_rate', 'bytes_per_ns']]

def let_helper(start_time: str):
    ''' Compute the initial part of the let for requests.
    Filtering to inbound/outbound traffic by service is done by the calling function.
    Args:
    @start_time: The timestamp of data to start at.
    '''
    df = px.DataFrame(table='http_events', start_time=start_time)
    # Filter only to inbound service traffic (server-side).
    # Don't include traffic initiated by this service to an external location.
    df = df[df.trace_role == 2]
    df.service = df.ctx['service']
    df.pod = df.ctx['pod']
    df.latency = df.latency
    df.timestamp = px.bin(df.time_, window_ns)
    df.failure = df.resp_status >= 400
    filter_out_conds = ((df.req_path != '/healthz' or not filter_health_checks) and (
        df.req_path != '/readyz' or not filter_ready_checks)) and (
        df['remote_addr'] != '-' or not filter_unresolved_inbound)
    df = df[filter_out_conds]
    return df

df = inbound_let_timeseries("-2m", "default/consumer")
px.display(df)
This is an abridged script from Pixie itself.
That’s it!
Configuration¶
Test¶
To run the tests for the project execute the following:
$ pytest
Formatting and Linting¶
We use a combination of black, flake8, and isort to both lint and format this repository's code.
Before raising a Pull Request, we recommend you run formatting against your code with:
$ make format
This will automatically format any code that doesn’t adhere to the formatting standards.
As some things are not picked up by the formatting, we also recommend you run:
$ make lint
This ensures that unused import statements, strings that are too long, and similar issues are also picked up.
Contribute¶
If you wish to contribute more functions to this package, you are more than welcome to do so. Please fork this project, make your changes following the usual PEP 8 code style, add tests, and submit a PR for review.
Exported Activities¶
probes¶
run_script¶
Type | probe |
Module | chaospixie.probes |
Name | run_script |
Return | string |
Run a Pixie script.
Make sure to provide the name of the table you want to fetch data for. Usually it's the name given to the px.display() function in your script.
Signature:
def run_script(script: str,
               table_name: str = 'output',
               configuration: Dict[str, Dict[str, str]] = None,
               secrets: Dict[str, Dict[str, str]] = None) -> str:
    pass
Arguments:
Name | Type | Default | Required |
---|---|---|---|
script | string | | Yes |
table_name | string | "output" | No |
Usage:
{
    "name": "run-script",
    "type": "probe",
    "provider": {
        "type": "python",
        "module": "chaospixie.probes",
        "func": "run_script",
        "arguments": {
            "script": ""
        }
    }
}
name: run-script
provider:
  arguments:
    script: ''
  func: run_script
  module: chaospixie.probes
  type: python
type: probe
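Because the script argument holds the whole Pixie script as a string, it can be convenient to generate the probe rather than hand-escape the script into JSON. The following is a minimal sketch, assuming a deliberately tiny script and a hypothetical probe name; it relies on px.display() writing to a table named "output" when no name is given, which matches the probe's default table_name.

```python
import json

# A deliberately tiny PxL script used only for illustration.
PXL_SCRIPT = """import px
df = px.DataFrame(table='http_events', start_time='-2m')
px.display(df)
"""

# The probe declaration, mirroring the usage shown above. The probe name
# is hypothetical; "secrets" refers to the "pixie" secrets block of the
# experiment.
probe = {
    "type": "probe",
    "name": "run-embedded-pixie-script",
    "provider": {
        "type": "python",
        "module": "chaospixie.probes",
        "func": "run_script",
        "secrets": ["pixie"],
        "arguments": {"script": PXL_SCRIPT, "table_name": "output"},
    },
}

# json.dumps takes care of escaping newlines and quotes in the script.
print(json.dumps(probe, indent=2))
```

The printed JSON can be pasted into the probes list of a steady-state hypothesis or into the method of an experiment.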
run_script_from_local_file¶
Type | probe |
Module | chaospixie.probes |
Name | run_script_from_local_file |
Return | list |
Run a Pixie script loaded from a local file.
Make sure to provide the name of the table you want to fetch data for. Usually it's the name given to the px.display() function in your script.
Signature:
def run_script_from_local_file(
        script_path: str,
        table_name: str = 'output',
        configuration: Dict[str, Dict[str, str]] = None,
        secrets: Dict[str, Dict[str, str]] = None) -> List[Dict[str, Any]]:
    pass
Arguments:
Name | Type | Default | Required |
---|---|---|---|
script_path | string | | Yes |
table_name | string | "output" | No |
Usage:
{
    "name": "run-script-from-local-file",
    "type": "probe",
    "provider": {
        "type": "python",
        "module": "chaospixie.probes",
        "func": "run_script_from_local_file",
        "arguments": {
            "script_path": ""
        }
    }
}
name: run-script-from-local-file
provider:
  arguments:
    script_path: ''
  func: run_script_from_local_file
  module: chaospixie.probes
  type: python
type: probe
tolerances¶
median_should_be_above¶
Type | tolerance |
Module | chaospixie.tolerances |
Name | median_should_be_above |
Return | boolean |
Compute the median of the values of column across the list of results. To limit the computation to a specific dataset within the results, provide the target as a tuple such as (key, value). The value can be a fixed value or a regular expression that matches several entries.
Column values are sometimes expressed in nanoseconds, which isn't always easy to make sense of. Set convert_from_nanoseconds to "seconds", "milliseconds" or "microseconds" to convert the values automatically. In that case, the threshold must be expressed in the same unit.
Return true if the median is above or equal to the threshold you provide.
Signature:
def median_should_be_above(column: str,
                           treshold: float,
                           target: Tuple[str, str] = None,
                           convert_from_nanoseconds: Literal[
                               'seconds', 'milliseconds',
                               'microseconds'] = None,
                           value: List[Dict[str, Any]] = None) -> bool:
    pass
Arguments:
Name | Type | Default | Required |
---|---|---|---|
column | string | | Yes |
treshold | number | | Yes |
target | object | null | No |
convert_from_nanoseconds | object | null | No |
value | list | null | No |
Tolerances declare the value argument which is automatically injected by Chaos Toolkit as the output of the probe they are evaluating.
Usage:
{
    "steady-state-hypothesis": {
        "title": "...",
        "probes": [
            {
                "type": "probe",
                "tolerance": {
                    "name": "median-should-be-above",
                    "type": "tolerance",
                    "provider": {
                        "type": "python",
                        "module": "chaospixie.tolerances",
                        "func": "median_should_be_above",
                        "arguments": {
                            "column": "",
                            "treshold": null
                        }
                    }
                },
                "...": "..."
            }
        ]
    }
}
steady-state-hypothesis:
  probes:
  - '...': '...'
    tolerance:
      name: median-should-be-above
      provider:
        arguments:
          column: ''
          treshold: null
        func: median_should_be_above
        module: chaospixie.tolerances
        type: python
      type: tolerance
    type: probe
  title: '...'
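The target argument described above can be pictured with a short Python sketch: keep only the rows whose key matches the given value or regular expression, then take the median of the column. This is an illustration under assumptions, not the extension's code. The helper name, the use of re.fullmatch for matching, and the sample rows are all hypothetical.

```python
import re
import statistics
from typing import Any, Dict, List, Optional, Tuple


def median_for_target(rows: List[Dict[str, Any]], column: str,
                      target: Optional[Tuple[str, str]] = None) -> float:
    """Median of `column`, optionally restricted to rows matching `target`."""
    if target is not None:
        key, pattern = target
        # Assumption: a fixed value is treated as a pattern that must
        # match the whole field.
        rows = [row for row in rows
                if re.fullmatch(pattern, str(row.get(key, "")))]
    return statistics.median([row[column] for row in rows])


# Hypothetical rows, as a probe might return them.
rows = [
    {"service": "default/consumer", "latency_p50": 10},
    {"service": "default/consumer", "latency_p50": 30},
    {"service": "default/producer", "latency_p50": 500},
]

print(median_for_target(rows, "latency_p50"))                                   # 30
print(median_for_target(rows, "latency_p50", ("service", "default/consumer")))  # 20.0
print(median_for_target(rows, "latency_p50", ("service", r"default/.*")))       # 30
```

Restricting the computation to one service keeps an unrelated, slower service from skewing the median.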
median_should_be_below¶
Type | tolerance |
Module | chaospixie.tolerances |
Name | median_should_be_below |
Return | boolean |
Compute the median of the values of column across the list of results. To limit the computation to a specific dataset within the results, provide the target as a tuple such as (key, value). The value can be a fixed value or a regular expression that matches several entries.
Column values are sometimes expressed in nanoseconds, which isn't always easy to make sense of. Set convert_from_nanoseconds to "seconds", "milliseconds" or "microseconds" to convert the values automatically. In that case, the threshold must be expressed in the same unit.
Return true if the median is below or equal to the threshold you provide.
Signature:
def median_should_be_below(column: str,
                           treshold: float,
                           convert_from_nanoseconds: Literal[
                               'seconds', 'milliseconds',
                               'microseconds'] = None,
                           target: Tuple[str, str] = None,
                           value: List[Dict[str, Any]] = None) -> bool:
    pass
Arguments:
Name | Type | Default | Required |
---|---|---|---|
column | string | | Yes |
treshold | number | | Yes |
convert_from_nanoseconds | object | null | No |
target | object | null | No |
value | list | null | No |
Tolerances declare the value argument which is automatically injected by Chaos Toolkit as the output of the probe they are evaluating.
Usage:
{
    "steady-state-hypothesis": {
        "title": "...",
        "probes": [
            {
                "type": "probe",
                "tolerance": {
                    "name": "median-should-be-below",
                    "type": "tolerance",
                    "provider": {
                        "type": "python",
                        "module": "chaospixie.tolerances",
                        "func": "median_should_be_below",
                        "arguments": {
                            "column": "",
                            "treshold": null
                        }
                    }
                },
                "...": "..."
            }
        ]
    }
}
steady-state-hypothesis:
  probes:
  - '...': '...'
    tolerance:
      name: median-should-be-below
      provider:
        arguments:
          column: ''
          treshold: null
        func: median_should_be_below
        module: chaospixie.tolerances
        type: python
      type: tolerance
    type: probe
  title: '...'
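The unit conversion selected by convert_from_nanoseconds amounts to dividing by a fixed factor. A minimal sketch, assuming a hypothetical helper name and a plain lookup table (the extension may implement this differently):

```python
from typing import Optional

# Number of nanoseconds in each supported target unit.
NANOSECONDS_PER_UNIT = {
    "seconds": 1_000_000_000,
    "milliseconds": 1_000_000,
    "microseconds": 1_000,
}


def convert_from_ns(value_ns: float, unit: Optional[str] = None) -> float:
    """Convert a nanosecond value to `unit`, or return it unchanged if unit is None."""
    if unit is None:
        return value_ns
    return value_ns / NANOSECONDS_PER_UNIT[unit]


print(convert_from_ns(250_000_000, "milliseconds"))  # 250.0
print(convert_from_ns(250_000_000, "seconds"))       # 0.25
```

Whatever unit is selected, the threshold passed to the tolerance has to be expressed in that same unit for the comparison to be meaningful.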
percentile_should_be_above¶
Type | tolerance |
Module | chaospixie.tolerances |
Name | percentile_should_be_above |
Return | boolean |
Compute the requested percentile of the values of column across the list of results. The default percentile is the 99th. To limit the computation to a specific dataset within the results, provide the target as a tuple such as (key, value). The value can be a fixed value or a regular expression that matches several entries.
Column values are sometimes expressed in nanoseconds, which isn't always easy to make sense of. Set convert_from_nanoseconds to "seconds", "milliseconds" or "microseconds" to convert the values automatically. In that case, the threshold must be expressed in the same unit.
Return true if the percentile is above or equal to the threshold you provide.
Signature:
def percentile_should_be_above(column: str,
                               treshold: float,
                               percentile: int = 99,
                               target: Tuple[str, str] = None,
                               convert_from_nanoseconds: Literal[
                                   'seconds', 'milliseconds',
                                   'microseconds'] = None,
                               value: List[Dict[str, Any]] = None) -> bool:
    pass
Arguments:
Name | Type | Default | Required |
---|---|---|---|
column | string | | Yes |
treshold | number | | Yes |
percentile | integer | 99 | No |
target | object | null | No |
convert_from_nanoseconds | object | null | No |
value | list | null | No |
Tolerances declare the value argument which is automatically injected by Chaos Toolkit as the output of the probe they are evaluating.
Usage:
{
    "steady-state-hypothesis": {
        "title": "...",
        "probes": [
            {
                "type": "probe",
                "tolerance": {
                    "name": "percentile-should-be-above",
                    "type": "tolerance",
                    "provider": {
                        "type": "python",
                        "module": "chaospixie.tolerances",
                        "func": "percentile_should_be_above",
                        "arguments": {
                            "column": "",
                            "treshold": null
                        }
                    }
                },
                "...": "..."
            }
        ]
    }
}
steady-state-hypothesis:
  probes:
  - '...': '...'
    tolerance:
      name: percentile-should-be-above
      provider:
        arguments:
          column: ''
          treshold: null
        func: percentile_should_be_above
        module: chaospixie.tolerances
        type: python
      type: tolerance
    type: probe
  title: '...'
percentile_should_be_below¶
Type | tolerance |
Module | chaospixie.tolerances |
Name | percentile_should_be_below |
Return | boolean |
Compute the requested percentile of the values of column across the list of results. The default percentile is the 99th. To limit the computation to a specific dataset within the results, provide the target as a tuple such as (key, value). The value can be a fixed value or a regular expression that matches several entries.
Column values are sometimes expressed in nanoseconds, which isn't always easy to make sense of. Set convert_from_nanoseconds to "seconds", "milliseconds" or "microseconds" to convert the values automatically. In that case, the threshold must be expressed in the same unit.
Return true if the percentile is below or equal to the threshold you provide.
Signature:
def percentile_should_be_below(column: str,
                               treshold: float,
                               percentile: int = 99,
                               target: Tuple[str, str] = None,
                               convert_from_nanoseconds: Literal[
                                   'seconds', 'milliseconds',
                                   'microseconds'] = None,
                               value: List[Dict[str, Any]] = None) -> bool:
    pass
Arguments:
Name | Type | Default | Required |
---|---|---|---|
column | string | | Yes |
treshold | number | | Yes |
percentile | integer | 99 | No |
target | object | null | No |
convert_from_nanoseconds | object | null | No |
value | list | null | No |
Tolerances declare the value argument which is automatically injected by Chaos Toolkit as the output of the probe they are evaluating.
Usage:
{
    "steady-state-hypothesis": {
        "title": "...",
        "probes": [
            {
                "type": "probe",
                "tolerance": {
                    "name": "percentile-should-be-below",
                    "type": "tolerance",
                    "provider": {
                        "type": "python",
                        "module": "chaospixie.tolerances",
                        "func": "percentile_should_be_below",
                        "arguments": {
                            "column": "",
                            "treshold": null
                        }
                    }
                },
                "...": "..."
            }
        ]
    }
}
steady-state-hypothesis:
  probes:
  - '...': '...'
    tolerance:
      name: percentile-should-be-below
      provider:
        arguments:
          column: ''
          treshold: null
        func: percentile_should_be_below
        module: chaospixie.tolerances
        type: python
      type: tolerance
    type: probe
  title: '...'
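The injection of the value argument mentioned for each tolerance can be pictured as follows. This is only a sketch of the calling convention, not Chaos Toolkit's actual code: evaluate_tolerance and all_below are hypothetical names, and the rows are made-up sample data.

```python
from typing import Any, Callable, Dict, List


def evaluate_tolerance(tolerance_func: Callable[..., bool],
                       arguments: Dict[str, Any],
                       probe_output: List[Dict[str, Any]]) -> bool:
    """Call a tolerance with its declared arguments plus the probe output as `value`."""
    return tolerance_func(**arguments, value=probe_output)


def all_below(column: str, treshold: float,
              value: List[Dict[str, Any]] = None) -> bool:
    """Toy tolerance: every row's `column` must be at most `treshold`."""
    return all(row[column] <= treshold for row in value or [])


# Hypothetical probe output and the arguments declared in the experiment.
probe_output = [{"latency_p99": 120.0}, {"latency_p99": 280.0}]
arguments = {"column": "latency_p99", "treshold": 300.0}

print(evaluate_tolerance(all_below, arguments, probe_output))  # True
```

This is why the experiment only lists column and treshold under arguments: the value argument is never written by hand.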