autoscaler

Autoscaler tool for Cloud Spanner

Retrieve metrics for one or more Cloud Spanner Instances

Table of Contents
Overview
Configuration parameters
- Required
- Optional
Metrics parameters
- Selectors
- Parameters
Custom metrics, thresholds and margins
Example configuration

Overview

The Poller function takes an array of Cloud Spanner instances from the payload of a Cloud PubSub message and obtains load metrics for each of them from Cloud Monitoring.

Then, for each Spanner instance, it publishes a message to the specified Cloud PubSub topic. The message includes the metrics and part of the configuration for that Spanner instance.

The Scaler function receives the message and compares the metric values with the recommended thresholds, plus or minus an allowed margin. If any of the values falls outside this range, the Scaler function adjusts the number of nodes in the Spanner instance accordingly. Note that the thresholds differ depending on whether a Spanner instance is regional or multi-regional.

Configuration parameters

The following are the configuration parameters consumed by the Poller function. Some of these parameters are forwarded to the Scaler function as well.

The parameters are defined using JSON in the payload of the PubSub message that is published by the Cloud Scheduler job. See the configuration section in the home page for instructions on how to change the payload.

Required

Key	Description
`projectId`	Project ID of the Cloud Spanner instance to be monitored by the Autoscaler
`instanceId`	Instance ID of the Cloud Spanner instance to be monitored by the Autoscaler
`scalerPubSubTopic`	PubSub topic for the Poller function to publish messages for the Scaler function
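
As a rough illustration of the required keys above, the following Python sketch parses a payload and rejects entries that lack any of them. The function name `parse_payload` and the error messages are hypothetical and not part of the Autoscaler; the Poller function itself may validate its input differently.

```python
import json

# Required keys, taken from the table above.
REQUIRED_KEYS = ("projectId", "instanceId", "scalerPubSubTopic")


def parse_payload(raw):
    """Parse the JSON payload and check each entry for the required keys.

    Hypothetical helper for illustration only.
    """
    instances = json.loads(raw)
    if not isinstance(instances, list):
        raise ValueError("payload must be a JSON array of instance configurations")
    for index, instance in enumerate(instances):
        missing = [key for key in REQUIRED_KEYS if key not in instance]
        if missing:
            raise ValueError(f"entry {index} is missing required keys: {missing}")
    return instances


# Example payload with a single instance and only the required keys.
payload = json.dumps([
    {
        "projectId": "my-spanner-project",
        "instanceId": "spanner1",
        "scalerPubSubTopic": "projects/my-spanner-project/topics/spanner-scaling",
    }
])
instances = parse_payload(payload)
```

Checking the payload before deploying a Cloud Scheduler job can catch a misspelled key early, rather than at the first scheduled run.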

Optional

Key	Default Value	Description
`minNodes`	1	Minimum number of Cloud Spanner nodes that the instance can be scaled IN to.
`maxNodes`	3	Maximum number of Cloud Spanner nodes that the instance can be scaled OUT to.
`scalingMethod`	`STEPWISE`	Scaling method that should be used. Options are: `STEPWISE`, `LINEAR`, `DIRECT`. See the scaling methods section in the Scaler function page for more information.
`stepSize`	2	Number of nodes that should be added or removed when scaling with the `STEPWISE` method.
`overloadStepSize`	5	Number of nodes that should be added when the Cloud Spanner instance is overloaded, and the `STEPWISE` method is used.
`scaleOutCoolingMinutes`	5	Minutes to wait after scaling IN or OUT before a scale OUT event can be processed.
`scaleInCoolingMinutes`	30	Minutes to wait after scaling IN or OUT before a scale IN event can be processed.
`overloadCoolingMinutes`	5	Minutes to wait after scaling IN or OUT before a scale OUT event can be processed, when the Spanner instance is overloaded. An instance is overloaded if its High Priority CPU utilization is over 90%.
`stateProjectId`	`${projectId}`	The project ID where the Autoscaler state will be persisted. By default it is persisted using Cloud Firestore in the same project as the Spanner instance.
`metrics`	Array	Array of objects that can override the values in the metrics used to decide when the Cloud Spanner instance should be scaled IN or OUT. Refer to the metrics definition table to see the fields used for defining metrics.
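
The defaults in the table above can be pictured as a dictionary that is overlaid by whatever the payload specifies. This is a minimal sketch: the helper `with_defaults` is hypothetical, and treating the default for `metrics` as an empty array is an assumption, since the table only states its type.

```python
# Default values copied from the table above.
DEFAULTS = {
    "minNodes": 1,
    "maxNodes": 3,
    "scalingMethod": "STEPWISE",
    "stepSize": 2,
    "overloadStepSize": 5,
    "scaleOutCoolingMinutes": 5,
    "scaleInCoolingMinutes": 30,
    "overloadCoolingMinutes": 5,
}


def with_defaults(instance):
    """Return a copy of the instance configuration with defaults filled in.

    Hypothetical helper; values given in the payload always win.
    """
    merged = {**DEFAULTS, **instance}
    # stateProjectId defaults to the project of the Spanner instance.
    merged.setdefault("stateProjectId", merged["projectId"])
    # Assumption: no metric overrides means an empty array.
    merged.setdefault("metrics", [])
    return merged


config = with_defaults({
    "projectId": "my-spanner-project",
    "instanceId": "spanner1",
    "scalerPubSubTopic": "projects/my-spanner-project/topics/spanner-scaling",
    "maxNodes": 10,
})
```

In this example `config["maxNodes"]` is 10 because the payload sets it, while `config["minNodes"]` falls back to 1 and `config["stateProjectId"]` falls back to the value of `projectId`.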

Metrics parameters

The table describes the objects used to define metrics. These can be provided in the configuration objects to customize the metrics used to autoscale your Cloud Spanner instances.

To specify a custom threshold, give the name of the metric to customize, followed by the parameter values you wish to change. The updated parameters are merged with the default metric parameters.
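
One way to picture the merge is a lookup by metric name in which overridden fields replace the defaults and all other fields are kept. The sketch below is an assumption about the behaviour, not the Autoscaler's code: `merge_metrics` is a hypothetical helper, and the threshold values in `DEFAULT_METRICS` are illustrative placeholders that should be checked against the Autoscaler source.

```python
import copy

# Illustrative defaults only; the thresholds are assumed, not authoritative.
DEFAULT_METRICS = [
    {"name": "high_priority_cpu", "regional_threshold": 65,
     "multi_regional_threshold": 45, "regional_margin": 5,
     "multi_regional_margin": 5},
    {"name": "rolling_24_hr", "regional_threshold": 90,
     "multi_regional_threshold": 90, "regional_margin": 5,
     "multi_regional_margin": 5},
    {"name": "storage", "regional_threshold": 75,
     "multi_regional_threshold": 75, "regional_margin": 5,
     "multi_regional_margin": 5},
]


def merge_metrics(defaults, overrides):
    """Merge metric overrides into the defaults, matching on `name`.

    Hypothetical helper. Metrics with an unknown name are appended,
    which is how a custom metric would be added in this sketch.
    """
    merged = {metric["name"]: copy.deepcopy(metric) for metric in defaults}
    for override in overrides:
        merged.setdefault(override["name"], {}).update(override)
    return list(merged.values())


overrides = [{"name": "high_priority_cpu", "regional_threshold": 40, "regional_margin": 3}]
metrics = merge_metrics(DEFAULT_METRICS, overrides)
by_name = {metric["name"]: metric for metric in metrics}
```

Here only `regional_threshold` and `regional_margin` of `high_priority_cpu` change; its `multi_regional_threshold` and the other two metrics keep their default values.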

Selectors

Key	Description
`name`	A unique name for the metric to be evaluated. If you want to override the default metrics, their names are: `high_priority_cpu`, `rolling_24_hr` and `storage`.

Parameters

When defining a metric for the Autoscaler there are two key components: thresholds and a Cloud Monitoring time series metric composed of a filter, reducer, aligner and period. A properly defined metric is critical to the operation of the Autoscaler. Please refer to Filtering and aggregation: manipulating time series for a complete discussion on building metric filters and aggregating data points.

Key	Default	Description
`filter`		The Cloud Spanner metric and filter that should be used when querying for data. The Autoscaler will automatically add the filter expressions for Spanner instance resources, instance id and project id.
`reducer`	`REDUCE_SUM`	The reducer specifies how the data points should be aggregated when querying for metrics, typically `REDUCE_SUM`. For more details please refer to Alert Policies - Reducer documentation.
`aligner`	`ALIGN_MAX`	The aligner specifies how the data points should be aligned in the time series, typically `ALIGN_MAX`. For more details please refer to Alert Policies - Aligner documentation.
`period`	60	Defines the period of time in units of seconds at which aggregation takes place. Typically the period should be 60.
`regional_threshold`		Threshold used to evaluate if a regional instance needs to be scaled in or out.
`multi_regional_threshold`		Threshold used to evaluate if a multi-regional instance needs to be scaled in or out.
`regional_margin`	5	Margin above and below the threshold where the metric value is allowed. If the metric falls outside of the range `[threshold - margin, threshold + margin]`, then the regional instance needs to be scaled in or out.
`multi_regional_margin`	5	Margin above and below the threshold where the metric value is allowed. If the metric falls outside of the range `[threshold - margin, threshold + margin]`, then the multi-regional instance needs to be scaled in or out.
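
The `filter` description above says that the Autoscaler adds the expressions for the Spanner resource type, instance ID and project ID automatically. The exact expression it builds is not shown in this page, so the following is only a sketch of what such a combined Cloud Monitoring filter might look like. The helper `build_filter` and the order of the clauses are assumptions.

```python
def build_filter(metric_filter, project_id, instance_id):
    """Combine a user-supplied metric filter with instance selectors.

    Hypothetical helper; the Autoscaler may compose its filter differently.
    """
    clauses = [
        'resource.type="spanner_instance"',
        f'resource.labels.instance_id="{instance_id}"',
        f'project="{project_id}"',
        metric_filter,
    ]
    return " AND ".join(clauses)


# The metric type below is the placeholder used in the example configuration.
full_filter = build_filter(
    'metric.type="spanner.googleapis.com/instance/resource/metric"',
    project_id="my-spanner-project",
    instance_id="spanner1",
)
```

The practical consequence is that the `filter` value in the configuration only needs to select the metric itself, not the instance or project.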

Custom metrics, thresholds and margins

The Autoscaler determines the number of nodes to be added to or removed from an instance based on the Spanner recommended thresholds for the High Priority CPU, 24 hour rolling average CPU and Storage utilization metrics.

Google recommends using the provided metrics, thresholds and margins unchanged. However, in some cases you may want to modify these or use a custom metric, for example: if reaching the default upper limit triggers an alert to your operations team, you could make the Autoscaler react to a more conservative threshold to avoid alerts being triggered.

Thresholds

To modify the recommended thresholds, add the `metrics` parameter to your configuration and specify the `name` of the metric to be changed (`high_priority_cpu`, `rolling_24_hr` or `storage`) and the desired `regional_threshold` or `multi_regional_threshold` for your Cloud Spanner instance.

Margins

A margin defines an upper and a lower limit around the threshold. An autoscaling event will be triggered only if the metric value falls above the upper limit, or below the lower limit.

The objective of this parameter is to avoid autoscaling events being triggered for small workload fluctuations around the threshold, thus creating a smoothing effect in autoscaler actions. The threshold and margin together define a range [threshold - margin, threshold + margin], where the metric value is allowed. The smaller the margin, the narrower the range, resulting in a higher probability that an autoscaling event is triggered.
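
The range check can be sketched in a few lines of Python. The function name `scaling_direction` and the labels `"OUT"`, `"IN"` and `"NONE"` are hypothetical and chosen for illustration; the sketch assumes that a value above the range calls for scaling out and a value below it for scaling in.

```python
def scaling_direction(value, threshold, margin=5):
    """Classify a metric value against [threshold - margin, threshold + margin].

    Hypothetical helper. Values on the boundary count as inside the range.
    """
    lower = threshold - margin
    upper = threshold + margin
    if value > upper:
        return "OUT"   # above the allowed range
    if value < lower:
        return "IN"    # below the allowed range
    return "NONE"      # inside the range, no autoscaling event


# With a threshold of 65 and the default margin of 5, the range is [60, 70].
decisions = [scaling_direction(v, 65) for v in (59, 62, 70, 71)]
```

With a margin of 5, a value of 62 stays inside the range and triggers nothing. Narrowing the margin to 2 shrinks the range to [63, 67], so the same value of 62 would fall below it.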

By default, the margin value is 5 for both regional and multi-regional instances. You can change the default value by specifying regional_margin or multi_regional_margin in the metric parameters. Specifying a margin parameter for a metric is optional.

Metrics

To create a custom metric, add the metrics parameter to your configuration specifying the required fields (name, filter, regional_threshold, multi_regional_threshold). The period, reducer and aligner are defaulted but can also be specified in the metric definition.

The `filter` field specifies the Cloud Spanner metric and filter that should be used when querying for data. The Autoscaler will automatically add the filter expressions for Spanner instance resources, instance id and project id.
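
As a sketch of the rules for custom metrics, the following Python snippet checks the required fields and fills in the documented defaults for the remaining parameters. The helper `complete_custom_metric` is hypothetical and only mirrors the tables in this page; it is not the Autoscaler's own validation.

```python
# Required fields for a custom metric, as listed above.
REQUIRED_METRIC_FIELDS = (
    "name", "filter", "regional_threshold", "multi_regional_threshold",
)

# Defaults copied from the metric parameters table.
METRIC_DEFAULTS = {
    "reducer": "REDUCE_SUM",
    "aligner": "ALIGN_MAX",
    "period": 60,
    "regional_margin": 5,
    "multi_regional_margin": 5,
}


def complete_custom_metric(metric):
    """Validate a custom metric and fill in default parameters.

    Hypothetical helper for illustration only.
    """
    missing = [field for field in REQUIRED_METRIC_FIELDS if field not in metric]
    if missing:
        raise ValueError(f"custom metric is missing required fields: {missing}")
    return {**METRIC_DEFAULTS, **metric}


custom = complete_custom_metric({
    "name": "my_custom_metric",
    "filter": 'metric.type="spanner.googleapis.com/instance/resource/metric"',
    "regional_threshold": 40,
    "multi_regional_threshold": 30,
})
```

A check like this also catches a misspelled key, because a metric without a correctly spelled `filter` field is reported as missing that field.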

Example configuration

[
    {
        "projectId": "basic-configuration",
        "instanceId": "another-spanner1",
        "scalerPubSubTopic": "projects/my-spanner-project/topics/spanner-scaling",
        "minNodes": 5,
        "maxNodes": 30,
        "scalingMethod": "DIRECT"
    },{
        "projectId": "custom-threshold",
        "instanceId": "spanner1",
        "scalerPubSubTopic": "projects/my-spanner-project/topics/spanner-scaling",
        "minNodes": 1,
        "maxNodes": 3,
        "metrics": [
          {
            "name": "high_priority_cpu",
            "regional_threshold": 40,
            "regional_margin": 3
          }
        ]
    },{
        "projectId": "custom-metric",
        "instanceId": "another-spanner1",
        "scalerPubSubTopic": "projects/my-spanner-project/topics/spanner-scaling",
        "minNodes": 5,
        "maxNodes": 30,
        "scalingMethod": "LINEAR",
        "metrics": [
          {
            "name": "my_custom_metric",
            "filter": "metric.type=\"spanner.googleapis.com/instance/resource/metric\"",
            "regional_threshold": 40,
            "multi_regional_threshold": 30
          }
        ]
    }
]