This deep-dive article explores CloudWatch's custom metrics capabilities using AWS CLI v2. We will cover various use cases for emitting custom metrics data, aggregating these metrics using statistics such as Sum, Average, Min, Max, and SampleCount, creating alarms based on these aggregations, and visualizing the results for actionable insights.
This guide walks through end-to-end workflows with Amazon CloudWatch: pushing custom metric data from the command line, retrieving aggregated statistics, setting alarms on those aggregations, and querying metrics for visualization. Each section includes AWS CLI v2 samples that you can adapt and automate. The samples show both the command syntax and how custom metrics fit into a monitoring setup that supports proactive oversight and incident management.
Category 1: Emitting Custom Metrics Data
Overview: This section focuses on how to send custom metric data to CloudWatch using the AWS CLI. The examples will illustrate straightforward methods to push a single metric value as well as more complex, loop-based approaches to simulate sending multiple data points. Understanding these techniques establishes a solid foundation for integrating custom metrics into your monitoring system, ensuring accuracy for subsequent analyses using aggregated data.
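When several data points are ready at the same time, a single call with --metric-data can stand in for one call per value. The sketch below only builds the JSON payload; build_metric_data is a hypothetical helper name, and because the payload sets no timestamps, CloudWatch would stamp every data point with the time it receives the request.

```shell
#!/bin/bash
# Hypothetical helper: print a JSON array of metric data points, one per value.
# Usage: build_metric_data METRIC_NAME UNIT INSTANCE_ID VALUE...
build_metric_data() {
  local metric_name="$1" unit="$2" instance_id="$3"
  shift 3
  local sep="" value
  printf '['
  for value in "$@"; do
    printf '%s{"MetricName":"%s","Value":%s,"Unit":"%s","Dimensions":[{"Name":"InstanceId","Value":"%s"}]}' \
      "$sep" "$metric_name" "$value" "$unit" "$instance_id"
    sep=","
  done
  printf ']\n'
}

# Example call (requires configured AWS credentials):
# aws cloudwatch put-metric-data --namespace "CustomMetricsDemo" \
#   --metric-data "$(build_metric_data RequestCount Count i-0123456789abcdef0 10 20 30 40 50)"
```

Batching reduces the number of API requests, but it is only a fit when the individual timestamps of the values do not matter or are added to each entry explicitly.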
Use Case 1.1 — Pushing Custom Metrics Data using AWS CLI
Description: In this use case, we will push a custom metric value representing CPU utilization to CloudWatch directly from a Bash script. Leveraging the AWS CLI command put-metric-data, we can send the metric accompanied by necessary metadata such as namespace, unit, and timestamp. This approach is particularly useful for capturing instantaneous performance metrics from your systems.
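The samples in this use case print a success message whether or not the CLI call worked. As a minimal sketch of a stricter variant, the function below quotes every argument and reports the real outcome of put-metric-data; push_metric is a hypothetical helper name, not part of the AWS CLI.

```shell
#!/bin/bash
# Hypothetical helper: push one data point and report whether the call succeeded.
# Usage: push_metric NAMESPACE METRIC_NAME VALUE UNIT INSTANCE_ID
push_metric() {
  local namespace="$1" metric_name="$2" value="$3" unit="$4" instance_id="$5"
  local timestamp
  timestamp=$(date -u +"%Y-%m-%dT%H:%M:%SZ")

  # Quoting each expansion keeps values with spaces or glob characters intact.
  if aws cloudwatch put-metric-data \
      --namespace "$namespace" \
      --metric-name "$metric_name" \
      --value "$value" \
      --unit "$unit" \
      --dimensions "InstanceId=$instance_id" \
      --timestamp "$timestamp"; then
    echo "Pushed $metric_name=$value"
  else
    echo "Failed to push $metric_name" >&2
    return 1
  fi
}

# Example call (requires configured AWS credentials):
# push_metric "CustomMetricsDemo" "CPUUtilization" 75 "Percent" "i-0123456789abcdef0"
```

Returning a non-zero status lets a calling script or scheduler notice a failed push instead of silently losing the data point.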
Sample 1 — Basic CPU Utilization:
#!/bin/bash
# Push a custom metric "CPUUtilization" to CloudWatch.
echo "Pushing custom metric: CPUUtilization"
NAMESPACE="CustomMetricsDemo"
METRIC_NAME="CPUUtilization"
INSTANCE_ID="i-0123456789abcdef0"
UNIT="Percent"
VALUE=75
TIMESTAMP=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
aws cloudwatch put-metric-data --metric-name $METRIC_NAME --namespace $NAMESPACE --value $VALUE --unit $UNIT --dimensions InstanceId=$INSTANCE_ID --timestamp $TIMESTAMP
echo "Metric data pushed successfully."
Sample 2 — Loop for RequestCount:
#!/bin/bash
# Loop to send multiple custom metric values for RequestCount.
NAMESPACE="CustomMetricsDemo"
METRIC_NAME="RequestCount"
INSTANCE_ID="i-0123456789abcdef0"
UNIT="Count"
for VALUE in 10 20 30 40 50; do
  TIMESTAMP=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
  echo "Sending metric value: $VALUE"
  aws cloudwatch put-metric-data --metric-name $METRIC_NAME --namespace $NAMESPACE --value $VALUE --unit $UNIT --dimensions InstanceId=$INSTANCE_ID --timestamp $TIMESTAMP
  sleep 1
done
echo "All metric data sent."
Sample 3 — Random Memory Utilization:
#!/bin/bash
# Push a custom metric "MemoryUtilization" with a randomly generated value to CloudWatch.
NAMESPACE="CustomMetricsDemo"
METRIC_NAME="MemoryUtilization"
INSTANCE_ID="i-0abcdef1234567890"
UNIT="Percent"
# Generate a random integer percentage between 0 and 99.
VALUE=$(awk 'BEGIN { srand(); print int(rand()*100) }')
TIMESTAMP=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
echo "Pushing custom metric: MemoryUtilization with value: $VALUE%"
aws cloudwatch put-metric-data --metric-name $METRIC_NAME --namespace $NAMESPACE --value $VALUE --unit $UNIT --dimensions InstanceId=$INSTANCE_ID --timestamp $TIMESTAMP
echo "Memory metric pushed successfully."
Sample 4 — HTTP 4XX Error Rate:
#!/bin/bash
# Simulating the emission of HTTP 4XX error rates to CloudWatch.
NAMESPACE="CustomMetricsDemo"
METRIC_NAME="HTTP4XXErrorRate"
INSTANCE_ID="i-0abcdef1234567890"
UNIT="Count"
for i in {1..5}; do
  VALUE=$((RANDOM % 10)) # Random error count between 0 and 9
  TIMESTAMP=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
  echo "Pushing HTTP 4XX error count: $VALUE"
  aws cloudwatch put-metric-data --metric-name $METRIC_NAME --namespace $NAMESPACE --value $VALUE --unit $UNIT --dimensions InstanceId=$INSTANCE_ID --timestamp $TIMESTAMP
  sleep 1
done
echo "All HTTP 4XX error data pushed."
Category 2: Aggregating Custom Metrics by Different Statistics
Overview: In this section, we retrieve aggregated data for custom metrics using various statistics. Each use case shows how to use get-metric-statistics to compute one statistic (Sum, Average, Minimum, Maximum, or SampleCount) over fixed-length periods within a time range. CloudWatch treats each distinct combination of namespace, metric name, and dimensions as a separate metric, so a query returns data only when it names the same dimensions that were used when the data was pushed.
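To make the five statistics concrete, the following sketch computes them locally for a handful of values, assuming all of them fall inside a single period. The summarize function is a hypothetical helper for illustration only; CloudWatch performs the equivalent aggregation on its side.

```shell
#!/bin/bash
# Compute the five statistics locally for one period's worth of values.
# Usage: summarize VALUE...
summarize() {
  [ "$#" -gt 0 ] || return 1
  printf '%s\n' "$@" | awk '
    { v = $1 + 0 }
    NR == 1 { min = v; max = v }
    { sum += v; if (v < min) min = v; if (v > max) max = v }
    END { printf "Sum=%g Average=%g Minimum=%g Maximum=%g SampleCount=%d\n", sum, sum / NR, min, max, NR }'
}

# The five values sent by the RequestCount loop in Sample 2 of Use Case 1.1.
summarize 10 20 30 40 50
# prints: Sum=150 Average=30 Minimum=10 Maximum=50 SampleCount=5
```

Note that Average is simply Sum divided by SampleCount, which is why requesting any two of the three is enough to derive the third.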
Use Case 2.1 — Aggregating with Sum Statistic
Description: Here, the focus is on retrieving the total event counts by summing up the data points collected for a custom metric. We will use the AWS CLI command get-metric-statistics to query CloudWatch for metric data within a specific time interval, aggregating these values to compute the sum. This method is instrumental for monitoring cumulative events in your application, aiding in the detection of spikes in activity.
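The samples in this category compute the start of the time interval with date -d, which is a GNU extension and is not available in the BSD date shipped with macOS. A possible portable wrapper is sketched below; minutes_ago_utc is a hypothetical helper name, and the fallback assumes BSD date's -v option.

```shell
#!/bin/bash
# Hypothetical helper: print the UTC time N minutes ago in ISO 8601 format.
# Usage: minutes_ago_utc MINUTES
minutes_ago_utc() {
  local minutes="$1"
  # GNU date (Linux) understands -d; BSD date (macOS) uses -v instead.
  date -u -d "$minutes minutes ago" +"%Y-%m-%dT%H:%M:%SZ" 2>/dev/null ||
    date -u -v-"${minutes}"M +"%Y-%m-%dT%H:%M:%SZ"
}

# Example: build the interval for a 10-minute query window.
START_TIME=$(minutes_ago_utc 10)
END_TIME=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
echo "Querying from $START_TIME to $END_TIME"
```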
Sample 1 — Sum of CustomEventCount:
#!/bin/bash
# Retrieve aggregated custom metric data using Sum statistic.
NAMESPACE="CustomMetricsDemo"
METRIC_NAME="CustomEventCount"
STATISTIC="Sum"
START_TIME=$(date -u -d '10 minutes ago' +"%Y-%m-%dT%H:%M:%SZ")
END_TIME=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
aws cloudwatch get-metric-statistics --namespace $NAMESPACE --metric-name $METRIC_NAME --start-time $START_TIME --end-time $END_TIME --period 300 --statistics $STATISTIC
echo "Aggregated Sum metric retrieved for CustomEventCount."
Sample 2 — Sum of CustomLoginAttempts:
#!/bin/bash
# Retrieve aggregated sum for CustomLoginAttempts.
NAMESPACE="CustomMetricsDemo"
METRIC_NAME="CustomLoginAttempts"
STATISTIC="Sum"
START_TIME=$(date -u -d '15 minutes ago' +"%Y-%m-%dT%H:%M:%SZ")
END_TIME=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
aws cloudwatch get-metric-statistics --namespace $NAMESPACE --metric-name $METRIC_NAME --start-time $START_TIME --end-time $END_TIME --period 300 --statistics $STATISTIC
echo "Aggregated Sum metric for CustomLoginAttempts retrieved."
Use Case 2.2 — Aggregating with Average Statistic
Description: In this use case, we will calculate the average value of a custom metric over a specified time period, enabling us to understand performance trends more effectively. The Average statistic can help identify whether the overall system performance complies with the defined Service Level Objectives (SLOs).
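Reading a trend requires the data points in time order, and get-metric-statistics does not return them chronologically. One way to handle this, sketched below, is to sort on the client with the CLI's --query option. The function name stat_series is hypothetical, and the sketch assumes the metric carries a single InstanceId dimension as in the earlier samples.

```shell
#!/bin/bash
# Hypothetical helper: print "timestamp<TAB>value" rows in chronological order.
# Usage: stat_series NAMESPACE METRIC_NAME INSTANCE_ID STATISTIC START_TIME END_TIME PERIOD
stat_series() {
  local namespace="$1" metric_name="$2" instance_id="$3" statistic="$4"
  local start_time="$5" end_time="$6" period="$7"
  aws cloudwatch get-metric-statistics \
    --namespace "$namespace" \
    --metric-name "$metric_name" \
    --dimensions "Name=InstanceId,Value=$instance_id" \
    --start-time "$start_time" \
    --end-time "$end_time" \
    --period "$period" \
    --statistics "$statistic" \
    --query "sort_by(Datapoints, &Timestamp)[].[Timestamp, $statistic]" \
    --output text
}

# Example call (requires configured AWS credentials):
# stat_series "CustomMetricsDemo" "CPUUtilization" "i-0123456789abcdef0" \
#   "Average" "2024-01-01T00:00:00Z" "2024-01-01T00:30:00Z" 300
```

The text output has one row per period, which is convenient to pipe into awk or a plotting tool.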
Sample: Average of CPUUtilization:
#!/bin/bash
# Retrieve average custom metric data for CPUUtilization.
NAMESPACE="CustomMetricsDemo"
METRIC_NAME="CPUUtilization"
STATISTIC="Average"
START_TIME=$(date -u -d '30 minutes ago' +"%Y-%m-%dT%H:%M:%SZ")
END_TIME=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
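# Query with the same InstanceId dimension that was used when the data was pushed.
aws cloudwatch get-metric-statistics --namespace $NAMESPACE --metric-name $METRIC_NAME --dimensions Name=InstanceId,Value=i-0123456789abcdef0 --start-time $START_TIME --end-time $END_TIME --period 300 --statistics $STATISTIC
echo "Average metric retrieved for CPUUtilization."
Use Case 2.3 — Aggregating with Min Statistic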
Description: This use case focuses on calculating the minimum value recorded for a custom metric, which can be useful in identifying lower bounds of service performance or utilization metrics.
Sample: Minimum of MemoryUtilization:
#!/bin/bash
# Retrieve minimum custom metric data for MemoryUtilization.
NAMESPACE="CustomMetricsDemo"
METRIC_NAME="MemoryUtilization"
STATISTIC="Minimum"
START_TIME=$(date -u -d '1 hour ago' +"%Y-%m-%dT%H:%M:%SZ")
END_TIME=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
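# Query with the same InstanceId dimension that was used when the data was pushed.
aws cloudwatch get-metric-statistics --namespace $NAMESPACE --metric-name $METRIC_NAME --dimensions Name=InstanceId,Value=i-0abcdef1234567890 --start-time $START_TIME --end-time $END_TIME --period 300 --statistics $STATISTIC
echo "Minimum metric retrieved for MemoryUtilization."
Use Case 2.4 — Aggregating with Max Statistic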
Description: Here we will retrieve the maximum value of a custom metric to detect peaks in resource utilization, which can help identify potential bottlenecks or over-utilization scenarios.
Sample: Maximum of HTTP4XXErrorRate:
#!/bin/bash
# Retrieve maximum custom metric data for HTTP4XXErrorRate.
NAMESPACE="CustomMetricsDemo"
METRIC_NAME="HTTP4XXErrorRate"
STATISTIC="Maximum"
START_TIME=$(date -u -d '2 hours ago' +"%Y-%m-%dT%H:%M:%SZ")
END_TIME=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
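# Query with the same InstanceId dimension that was used when the data was pushed.
aws cloudwatch get-metric-statistics --namespace $NAMESPACE --metric-name $METRIC_NAME --dimensions Name=InstanceId,Value=i-0abcdef1234567890 --start-time $START_TIME --end-time $END_TIME --period 300 --statistics $STATISTIC
echo "Maximum metric retrieved for HTTP4XXErrorRate."
Use Case 2.5 — Aggregating with SampleCount Statistic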
Description: Finally, we will look at the SampleCount, which provides the total number of data points recorded for a particular metric. This statistic is crucial for understanding the volume of events being tracked over time.
Sample: SampleCount of CustomRequestCount:
#!/bin/bash
# Retrieve sample count for CustomRequestCount.
NAMESPACE="CustomMetricsDemo"
METRIC_NAME="CustomRequestCount"
STATISTIC="SampleCount"
START_TIME=$(date -u -d '5 minutes ago' +"%Y-%m-%dT%H:%M:%SZ")
END_TIME=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
aws cloudwatch get-metric-statistics --namespace $NAMESPACE --metric-name $METRIC_NAME --start-time $START_TIME --end-time $END_TIME --period 60 --statistics $STATISTIC
echo "Sample count metric retrieved for CustomRequestCount."
Category 3: Setting Alarms Based on Aggregated Metrics
Overview: Having established how to emit and aggregate custom metrics, setting alarms on these metrics is the next logical step. Alarms can be configured to notify you when a metric breaches its defined thresholds, enabling timely responses to potential issues.
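The interaction between threshold, comparison operator, and evaluation periods can be sketched locally. The function below is a deliberately simplified model with a hypothetical name, alarm_state: it handles only GreaterThanThreshold, requires every one of the most recent evaluation periods to breach, and ignores CloudWatch's missing-data handling and the datapoints-to-alarm setting.

```shell
#!/bin/bash
# Simplified model of alarm evaluation for GreaterThanThreshold.
# Usage: alarm_state THRESHOLD EVALUATION_PERIODS VALUE...   (values oldest first)
alarm_state() {
  local threshold="$1" periods="$2"
  shift 2
  # Fewer values than evaluation periods: not enough data to decide.
  if [ "$#" -lt "$periods" ]; then
    echo "INSUFFICIENT_DATA"
    return 0
  fi
  # Look only at the most recent periods; any value at or below the
  # threshold keeps the alarm in OK because the comparison is strict.
  printf '%s\n' "$@" | tail -n "$periods" | awk -v t="$threshold" '
    $1 + 0 <= t + 0 { ok = 1 }
    END { print (ok ? "OK" : "ALARM") }'
}

alarm_state 75 2 70 80 90   # prints: ALARM (80 and 90 both exceed 75)
alarm_state 75 2 80 70 90   # prints: OK (70 does not exceed 75)
alarm_state 10 1 10         # prints: OK (10 is not greater than 10)
```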
Use Case 3.1 — Creating an Alarm for CPU Utilization
Description: In this use case, we will create an alarm that triggers when CPU utilization exceeds 80% over a 5-minute period. This monitoring allows you to take proactive measures to avoid potential service degradation.
Sample: Create Alarm for CPUUtilization:
aws cloudwatch put-metric-alarm --alarm-name "HighCPUUtilization" \
  --metric-name "CPUUtilization" \
  --namespace "CustomMetricsDemo" \
  --statistic "Average" \
  --period 300 \
  --threshold 80 \
  --comparison-operator "GreaterThanThreshold" \
  --evaluation-periods 1 \
  --alarm-actions "arn:aws:sns:us-east-1:123456789012:NotifyMe" \
  --dimensions "Name=InstanceId,Value=i-0123456789abcdef0" \
  --unit "Percent"
echo "Alarm created for high CPU utilization."
Use Case 3.2 — Creating an Alarm for Memory Utilization
Description: Here, we will configure an alarm that triggers if average memory utilization exceeds 75% for two consecutive 5-minute periods. Keeping an eye on memory metrics can help you manage resources efficiently and maintain application performance.
Sample: Create Alarm for MemoryUtilization:
aws cloudwatch put-metric-alarm --alarm-name "HighMemoryUtilization" \
  --metric-name "MemoryUtilization" \
  --namespace "CustomMetricsDemo" \
  --statistic "Average" \
  --period 300 \
  --threshold 75 \
  --comparison-operator "GreaterThanThreshold" \
  --evaluation-periods 2 \
  --alarm-actions "arn:aws:sns:us-east-1:123456789012:NotifyMe" \
  --dimensions "Name=InstanceId,Value=i-0abcdef1234567890" \
  --unit "Percent"
echo "Alarm created for high memory utilization."
Use Case 3.3 — Creating an Alarm for HTTP 4XX Error Rate
Description: This alarm monitors the HTTP 4XX error count, triggering when the sum exceeds 10 errors in a 5-minute period. Monitoring these errors can help ensure a quality user experience by quickly addressing issues that prevent successful requests.
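Once an alarm such as HighHTTP4XXErrorRate exists, it can be useful to confirm its state and to exercise its notification path without waiting for real errors. The two wrappers below are a sketch with hypothetical function names around describe-alarms and set-alarm-state. A forced state is temporary: it triggers the alarm's actions, and the next evaluation returns the alarm to its actual state.

```shell
#!/bin/bash
# Hypothetical helper: print the current state (OK, ALARM, INSUFFICIENT_DATA).
# Usage: alarm_state_of ALARM_NAME
alarm_state_of() {
  aws cloudwatch describe-alarms --alarm-names "$1" \
    --query 'MetricAlarms[0].StateValue' --output text
}

# Hypothetical helper: temporarily force a state to test the alarm actions.
# Usage: force_alarm_state ALARM_NAME STATE
force_alarm_state() {
  aws cloudwatch set-alarm-state --alarm-name "$1" --state-value "$2" \
    --state-reason "Manual test of alarm actions"
}

# Example calls (require configured AWS credentials):
# alarm_state_of "HighHTTP4XXErrorRate"
# force_alarm_state "HighHTTP4XXErrorRate" "ALARM"
```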
Sample: Create Alarm for HTTP4XXErrorRate:
aws cloudwatch put-metric-alarm --alarm-name "HighHTTP4XXErrorRate" \
  --metric-name "HTTP4XXErrorRate" \
  --namespace "CustomMetricsDemo" \
  --statistic "Sum" \
  --period 300 \
  --threshold 10 \
  --comparison-operator "GreaterThanThreshold" \
  --evaluation-periods 1 \
  --alarm-actions "arn:aws:sns:us-east-1:123456789012:NotifyMe" \
  --dimensions "Name=InstanceId,Value=i-0abcdef1234567890" \
  --unit "Count"
echo "Alarm created for high HTTP 4XX error rate."
Category 4: Visualization of Metrics
Overview: Visualizing metrics is crucial for effective monitoring, enabling stakeholders to quickly gauge system performance. CloudWatch provides dashboards that allow you to plot custom metrics, facilitating visual data analysis.
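For visualization outside the console, the same series a dashboard plots can be fetched with get-metric-data. The sketch below only builds the query document; build_metric_query and the query Id m1 are illustrative choices, and the metric is assumed to carry a single InstanceId dimension.

```shell
#!/bin/bash
# Hypothetical helper: print a one-element query list for get-metric-data.
# Usage: build_metric_query NAMESPACE METRIC_NAME INSTANCE_ID STAT PERIOD
build_metric_query() {
  local namespace="$1" metric_name="$2" instance_id="$3" stat="$4" period="$5"
  printf '[{"Id":"m1","MetricStat":{"Metric":{"Namespace":"%s","MetricName":"%s","Dimensions":[{"Name":"InstanceId","Value":"%s"}]},"Period":%s,"Stat":"%s"},"ReturnData":true}]\n' \
    "$namespace" "$metric_name" "$instance_id" "$period" "$stat"
}

# Example call (requires configured AWS credentials):
# aws cloudwatch get-metric-data \
#   --metric-data-queries "$(build_metric_query CustomMetricsDemo CPUUtilization i-0123456789abcdef0 Average 300)" \
#   --start-time "2024-01-01T00:00:00Z" --end-time "2024-01-01T01:00:00Z" \
#   --scan-by TimestampAscending
```

The response pairs a Timestamps array with a Values array for each query Id, which maps directly onto the x and y axes of a chart.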
Use Case 4.1 — Creating a CloudWatch Dashboard
Sample: Create a Dashboard for Custom Metrics:
aws cloudwatch put-dashboard --dashboard-name "CustomMetricsDashboard" \
  --dashboard-body '{
    "widgets": [
      {
        "type": "metric",
        "x": 0,
        "y": 0,
        "width": 12,
        "height": 6,
        "properties": {
          "metrics": [
            [ "CustomMetricsDemo", "CPUUtilization", "InstanceId", "i-0123456789abcdef0" ],
            [ ".", "MemoryUtilization", "InstanceId", "i-0abcdef1234567890" ],
            [ ".", "HTTP4XXErrorRate", "InstanceId", "i-0abcdef1234567890" ]
          ],
          "period": 300,
          "stat": "Average",
          "region": "us-east-1",
          "title": "Custom Metrics Overview"
        }
      }
    ]
  }'
echo "Dashboard 'CustomMetricsDashboard' created."
Conclusion
In this comprehensive article, we delved into handling CloudWatch custom metrics via AWS CLI v2. Starting from the basics of sending metric data with put-metric-data, to aggregating metrics using various statistics such as Sum, Average, Min, Max, and SampleCount, we've shown you practical steps to create alarms and visualize data effectively. By following these code snippets and guidelines, you establish a robust framework for real-time monitoring and proactive alerts.
These examples give you a starting point for custom metric workflows across different application environments. Integrated into your infrastructure, they let you monitor key performance metrics continuously, respond quickly to anomalies, and maintain a good user experience. From here, you can customize the namespaces, dimensions, thresholds, and dashboards to match your application's behavior and performance trends.