Confluent Cloud Metrics API¶
The Confluent Cloud Metrics API provides actionable operational metrics about your Confluent Cloud deployment. It is a queryable HTTP API: you POST a query written in JSON and receive back a time series of the metrics specified by that query.
This page is meant to be instructional and to help you get started with the Confluent Cloud Metrics API. For more information, see the API Reference.
Metrics API Quick Start¶
- Prerequisites
- Access to Confluent Cloud
- Internet connectivity
Note
The Confluent Cloud RBAC MetricsViewer role provides service account access to the Metrics API for all clusters in an organization. This role also enables service accounts to import metrics into third-party metrics platforms. For details, refer to Add the MetricsViewer role to a new service account in the UI below.
The following examples use HTTPie rather than cURL. HTTPie can be installed with most common package managers by following its documentation.
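For reference, HTTPie is commonly installed in one of the following ways; the commands below are typical examples and may differ on your platform, so consult the HTTPie documentation if in doubt:
# macOS (Homebrew)
brew install httpie
# Debian/Ubuntu
sudo apt-get install httpie
# Any platform with Python available
python3 -m pip install httpie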
Create a Cloud API key to authenticate to the Metrics API. For example:
ccloud login
ccloud api-key create --resource cloud
Note
You must use a Cloud API Key to communicate with the Metrics API. Using the Cluster API Key that is used to communicate with Kafka will result in an authentication error.
See also
For an example that showcases how to monitor an Apache Kafka® client application and Confluent Cloud metrics, and steps through various failure scenarios to show metrics results, see the Observability for Apache Kafka® Clients to Confluent Cloud demo.
Add the MetricsViewer role to a new service account in the UI¶
The MetricsViewer role provides service account access to the Metrics API for all clusters in an organization. This role also enables service accounts to import metrics into third-party metrics platforms.
To assign the MetricsViewer role to a new service account:
- In the hamburger menu (☰) in the upper-right corner of the Confluent Cloud user interface, click ADMINISTRATION > Cloud API keys.
- Click Add key.
- Click the Granular access tile to set the scope for the API key. Click Next.
- Click Create a new one and specify the service account name and, optionally, a description. Click Next.
- The API key and secret are generated for the service account. You will need this API key and secret to connect to the cluster, so be sure to store them securely. Click Save. The new service account, with the API key and associated ACLs, is created. When you return to the API access tab, you can view the newly created API key to confirm.
- Return to Accounts & access in the upper-right hamburger menu and click the Service accounts tab under Accounts to view all the service accounts. Select the service account to which you want to add the MetricsViewer role and click the Access tab.
- Click Add role assignment and select the MetricsViewer tile. Click Save.
When you return to Accounts & access, you can view the resources for the organization, and also see that the service account you created has the MetricsViewer role binding.
Add the MetricsViewer role to a service account using the CLI¶
To add a role binding for MetricsViewer to a new service account:
ccloud login
# Create the service account
ccloud service-account create MetricsImporter --description "A service account to import Confluent Cloud metrics into our monitoring system"
+-------------+--------------------------------+
| Id | 447311 |
| Resource ID | sa-zm6vgz |
| Name | MetricsImporter |
| Description | A service account to import |
| | Confluent Cloud metrics into |
| | our monitoring system |
+-------------+--------------------------------+
# Make note of the Id and Resource ID fields.
# Create an API key and add it to the new service account
ccloud api-key create --resource cloud --service-account 447311
# Add the MetricsViewer role binding to the service account
ccloud iam rolebinding create --role MetricsViewer --principal User:sa-zm6vgz
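Optionally, confirm the assignment by listing the role bindings for the service account. This is a quick sanity check; the exact flags and output format can vary across ccloud CLI versions, so treat the command below as a sketch:
# List role bindings for the service account to confirm MetricsViewer was added
ccloud iam rolebinding list --principal User:sa-zm6vgz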
To add a role binding for MetricsViewer to an existing service account:
ccloud login
# List service account IDs to identify the correct ID to use
ccloud service-account list
   Id    | Resource ID |       Name        |          Description
+--------+-------------+-------------------+--------------------------------+
  441804 | sa-mvz5y7   | sa_org_admin      | Service Account with Org Admin
  441806 | sa-w72yx5   | sa_metrics_viewer |
# Create the API key and add it to an existing service account
ccloud api-key create --resource cloud --service-account 441806
# Add the MetricsViewer role binding to the service account
ccloud iam rolebinding create --role MetricsViewer --principal User:sa-w72yx5
Discovery¶
The Metrics API provides endpoints for programmatic discovery of available resources and their metrics. This resource and metric metadata is represented by descriptor objects.
The discovery endpoints can be used to avoid hardcoding metric and resource names into client scripts.
Discover available resources¶
A resource represents the entity against which metrics are collected, for example, a Kafka cluster, a Kafka Connector, a ksqlDB application, etc.
Get a description of the available resources by sending a GET request to the descriptors/resources endpoint of the API:
http 'https://api.telemetry.confluent.cloud/v2/metrics/cloud/descriptors/resources' --auth '<API_KEY>:<SECRET>'
This returns a JSON document describing the available resources to query and their labels.
Discover available metrics¶
Get a description of the available metrics by sending a GET request to the descriptors/metrics endpoint of the API:
http 'https://api.telemetry.confluent.cloud/v2/metrics/cloud/descriptors/metrics?resource_type=kafka' --auth '<API_KEY>:<SECRET>'
Note
The resource_type query parameter is required to specify the type of resource for which to list metrics. The valid resource types can be determined using the /descriptors/resources endpoint.
This returns a JSON document describing the available metrics to query and their labels.
A human-readable list of the current metrics is available in the API Reference.
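For example, a client script can combine the discovery endpoint with a JSON processor such as jq to list metric names and lifecycle stages instead of hardcoding them. The sketch below assumes each entry in the response's data array carries name and lifecycle_stage fields; see the API Reference for the authoritative response schema:
# Sketch: list Kafka metric names with their lifecycle stages using jq.
# Assumes the descriptor response contains a "data" array with "name" and
# "lifecycle_stage" fields on each metric descriptor.
http 'https://api.telemetry.confluent.cloud/v2/metrics/cloud/descriptors/metrics?resource_type=kafka' \
  --auth '<API_KEY>:<SECRET>' | jq -r '.data[] | "\(.name)  \(.lifecycle_stage)"'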
Query for bytes sent to consumers per minute grouped by topic¶
Create a file named sent_bytes_query.json using the following template. Be sure to change lkc-XXXXX and the timestamp values to match your needs:
{
  "aggregations": [
    { "metric": "io.confluent.kafka.server/sent_bytes" }
  ],
  "filter": {
    "field": "resource.kafka.id",
    "op": "EQ",
    "value": "lkc-XXXXX"
  },
  "granularity": "PT1M",
  "group_by": [ "metric.topic" ],
  "intervals": [ "2019-12-19T11:00:00-05:00/2019-12-19T11:05:00-05:00" ],
  "limit": 25
}
Submit the query as a POST using the following command. Be sure to change API_KEY and SECRET to match your environment.
http 'https://api.telemetry.confluent.cloud/v2/metrics/cloud/query' --auth '<API_KEY>:<SECRET>' < sent_bytes_query.json
Your output should resemble:
Note
Be aware that if you have not produced data during the time window, the dataset will be empty for a given topic.
{ "data": [ { "timestamp": "2019-12-19T16:01:00Z", "metric.topic": "test-topic", "value": 0.0 }, { "timestamp": "2019-12-19T16:02:00Z", "metric.topic": "test-topic", "value": 157.0 }, { "timestamp": "2019-12-19T16:03:00Z", "metric.topic": "test-topic", "value": 371.0 }, { "timestamp": "2019-12-19T16:04:00Z", "metric.topic": "test-topic", "value": 0.0 } ] }
Query for bytes received from producers per minute grouped by topic¶
Create a file named received_bytes_query.json using the following template. Be sure to change lkc-XXXXX and the timestamp values to match your needs:
{
  "aggregations": [
    { "metric": "io.confluent.kafka.server/received_bytes" }
  ],
  "filter": {
    "field": "resource.kafka.id",
    "op": "EQ",
    "value": "lkc-XXXXX"
  },
  "granularity": "PT1M",
  "group_by": [ "metric.topic" ],
  "intervals": [ "2019-12-19T11:00:00-05:00/2019-12-19T11:05:00-05:00" ],
  "limit": 25
}
Submit the query as a POST using the following command. Be sure to change API_KEY and SECRET to match your environment.
http 'https://api.telemetry.confluent.cloud/v2/metrics/cloud/query' --auth '<API_KEY>:<SECRET>' < received_bytes_query.json
Your output should resemble:
Note
Be aware that if you have not produced data during the time window, the dataset will be empty for a given topic.
{ "data": [ { "timestamp": "2019-12-19T16:00:00Z", "metric.topic": "test-topic", "value": 72.0 }, { "timestamp": "2019-12-19T16:01:00Z", "metric.topic": "test-topic", "value": 139.0 }, { "timestamp": "2019-12-19T16:02:00Z", "metric.topic": "test-topic", "value": 232.0 }, { "timestamp": "2019-12-19T16:03:00Z", "metric.topic": "test-topic", "value": 0.0 }, { "timestamp": "2019-12-19T16:04:00Z", "metric.topic": "test-topic", "value": 0.0 } ] }
Query for max retained bytes per hour over 2 hours for the topic named test-topic¶
Create a file named retained_bytes_query.json using the following template. Change lkc-XXXXX and the timestamp values to match your needs:
{
  "aggregations": [
    { "metric": "io.confluent.kafka.server/retained_bytes" }
  ],
  "filter": {
    "op": "AND",
    "filters": [
      { "field": "metric.topic", "op": "EQ", "value": "test-topic" },
      { "field": "resource.kafka.id", "op": "EQ", "value": "lkc-XXXXX" }
    ]
  },
  "granularity": "PT1H",
  "group_by": [ "metric.topic" ],
  "intervals": [ "2019-12-19T11:00:00-05:00/P0Y0M0DT2H0M0S" ],
  "limit": 25
}
Submit the query as a POST using the following command. Be sure to change API_KEY and SECRET to match your environment.
http 'https://api.telemetry.confluent.cloud/v2/metrics/cloud/query' --auth '<API_KEY>:<SECRET>' < retained_bytes_query.json
Your output should resemble:
{ "data": [ { "timestamp": "2019-12-19T16:00:00Z", "metric.topic": "test-topic", "value": 406561.0 }, { "timestamp": "2019-12-19T17:00:00Z", "metric.topic": "test-topic", "value": 406561.0 } ] }
Query for max retained bytes per hour over 2 hours for a cluster lkc-XXXXX¶
Create a file named cluster_retained_bytes_query.json using the following template. Be sure to change lkc-XXXXX and the timestamp values to match your needs:
{
  "aggregations": [
    { "metric": "io.confluent.kafka.server/retained_bytes" }
  ],
  "filter": {
    "field": "resource.kafka.id",
    "op": "EQ",
    "value": "lkc-XXXXX"
  },
  "granularity": "PT1H",
  "intervals": [ "2019-12-19T11:00:00-05:00/P0Y0M0DT2H0M0S" ],
  "limit": 5
}
Submit the query as a POST using the following command. Be sure to change API_KEY and SECRET to match your environment.
http 'https://api.telemetry.confluent.cloud/v2/metrics/cloud/query' --auth '<API_KEY>:<SECRET>' < cluster_retained_bytes_query.json
Your output should resemble:
{ "data": [ { "timestamp": "2019-12-19T16:00:00Z", "value": 507350.0 }, { "timestamp": "2019-12-19T17:00:00Z", "value": 507350.0 } ] }
Query for the number of streaming units used per hour for ksqlDB app lksqlc-XXXXX¶
Create a file named ksql_streaming_unit_count.json using the following template. Be sure to change lksqlc-XXXXX and the timestamp values to match your needs:
{
  "aggregations": [
    { "metric": "io.confluent.kafka.ksql/streaming_unit_count" }
  ],
  "filter": {
    "field": "resource.ksql.id",
    "op": "EQ",
    "value": "lksqlc-XXXXX"
  },
  "granularity": "PT1H",
  "intervals": [ "2021-02-24T10:00:00Z/2021-02-24T11:00:00Z" ],
  "group_by": [ "resource.ksql.id" ]
}
Submit the query as a POST using the following command. Be sure to change API_KEY and SECRET to match your environment.
http 'https://api.telemetry.confluent.cloud/v2/metrics/cloud/query' --auth '<API_KEY>:<SECRET>' < ksql_streaming_unit_count.json
Your output should resemble:
{ "data": [ { "resource.ksql.id": "lksqlc-XXXXX", "timestamp": "2021-02-24T10:00:00Z", "value": 4.0 } ] }
Query for the number of schemas in the Schema Registry cluster lsrc-XXXXX¶
Create a file named schema_count.json using the following template. Be sure to change lsrc-XXXXX and the timestamp values to match your needs:
{
  "aggregations": [
    { "metric": "io.confluent.kafka.schema_registry/schema_count" }
  ],
  "filter": {
    "field": "resource.schema_registry.id",
    "op": "EQ",
    "value": "lsrc-XXXXX"
  },
  "granularity": "PT1M",
  "intervals": [ "2021-02-24T10:00:00/2021-02-24T10:01:00" ],
  "group_by": [ "resource.schema_registry.id" ]
}
Submit the query as a POST using the following command. Be sure to change API_KEY and SECRET to match your environment.
http 'https://api.telemetry.confluent.cloud/v2/metrics/cloud/query' --auth '<API_KEY>:<SECRET>' < schema_count.json
Your output should resemble:
{ "data": [ { "resource.schema_registry.id": "lsrc-XXXXX", "timestamp": "2021-02-24T10:00:00Z", "value": 1.0 } ] }
Query for the hourly number of records received by a sink connector lcc-XXXXX¶
Create a file named sink_connector_records.json using the following template. Be sure to change lcc-XXXXX and the timestamp values to match your needs:
{
  "aggregations": [
    { "metric": "io.confluent.kafka.connect/received_records" }
  ],
  "filter": {
    "field": "resource.connector.id",
    "op": "EQ",
    "value": "lcc-XXXXX"
  },
  "granularity": "PT1H",
  "intervals": [ "2021-02-24T10:00:00/2021-02-24T11:00:00" ],
  "group_by": [ "resource.connector.id" ]
}
Submit the query as a POST using the following command. Be sure to change API_KEY and SECRET to match your environment.
http 'https://api.telemetry.confluent.cloud/v2/metrics/cloud/query' --auth '<API_KEY>:<SECRET>' < sink_connector_records.json
Your output should resemble:
{ "data": [ { "resource.connector.id": "lcc-XXXXX", "timestamp": "2021-02-24T10:00:00Z", "value": 26455991.0 } ] }
List the topic labels for a given metric in a specified interval¶
The attributes endpoint can be used to enumerate the distinct label values for a given metric.
Create a file named attributes_query.json using the following template. Be sure to change lkc-XXXXX and the timestamp values to match your needs:
{
  "filter": {
    "field": "resource.kafka.id",
    "op": "EQ",
    "value": "lkc-XXXXX"
  },
  "group_by": [ "metric.topic" ],
  "intervals": [ "2020-01-13T10:30:00-05:00/2020-01-13T11:00:00-05:00" ],
  "limit": 25,
  "metric": "io.confluent.kafka.server/sent_bytes"
}
Submit the query as a POST using the following command. Be sure to change API_KEY and SECRET to match your environment.
http 'https://api.telemetry.confluent.cloud/v2/metrics/cloud/attributes' --auth '<API_KEY>:<SECRET>' < attributes_query.json
Your output should resemble:
Note
Be aware that topics without any reported metric values during the specified interval will not be returned.
{ "data": [ { "metric.topic": "test-topic" } ], "meta": { "pagination": { "page_size": 25 } } }
FAQ¶
Can the Metrics API be used to reconcile my bill?¶
No, the Metrics API is intended to provide information for the purposes of monitoring, troubleshooting, and capacity planning. It is not intended as an audit system for reconciling bills, because the metrics do not include request overhead for the Kafka protocol at this time. For more details, see the billing documentation.
Why am I seeing empty data sets for topics that exist on queries other than for retained_bytes?¶
If there are only values of 0.0 in the time range queried, then the API will return an empty set. When there is non-zero data within the time range, time slices with values of 0.0 are returned.
Why didn’t retained_bytes decrease after I changed the retention policy for my topic?¶
The value of retained_bytes is calculated as the maximum over the interval for each data point returned. If data has been deleted during the current interval, you will not see the effect until the next time range window begins.
For example, if you produced 4GB of data per day over the last 30 days and queried for retained_bytes over the last 3 days with a 1 day interval, the query would return values of 112GB, 116GB, 120GB as a time series (4GB/day × 28, 29, and 30 days, respectively). If you then deleted all data in the topic and stopped producing data, the query would return the same values until the next day. When queried at the start of the next day, the same query would return 116GB, 120GB, 0GB.
What are the supported granularity levels?¶
Data is stored at a granularity of one minute. However, the allowed granularity for a query is restricted by the size of the query’s interval. Please see the API Reference for the currently supported granularity levels and query restrictions.
Why don’t I see consumer lag in the Metrics API?¶
In Kafka, consumer lag is not tracked as a metric on the server side. This is because it is a cluster-level construct, and today Kafka's metrics are derived from instrumentation at a lower level of abstraction. Consumer lag may be added to the Metrics API at a later date. At this time, there are multiple other ways to monitor consumer lag, including client metrics, the UI, the CLI, and the Admin API. These methods are all available when using Confluent Cloud.
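For example, one CLI option is the kafka-consumer-groups tool that ships with Apache Kafka, pointed at your cluster's bootstrap endpoint. The bootstrap server, properties file, and consumer group name below are placeholders for your own values; client.properties is assumed to contain the SASL_SSL settings for a cluster API key:
# Describe a consumer group to see the current offset, log-end offset, and LAG per partition
kafka-consumer-groups --bootstrap-server pkc-XXXXX.us-west-2.aws.confluent.cloud:9092 \
  --command-config client.properties \
  --describe --group my-consumer-group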
What is the retention time of metrics in the Metrics API?¶
Metrics are retained for seven days.
How do I know if a given metric is in preview or generally available (GA)?¶
We are always looking to add new metrics, but when we add a new metric, we need to take some time to stabilize how we expose it, to ensure that it’s suitable for most use cases. Each metric’s lifecycle stage (preview, generally available, etc.) is included in the response from the /descriptors/metrics endpoint. While a metric is in preview, we may make breaking changes to its labels without an API version change, as we iterate to provide the best possible experience.
What should I do if a query to the Metrics API returns a timeout response (HTTP error code 504)?¶
If queries are exceeding the timeout (the maximum query time is 60 seconds), consider one or more of the following approaches:
- Reduce the time interval.
- Reduce the granularity of data returned.
- Break up the query on the client side to return fewer data points. For example, you can query for specific topics instead of all topics at once.
These approaches are especially important when querying for partition-level data over days-long intervals, as in the narrowed query sketched below.
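As an illustration, the query below reuses the placeholder cluster ID and topic from the earlier examples: it filters to a single topic and uses hourly granularity over a one-day interval, which returns far fewer data points than an unfiltered per-minute query.
{
  "aggregations": [
    { "metric": "io.confluent.kafka.server/sent_bytes" }
  ],
  "filter": {
    "op": "AND",
    "filters": [
      { "field": "resource.kafka.id", "op": "EQ", "value": "lkc-XXXXX" },
      { "field": "metric.topic", "op": "EQ", "value": "test-topic" }
    ]
  },
  "granularity": "PT1H",
  "group_by": [ "metric.topic" ],
  "intervals": [ "2019-12-19T00:00:00Z/2019-12-20T00:00:00Z" ],
  "limit": 25
}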
Why are my Confluent Cloud metrics displaying only 1hr/6hrs/24hrs worth of data?¶
This is a known limitation that occurs in some clusters with a partition count of more than 2,000. We are working on the issue, but there is no fix at this time.
What should I do if a query returns a 5xx response code?¶
We recommend retrying these types of responses. Usually, this is an indication of a transient server-side issue. You should design your client implementations for querying the Metrics API to be resilient to this type of response for minutes-long periods.
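A minimal retry sketch is shown below; it reuses sent_bytes_query.json and the credential placeholders from the earlier examples and backs off exponentially between attempts. Note that HTTPie's --check-status flag causes a non-zero exit code on any 4xx or 5xx response, so a production script should also inspect the status code and retry only on 5xx.
# Retry the query up to 5 times with exponential backoff on failed responses
for attempt in 1 2 3 4 5; do
  if http --check-status 'https://api.telemetry.confluent.cloud/v2/metrics/cloud/query' \
       --auth '<API_KEY>:<SECRET>' < sent_bytes_query.json > response.json; then
    break                     # success; the response body is in response.json
  fi
  sleep $((2 ** attempt))     # wait 2, 4, 8, 16, 32 seconds between attempts
done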
Suggested Resources¶
- Podcast: Adopting OpenTelemetry in Confluent and Beyond ft. Xavier Léauté
- Podcast: Multi-Cloud Monitoring and Observability with the Metrics API ft. Dustin Cote
- To learn how to architect, monitor, and optimize your Kafka applications on Confluent Cloud, refer to Developing Client Applications on Confluent Cloud.