Description
See this issue described on Stack Overflow: https://stackoverflow.com/questions/79783245/google-cloud-run-error-with-opentelemetry-cloudmonitoringmetricsexporter-one-o
Background
I have a containerized Python Flask application that is deployed on Google Cloud Run. I want to extract custom metrics from this app and send them to Google Cloud Monitoring.
I followed the examples on these two websites, using CloudMonitoringMetricsExporter from opentelemetry.exporter.cloud_monitoring to export metrics directly to Google Cloud Monitoring (without a collector sidecar as described here); a condensed sketch of that pattern follows the links below:
- https://pypi.org/project/opentelemetry-exporter-gcp-monitoring/
- https://google-cloud-opentelemetry.readthedocs.io/en/latest/examples/cloud_monitoring/README.html
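In condensed form, the direct-export pattern from those examples looks roughly like this (a sketch based on the linked docs, not my production code):
from opentelemetry import metrics
from opentelemetry.exporter.cloud_monitoring import CloudMonitoringMetricsExporter
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader

# Push metrics straight to the Cloud Monitoring API; no collector sidecar involved.
metrics.set_meter_provider(
    MeterProvider(
        metric_readers=[PeriodicExportingMetricReader(CloudMonitoringMetricsExporter())]
    )
)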
Error
Sometimes, but not always, almost exactly 15 minutes after my Cloud Run service logs its last activity, the logs show a termination signal from Cloud Run, followed by an error writing to Google Cloud Monitoring:
[2025-10-05 13:03:54 +0000] [1] [INFO] Handling signal: term
[2025-10-05 13:03:54 +0000] [2] [INFO] Worker exiting (pid: 2)
[ERROR] - Error while writing to Cloud Monitoring
Traceback (most recent call last):
  File "/usr/local/lib/python3.13/site-packages/google/api_core/grpc_helpers.py", line 75, in error_remapped_callable
    return callable_(*args, **kwargs)
  File "/usr/local/lib/python3.13/site-packages/grpc/_interceptor.py", line 277, in __call__
    response, ignored_call = self._with_call(
        request,
        ...<4 lines>...
        compression=compression,
    )
  File "/usr/local/lib/python3.13/site-packages/grpc/_interceptor.py", line 332, in _with_call
    return call.result(), call
  File "/usr/local/lib/python3.13/site-packages/grpc/_channel.py", line 440, in result
    raise self
  File "/usr/local/lib/python3.13/site-packages/grpc/_interceptor.py", line 315, in continuation
    response, call = self._thunk(new_method).with_call(
        request,
        ...<4 lines>...
        compression=new_compression,
    )
  File "/usr/local/lib/python3.13/site-packages/grpc/_channel.py", line 1195, in with_call
    return _end_unary_response_blocking(state, call, True, None)
  File "/usr/local/lib/python3.13/site-packages/grpc/_channel.py", line 1009, in _end_unary_response_blocking
    raise _InactiveRpcError(state)  # pytype: disable=not-instantiable
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.INVALID_ARGUMENT
The specific error is then "One or more points were written more frequently than the maximum sampling period configured for the metric" (the same one called out here):
" details = "One or more TimeSeries could not be written: timeSeries[0-2] (example metric.type="workload.googleapis.com/<redacted>", metric.labels={"net_peer_name": "<redacted>", "environment": "prod", "webhook_label": "generic", "component": "forwarder", "http_status_code": "200", "http_status_bucket": "2xx", "user_agent": "<redacted>", "opentelemetry_id": "d731413a"}): write for resource=generic_task{namespace:cloud-run,location:us-central1,job:<redacted>,task_id:02f24696-0786-4970-a93b-02176d5f1d75} failed with: One or more points were written more frequently than the maximum sampling period configured for the metric. {Metric: workload.googleapis.com/<redacted>, Timestamps: {Youngest Existing: '2025/10/05-06:03:53.004', New: '2025/10/05-06:03:54.778'}}""
The error log continues:
" debug_error_string = "UNKNOWN:Error received from peer ipv4:173.194.194.95:443 {grpc_message:"One or more TimeSeries could not be written: timeSeries[0-2] (example metric.type=\"workload.googleapis.com/<redacted>\", metric.labels={\"net_peer_name\": \"<redacted>\", \"environment\": \"prod\", \"webhook_label\": \"generic\", \"component\": \"forwarder\", \"http_status_code\": \"200\", \"http_status_bucket\": \"2xx\", \"user_agent\": \"<redacted>\", \"opentelemetry_id\": \"d731413a\"}): write for resource=generic_task{namespace:cloud-run,location:us-central1,job:<redacted>,task_id:02f24696-0786-4970-a93b-02176d5f1d75} failed with: One or more points were written more frequently than the maximum sampling period configured for the metric. {Metric: workload.googleapis.com/<redacted>, Timestamps: {Youngest Existing: \'2025/10/05-06:03:53.004\', New: \'2025/10/05-06:03:54.778\'}}", grpc_status:3}""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/usr/local/lib/python3.13/site-packages/opentelemetry/exporter/cloud_monitoring/__init__.py", line 371, in export
    self._batch_write(all_series)
    ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/opentelemetry/exporter/cloud_monitoring/__init__.py", line 155, in _batch_write
    self.client.create_time_series(
        CreateTimeSeriesRequest(
            ...<4 lines>...
        ),
    )
  File "/usr/local/lib/python3.13/site-packages/google/cloud/monitoring_v3/services/metric_service/client.py", line 1791, in create_time_series
    rpc(
        request,
        ...<2 lines>...
        metadata=metadata,
    )
  File "/usr/local/lib/python3.13/site-packages/google/api_core/gapic_v1/method.py", line 131, in __call__
    return wrapped_func(*args, **kwargs)
  File "/usr/local/lib/python3.13/site-packages/google/api_core/timeout.py", line 130, in func_with_timeout
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.13/site-packages/google/api_core/grpc_helpers.py", line 77, in error_remapped_callable
    raise exceptions.from_grpc_error(exc) from exc
google.api_core.exceptions.InvalidArgument: 400 One or more TimeSeries could not be written: timeSeries[0-2] (example metric.type="workload.googleapis.com/<redacted>", metric.labels={"net_peer_name": "<redacted>", "environment": "prod", "webhook_label": "generic", "component": "forwarder", "http_status_code": "200", "http_status_bucket": "2xx", "user_agent": "<redacted>", "opentelemetry_id": "d731413a"}): write for resource=generic_task{namespace:cloud-run,location:us-central1,job:<redacted>,task_id:02f24696-0786-4970-a93b-02176d5f1d75} failed with: One or more points were written more frequently than the maximum sampling period configured for the metric. {Metric: workload.googleapis.com/<redacted>, Timestamps: {Youngest Existing: '2025/10/05-06:03:53.004', New: '2025/10/05-06:03:54.778'}} [type_url: "type.googleapis.com/google.monitoring.v3.CreateTimeSeriesSummary"
value: "\010\003\032\006\n\002\010\t\020\003
]
[2025-10-05 13:03:57 +0000] [1] [INFO] Shutting down: Master
My Code
I have a function called configure_metrics, which is (simplified; imports shown for context):
import logging

from opentelemetry import metrics
from opentelemetry.exporter.cloud_monitoring import CloudMonitoringMetricsExporter
from opentelemetry.resourcedetector.gcp_resource_detector import GoogleCloudResourceDetector
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.sdk.resources import Resource

logger = logging.getLogger(__name__)

def configure_metrics(
    service_name=None,
    service_namespace=None,
    service_version=None,
    service_instance_id=None,
    export_interval_ms=60000,
    add_unique_identifier=True,
):
    """
    Configure OpenTelemetry metrics for Cloud Monitoring.
    """
    name, namespace, version, instance_id = _infer_service_identity(
        service_name, service_namespace, service_version, service_instance_id
    )  # Custom internal function
    # Base resource with service-specific attributes; avoid platform-specific hardcoding here.
    base_resource = Resource.create(
        {
            "service.name": name,
            "service.namespace": namespace,
            "service.version": version,
            "service.instance.id": instance_id,
        }
    )
    # Detect environment-specific resource (e.g., GCE VM, GKE Pod, Cloud Run instance) and merge.
    try:
        detected_resource = GoogleCloudResourceDetector().detect()
    except Exception as e:
        logger.debug(
            "GCP resource detection failed; continuing with base resource: %s", e
        )
        detected_resource = Resource.create({})
    resource = detected_resource.merge(base_resource)
    exporter = CloudMonitoringMetricsExporter(
        # Appends a unique "opentelemetry_id" label (visible in the error above) to each
        # time series; helps avoid "written more frequently than the maximum sampling
        # period" conflicts between concurrent writers.
        add_unique_identifier=add_unique_identifier
    )
    reader = PeriodicExportingMetricReader(
        exporter, export_interval_millis=export_interval_ms
    )
    provider = MeterProvider(metric_readers=[reader], resource=resource)
    # Sets the global MeterProvider.
    # After this, any metrics.get_meter(<any_name>) in this process gets a Meter from this provider.
    metrics.set_meter_provider(provider)
In main.py, I configure OpenTelemetry metrics as:
from flask import Flask

from app.telemetry import configure_metrics  # illustrative path; import from wherever configure_metrics lives

def create_app() -> Flask:
    app = Flask(__name__)
    # Initialize the OTel metrics provider once per process/worker.
    configure_metrics(
        export_interval_ms=60000
    )  # Export every minute, instead of the 5 seconds used in the linked examples
    # Only now import and register blueprints (routes) so instruments are created
    # against the meter provider installed in configure_metrics()
    from app.routes import webhook

    app.register_blueprint(webhook.bp)
    return app

app = create_app()

if __name__ == "__main__":
    app.run(port=8080)
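(For context: on Cloud Run the app is served by Gunicorn rather than Flask's development server; the "Handling signal: term", "Worker exiting", and "Shutting down: Master" lines in the logs above come from Gunicorn's master and worker processes. The container entrypoint is assumed to be something like gunicorn --bind :8080 main:app.)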
And in other files, such as the webhook.py referenced above, I define my own custom metrics, as in this example:
# ---------------------------------------------
# OpenTelemetry (OTel) metrics
# ---------------------------------------------
from opentelemetry import metrics

# Get a meter from the provider that was installed in main.py
meter = metrics.get_meter(
    "webhooks"
)  # Any stable string works for naming this meter

# Request counter
# The metric name maps to workload.googleapis.com/webhook_request_counter in Cloud Monitoring.
requests_counter = meter.create_counter(
    name="webhook_request_counter",
    description="Total number of HTTP requests processed by the webhooks blueprint",
    unit="1",
)
And the metric is updated where needed as:
requests_counter.add(1, attributes=attrs)
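Here attrs is a per-request dict of attribute key/value pairs. Judging from the metric labels visible in the error above, it looks roughly like the following (the exact keys and values here are illustrative):
attrs = {
    "environment": "prod",
    "component": "forwarder",
    "webhook_label": "generic",
    "http_status_code": "200",
    "http_status_bucket": "2xx",
}
requests_counter.add(1, attributes=attrs)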
Possible Explanation
I think something along these lines is happening:
- The exporter to Cloud Monitoring runs every 60 seconds.
- Suppose at time T a scheduled export occurs, sending new points for each time series.
- Some time later, the container is terminated (e.g., Cloud Run shutting down or scaling in), and before exiting, the application invokes a shutdown or signal handler that triggers a force flush of metrics.
- That flush occurs shortly (~1-2 seconds) after the last scheduled export, so some of the same time series get a new "point" whose timestamp is only 1-2 seconds newer than the previous one. Because that is less than the ~5 s minimum sampling period, Cloud Monitoring rejects the write (see the sketch after this list).
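A minimal sketch of that suspected sequence, assuming the SDK's default shutdown_on_exit=True behavior on MeterProvider (the timings in the comments are illustrative, not measured):
from opentelemetry import metrics

# The global MeterProvider installed by configure_metrics() above.
provider = metrics.get_meter_provider()

# t = 0 s: the PeriodicExportingMetricReader's timer fires and exports one
# point per time series (the scheduled 60-second export).
# t ~ +1-2 s: Cloud Run sends SIGTERM and Gunicorn starts shutting down.
# During interpreter exit, the SDK's atexit hook calls provider.shutdown(),
# which force-flushes the reader and exports one final batch of points:
provider.shutdown()
# Those flushed points are only ~1-2 s newer than the scheduled ones, so
# Cloud Monitoring rejects the write with INVALID_ARGUMENT.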
Help
I do not know how to handle this event in code in such a way as to avoid the error without losing data. What edits should I make? Or, alternatively, is this an issue that should instead be solved in the OpenTelemetry Python SDK for GCP?