Google Cloud Run error with OpenTelemetry CloudMonitoringMetricsExporter: "One or more points were written more frequently than the maximum sampling period configured for the metric" #431

Description

@carlospatino-guardian

See this issue described in StackOverflow: https://stackoverflow.com/questions/79783245/google-cloud-run-error-with-opentelemetry-cloudmonitoringmetricsexporter-one-o

Background
I have a containerized Python Flask application that is deployed on Google Cloud Run. I want to extract custom metrics from this app and send them to Google Cloud Monitoring.

I followed the examples from two websites, using CloudMonitoringMetricsExporter from opentelemetry.exporter.cloud_monitoring to export metrics directly to Google Cloud Monitoring (without using a collector sidecar).
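The basic pattern from those examples is roughly the following minimal sketch (the 5-second interval is the value the examples use, not something I chose):

from opentelemetry import metrics
from opentelemetry.exporter.cloud_monitoring import CloudMonitoringMetricsExporter
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader

# One reader drives periodic collection; the exporter writes each batch to
# Cloud Monitoring as workload.googleapis.com/<metric name> time series.
metrics.set_meter_provider(
    MeterProvider(
        metric_readers=[
            PeriodicExportingMetricReader(
                CloudMonitoringMetricsExporter(),
                export_interval_millis=5000,
            )
        ]
    )
)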

Error
Sometimes, but not always, almost exactly 15 minutes after my Cloud Run service records the last activity in the logs, I see the following: a termination signal from Cloud Run, followed by an error writing to Google Cloud Monitoring:

[2025-10-05 13:03:54 +0000] [1] [INFO] Handling signal: term
[2025-10-05 13:03:54 +0000] [2] [INFO] Worker exiting (pid: 2)
[ERROR] - Error while writing to Cloud Monitoring
Traceback (most recent call last):
  File "/usr/local/lib/python3.13/site-packages/google/api_core/grpc_helpers.py", line 75, in error_remapped_callable
    return callable_(*args, **kwargs)
  File "/usr/local/lib/python3.13/site-packages/grpc/_interceptor.py", line 277, in __call__
    response, ignored_call = self._with_call(
        request,
    ...<4 lines>...
    compression=compression,
    )
  File "/usr/local/lib/python3.13/site-packages/grpc/_interceptor.py", line 332, in _with_call
    return call.result(), call
  File "/usr/local/lib/python3.13/site-packages/grpc/_channel.py", line 440, in result
    raise self
  File "/usr/local/lib/python3.13/site-packages/grpc/_interceptor.py", line 315, in continuation
    response, call = self._thunk(new_method).with_call(
        request,
    ...<4 lines>...
        compression=new_compression,
    )
  File "/usr/local/lib/python3.13/site-packages/grpc/_channel.py", line 1195, in with_call
    return _end_unary_response_blocking(state, call, True, None)
  File "/usr/local/lib/python3.13/site-packages/grpc/_channel.py", line 1009, in _end_unary_response_blocking
    raise _InactiveRpcError(state)  # pytype: disable=not-instantiable
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.INVALID_ARGUMENT

The specific error is then "One or more points were written more frequently than the maximum sampling period configured for the metric" (the same one called out here):

 "  details = "One or more TimeSeries could not be written: timeSeries[0-2] (example metric.type="workload.googleapis.com/<redacted>", metric.labels={"net_peer_name": "<redacted>", "environment": "prod", "webhook_label": "generic", "component": "forwarder", "http_status_code": "200", "http_status_bucket": "2xx", "user_agent": "<redacted>", "opentelemetry_id": "d731413a"}): write for resource=generic_task{namespace:cloud-run,location:us-central1,job:<redacted>,task_id:02f24696-0786-4970-a93b-02176d5f1d75} failed with: One or more points were written more frequently than the maximum sampling period configured for the metric. {Metric: workload.googleapis.com/<redacted>, Timestamps: {Youngest Existing: '2025/10/05-06:03:53.004', New: '2025/10/05-06:03:54.778'}}""

The error log continues:

"   debug_error_string = "UNKNOWN:Error received from peer ipv4:173.194.194.95:443 {grpc_message:"One or more TimeSeries could not be written: timeSeries[0-2] (example metric.type=\"workload.googleapis.com/<redacted>\", metric.labels={\"net_peer_name\": \"<redacted>\", \"environment\": \"prod\", \"webhook_label\": \"generic\", \"component\": \"forwarder\", \"http_status_code\": \"200\", \"http_status_bucket\": \"2xx\", \"user_agent\": \"<redacted>\", \"opentelemetry_id\": \"d731413a\"}): write for resource=generic_task{namespace:cloud-run,location:us-central1,job:<redacted>,task_id:02f24696-0786-4970-a93b-02176d5f1d75} failed with: One or more points were written more frequently than the maximum sampling period configured for the metric. {Metric: workload.googleapis.com/<redacted>, Timestamps: {Youngest Existing: \'2025/10/05-06:03:53.004\', New: \'2025/10/05-06:03:54.778\'}}", grpc_status:3}""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/usr/local/lib/python3.13/site-packages/opentelemetry/exporter/cloud_monitoring/__init__.py", line 371, in export
    self._batch_write(all_series)
    ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/opentelemetry/exporter/cloud_monitoring/__init__.py", line 155, in _batch_write
    self.client.create_time_series(
        CreateTimeSeriesRequest(
    ...<4 lines>...
        ),
    )
  File "/usr/local/lib/python3.13/site-packages/google/cloud/monitoring_v3/services/metric_service/client.py", line 1791, in create_time_series
    rpc(
        request,
    ...<2 lines>...
        metadata=metadata,
    )
  File "/usr/local/lib/python3.13/site-packages/google/api_core/gapic_v1/method.py", line 131, in __call__
    return wrapped_func(*args, **kwargs)
  File "/usr/local/lib/python3.13/site-packages/google/api_core/timeout.py", line 130, in func_with_timeout
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.13/site-packages/google/api_core/grpc_helpers.py", line 77, in error_remapped_callable
    raise exceptions.from_grpc_error(exc) from exc
"google.api_core.exceptions.InvalidArgument: 400 One or more TimeSeries could not be written: timeSeries[0-2] (example metric.type="workload.googleapis.com/<redacted>", metric.labels={"net_peer_name": "<redacted>", "environment": "prod", "webhook_label": "generic", "component": "forwarder", "http_status_code": "200", "http_status_bucket": "2xx", "user_agent": "<redacted>", "opentelemetry_id": "d731413a"}): write for resource=generic_task{namespace:cloud-run,location:us-central1,job:<redacted>,task_id:02f24696-0786-4970-a93b-02176d5f1d75} failed with: One or more points were written more frequently than the maximum sampling period configured for the metric. {Metric: workload.googleapis.com/<redacted>, Timestamps: {Youngest Existing: '2025/10/05-06:03:53.004', New: '2025/10/05-06:03:54.778'}} [type_url: "type.googleapis.com/google.monitoring.v3.CreateTimeSeriesSummary""
value: "\010\003\032\006\n\002\010\t\020\003
]
[2025-10-05 13:03:57 +0000] [1] [INFO] Shutting down: Master

My Code
I have a function called configure_metrics, which is (simplified):

import logging

from opentelemetry import metrics
from opentelemetry.exporter.cloud_monitoring import CloudMonitoringMetricsExporter
from opentelemetry.resourcedetector.gcp_resource_detector import (
    GoogleCloudResourceDetector,
)
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.sdk.resources import Resource

logger = logging.getLogger(__name__)


def configure_metrics(
    service_name=None,
    service_namespace=None,
    service_version=None,
    service_instance_id=None,
    export_interval_ms=60000,
    add_unique_identifier=True,
):
    """
    Configure OpenTelemetry metrics for Cloud Monitoring.
    """
    name, namespace, version, instance_id = _infer_service_identity(
        service_name, service_namespace, service_version, service_instance_id
    )  # Custom internal function

    # Base resource with service-specific attributes; avoid platform-specific hardcoding here.
    base_resource = Resource.create(
        {
            "service.name": name,
            "service.namespace": namespace,
            "service.version": version,
            "service.instance.id": instance_id,
        }
    )

    # Detect environment-specific resource (e.g., GCE VM, GKE Pod, Cloud Run instance) and merge.
    try:
        detected_resource = GoogleCloudResourceDetector().detect()
    except Exception as e:
        logger.debug(
            "GCP resource detection failed; continuing with base resource: %s", e
        )
        detected_resource = Resource.create({})

    resource = detected_resource.merge(base_resource)

    exporter = CloudMonitoringMetricsExporter(
        # Appends an opentelemetry_id label to each time series (visible in the
        # error log above); helps avoid 'written more frequently than the
        # maximum sampling period' conflicts between processes
        add_unique_identifier=add_unique_identifier
    )

    reader = PeriodicExportingMetricReader(
        exporter, export_interval_millis=export_interval_ms
    )
    provider = MeterProvider(metric_readers=[reader], resource=resource)

    # Sets the global MeterProvider.
    # After this, any metrics.get_meter(<any_name>) in your process gets a Meter from this provider.
    metrics.set_meter_provider(provider)

In main.py, I configure OpenTelemetry metrics as:

from flask import Flask

# (plus the import of configure_metrics from wherever it lives in the project)


def create_app() -> Flask:
    app = Flask(__name__)

    # Initialize OTel metrics provider once per process/worker.
    configure_metrics(
        export_interval_ms=60000
    )  # Export every minute, instead of every 5 seconds as in the examples

    # Only now import and register blueprints (routes) so instruments are created
    # against the meter provider installed in configure_metrics()
    from app.routes import webhook

    app.register_blueprint(webhook.bp)

    return app


app = create_app()

if __name__ == "__main__":
    app.run(port=8080)
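One related note: the log above shows a gunicorn master and worker. If gunicorn were started with --preload, create_app() would run once in the master before forking, and the reader's export thread would not survive into the workers. The usual workaround, sketched here, is to initialize metrics in a post_fork hook (the module path app.telemetry is an assumption for illustration):

# gunicorn.conf.py
def post_fork(server, worker):
    # Give each worker its own provider and export thread. Combined with
    # add_unique_identifier=True, each worker then writes distinct time
    # series rather than colliding on the same one.
    from app.telemetry import configure_metrics  # hypothetical module path

    configure_metrics(export_interval_ms=60000)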

And in other files, such as webhook.py referenced above, I define my own custom metrics as in this example:

# ---------------------------------------------
# OpenTelemetry (OTel) metrics
# ---------------------------------------------
from opentelemetry import metrics

# Get a meter from the provider that was installed in main.py
meter = metrics.get_meter(
    "webhooks"
)  # Any stable string works for naming this meter

# Request counter
# Metric name maps to workload.googleapis.com/webhook_request_counter in Cloud Monitoring.
requests_counter = meter.create_counter(
    name="webhook_request_counter",
    description="Total number of HTTP requests processed by the webhooks blueprint",
    unit="1",
)

And the metric is updated where needed as:

requests_counter.add(1, attributes=attrs)
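For context, attrs is a plain dict of string attributes; judging by the labels visible in the error message above, it looks roughly like this (values illustrative):

attrs = {
    "environment": "prod",
    "webhook_label": "generic",
    "component": "forwarder",
    "http_status_code": "200",
    "http_status_bucket": "2xx",
}
requests_counter.add(1, attributes=attrs)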

Possible Explanation
I think something along these lines is happening (see the sketch after this list):

  • The exporter to Cloud Monitoring runs every 60 seconds.
  • Suppose at time T a scheduled export occurs, sending new points for each time series.
  • Some time later, the container is terminated (e.g., Cloud Run shutting down or scaling in), and before exiting, the application invokes a shutdown or signal handler that force-flushes metrics.
  • That flush occurs shortly (~1–2 seconds) after the last scheduled export, so some of the same time series get a new point whose timestamp is only 1–2 seconds after the previous one. In the log above the gap is 06:03:53.004 to 06:03:54.778, i.e. about 1.8 seconds; because that is less than the 5-second minimum sampling period, Cloud Monitoring rejects the write.
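One idea for a mitigation along those lines would be to delay the final flush until the sampling window has passed. The sketch below is speculative: the last-export bookkeeping is hypothetical (not an SDK API), it assumes the 5-second figure above, and it assumes the global provider is the SDK MeterProvider installed by configure_metrics:

import time

from opentelemetry import metrics

MIN_SAMPLING_PERIOD_S = 5.0  # the minimum sampling period cited above


def shutdown_metrics(last_scheduled_export: float) -> None:
    """Wait out the sampling window, then flush and shut down the provider.

    last_scheduled_export is hypothetical bookkeeping the app would maintain,
    e.g. by recording time.monotonic() after each scheduled export.
    """
    elapsed = time.monotonic() - last_scheduled_export
    if elapsed < MIN_SAMPLING_PERIOD_S:
        # Cloud Run grants a short grace period after SIGTERM, so waiting a
        # few seconds before the final flush should fit within it.
        time.sleep(MIN_SAMPLING_PERIOD_S - elapsed)
    # MeterProvider.shutdown() triggers the reader's final collect + export.
    metrics.get_meter_provider().shutdown()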

Help
I do not know how to handle this event in code in such a way as to avoid the error without losing data. What edits should I make? Or, separately, is this an issue that should instead be solved in the OpenTelemetry Python exporter for GCP?
