Description
Background
I have a multi-process Python service running on Cloud Run that exports custom OpenTelemetry metrics, and most of the time the exports succeed.
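For context, the counter behind the failing descriptor is created along these lines. The instrument name, unit, description, and attribute keys are taken from the descriptor dump in the error below; everything else (meter name, attribute values) is illustrative:

```python
from opentelemetry import metrics

meter = metrics.get_meter("webapp.v2.cr_metrics")

# Backs the "workload.googleapis.com/requests_total" descriptor in the
# error below; the attribute keys match the descriptor's label keys.
requests_total = meter.create_counter(
    "requests_total",
    unit="1",
    description="Total number of HTTP requests received",
)

requests_total.add(
    1,
    {
        "endpoint": "/api/v2/items",  # illustrative values
        "http_status_code": "200",
        "instance_id": "instance-0",
        "service_name": "webapp",
    },
)
```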
Error
From time to time (I'm testing with 1k+ requests, which spins up more than 10 instances) I get the following error:
```
2025-10-21 15:57:11,636 2 opentelemetry.exporter.cloud_monitoring ERROR Failed to create metric descriptor labels {
  key: "service_name"
}
labels {
  key: "instance_id"
}
labels {
  key: "http_status_code"
}
labels {
  key: "endpoint"
}
labels {
  key: "opentelemetry_id"
}
metric_kind: CUMULATIVE
value_type: INT64
unit: "1"
description: "Total number of HTTP requests received"
display_name: "requests_total"
type: "workload.googleapis.com/requests_total"
Traceback (most recent call last):
  File "/opt/venv/lib/python3.12/site-packages/google/api_core/grpc_helpers.py", line 75, in error_remapped_callable
    return callable_(*args, **kwargs)
  File "/opt/venv/lib/python3.12/site-packages/grpc/_interceptor.py", line 277, in __call__
    response, ignored_call = self._with_call(
  File "/opt/venv/lib/python3.12/site-packages/grpc/_interceptor.py", line 332, in _with_call
    return call.result(), call
  File "/opt/venv/lib/python3.12/site-packages/grpc/_channel.py", line 440, in result
    raise self
  File "/opt/venv/lib/python3.12/site-packages/grpc/_interceptor.py", line 315, in continuation
    response, call = self._thunk(new_method).with_call(
  File "/opt/venv/lib/python3.12/site-packages/grpc/_channel.py", line 1198, in with_call
    return _end_unary_response_blocking(state, call, True, None)
  File "/opt/venv/lib/python3.12/site-packages/grpc/_channel.py", line 1006, in _end_unary_response_blocking
    raise _InactiveRpcError(state)  # pytype: disable=not-instantiable
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.DEADLINE_EXCEEDED
	details = "Deadline Exceeded"
	debug_error_string = "UNKNOWN:Error received from peer {grpc_message:"Deadline Exceeded", grpc_status:4, created_time:"2025-10-21T15:57:10.835572366+00:00"}"
>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/venv/lib/python3.12/site-packages/opentelemetry/exporter/cloud_monitoring/__init__.py", line 248, in _get_metric_descriptor
    response_descriptor = self.client.create_metric_descriptor(
  File "/opt/venv/lib/python3.12/site-packages/google/cloud/monitoring_v3/services/metric_service/client.py", line 1386, in create_metric_descriptor
    response = rpc(
  File "/opt/venv/lib/python3.12/site-packages/google/api_core/gapic_v1/method.py", line 131, in __call__
    return wrapped_func(*args, **kwargs)
  File "/opt/venv/lib/python3.12/site-packages/google/api_core/timeout.py", line 130, in func_with_timeout
    return func(*args, **kwargs)
  File "/opt/venv/lib/python3.12/site-packages/google/api_core/grpc_helpers.py", line 77, in error_remapped_callable
    raise exceptions.from_grpc_error(exc) from exc
google.api_core.exceptions.DeadlineExceeded: 504 Deadline Exceeded

2025-10-21 15:57:31,136 2 webapp.v2.cr_metrics INFO Export successful on attempt 1
```
Current implementation
In the current implementation I'm using a PeriodicExportingMetricReader with a custom exporter wrapper:
```python
# Wrap with retry logic
exporter = RetryableCloudMonitoringExporter(
    base_exporter,
    initial_delay=4.0,
    max_delay=10,
    backoff_multiplier=2.0,
)

reader = PeriodicExportingMetricReader(
    exporter,
    export_interval_millis=60_000 + jitter_milis,  # export between 60s-120s
    export_timeout_millis=60_000,
)
```
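For completeness, here is a minimal sketch of how the pieces referenced above (`base_exporter`, `jitter_milis`) might be constructed. The project ID is a placeholder, and `add_unique_identifier=True` is an assumption inferred from the `opentelemetry_id` label in the descriptor dump above:

```python
import random

from opentelemetry.exporter.cloud_monitoring import CloudMonitoringMetricsExporter

# add_unique_identifier=True adds the "opentelemetry_id" label seen in the
# descriptor dump, so the worker processes don't collide on the same
# Cloud Monitoring time series.
base_exporter = CloudMonitoringMetricsExporter(
    project_id="my-gcp-project",  # placeholder
    add_unique_identifier=True,
)

# Random per-process offset so the instances don't all export at once.
jitter_milis = random.randint(0, 60_000)
```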
The retry logic looks like this:
```python
def _export_with_retry(self, metrics_data, timeout_millis, attempt=0):
    try:
        result = self.base_exporter.export(metrics_data,
                                           timeout_millis=timeout_millis)
        if result == MetricExportResult.SUCCESS:
            logger.info("Export successful on attempt %d", attempt + 1)
            return result

        # Export returned failure status
        error_msg = f"Export returned failure status: {result}"
        logger.warning(error_msg)
        if attempt < self.max_retries:
            delay = min(self.initial_delay * (self.backoff_multiplier ** attempt),
                        self.max_delay)
            logger.warning(
                "Export failed (attempt %d/%d), retrying in %ds: %s",
                attempt + 1, self.max_retries + 1, delay, error_msg)
            time.sleep(delay)
            return self._export_with_retry(metrics_data, timeout_millis,
                                           attempt + 1)

        logger.error("Export failed after %d attempts: %s",
                     attempt + 1, error_msg)
        return result
    except Exception as e:  # pylint: disable=broad-except
        if attempt < self.max_retries and self._is_retryable_error(e):
            delay = min(self.initial_delay * (self.backoff_multiplier ** attempt),
                        self.max_delay)
            logger.warning(
                "Export failed with exception (attempt %d/%d), "
                "retrying in %ds: %s",
                attempt + 1, self.max_retries + 1, delay, e)
            time.sleep(delay)
            return self._export_with_retry(metrics_data, timeout_millis, attempt + 1)

        logger.error("Export failed with exception after %d attempts: %s",
                     attempt + 1, e)
        raise
```
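For reference, a minimal sketch of the wrapper class around that method, assuming `max_retries` defaults to 3 and that `_is_retryable_error` checks for transient gRPC statuses (neither is shown above, so both are assumptions):

```python
import logging
import time

from google.api_core.exceptions import (
    Aborted,
    DeadlineExceeded,
    InternalServerError,
    ServiceUnavailable,
)
from opentelemetry.sdk.metrics.export import MetricExportResult

logger = logging.getLogger(__name__)


class RetryableCloudMonitoringExporter:
    """Delegating wrapper that retries transient export failures."""

    def __init__(self, base_exporter, initial_delay=4.0, max_delay=10.0,
                 backoff_multiplier=2.0, max_retries=3):
        self.base_exporter = base_exporter
        self.initial_delay = initial_delay
        self.max_delay = max_delay
        self.backoff_multiplier = backoff_multiplier
        self.max_retries = max_retries

    def export(self, metrics_data, timeout_millis=10_000, **kwargs):
        return self._export_with_retry(metrics_data, timeout_millis)

    # _export_with_retry as shown above

    def _is_retryable_error(self, exc):
        # Transient gRPC statuses worth retrying; anything else is re-raised.
        return isinstance(
            exc, (Aborted, DeadlineExceeded, InternalServerError, ServiceUnavailable)
        )

    def force_flush(self, timeout_millis=10_000):
        return self.base_exporter.force_flush(timeout_millis)

    def shutdown(self, timeout_millis=30_000, **kwargs):
        self.base_exporter.shutdown(timeout_millis=timeout_millis, **kwargs)

    def __getattr__(self, name):
        # PeriodicExportingMetricReader reads attributes such as
        # _preferred_temporality from the exporter it wraps; forward
        # anything not defined here to the underlying exporter.
        return getattr(self.base_exporter, name)
```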
One issue I'm having is that I cannot rely on the retry logic: the 504 Deadline Exceeded shows up in the logs while export() still returns MetricExportResult.SUCCESS, so my wrapper never retries. This is not happening during shutdown; it happens while requests are being processed.
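That matches what the traceback suggests: the descriptor-creation step inside the exporter catches the exception itself, so it never propagates to export(). Paraphrased from the traceback and the "Failed to create metric descriptor" log line above (not the verbatim library source; `build_descriptor` is a hypothetical stand-in), the flow appears to be roughly:

```python
# Inside CloudMonitoringMetricsExporter (paraphrased, not verbatim):
def _get_metric_descriptor(self, record):
    descriptor = build_descriptor(record)  # hypothetical helper for the sketch
    try:
        response_descriptor = self.client.create_metric_descriptor(
            name=self.project_name, metric_descriptor=descriptor
        )
    except Exception:
        # The DeadlineExceeded above lands here: it is logged ("Failed to
        # create metric descriptor ...") and swallowed, so export() never
        # sees it and still returns MetricExportResult.SUCCESS.
        logger.exception("Failed to create metric descriptor %s", descriptor)
        return None
```

If that reading is right, a retry wrapper around export() can never catch this particular failure, because it never leaves the exporter.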
One more thing: the metrics themselves seem to be recorded correctly.