Environment Variable Substitution Broken in Prometheus Receiver Due to Over-escaping #1979

@ilya-bukinich

Summary

Environment variable substitution (e.g., ${env:VARIABLE_NAME}) in Prometheus receiver metric_relabel_configs and relabel_configs is broken due to incorrect automatic escaping of $ characters, resulting in literal strings instead of variable substitution.

Environment

  • Ops Agent Version: [Latest from master branch]
  • Platform: Linux
  • OpenTelemetry Collector: otelopscol (ops-agent embedded)

Expected Behavior

When using environment variable substitution syntax in Prometheus receiver configuration:

metrics:
  receivers:
    prometheus:
      type: prometheus
      config:
        scrape_configs:
          - job_name: 'triton'
            scrape_interval: 60s
            static_configs:
              - targets: ['triton.service:8002']
            metrics_path: /metrics
            metric_relabel_configs:
              - source_labels: [__name__]
                regex: 'nv_inference_count|nv_gpu_utilization|nv_gpu_memory_total_bytes|nv_gpu_memory_used_bytes'
                action: keep
              - action: replace
                replacement: ${env:ENVIRONMENT}
                target_label: environment

The generated OpenTelemetry collector configuration should contain:

replacement: ${env:ENVIRONMENT}

And the collector should substitute the actual environment variable value.

Actual Behavior

The generated OpenTelemetry collector configuration contains:

replacement: $${env:ENVIRONMENT}

According to OTel documentation, $$ indicates a literal $, so this results in the literal string ${env:ENVIRONMENT} being used as the label value instead of the environment variable's actual value.

Root Cause

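In confgenerator/prometheus.go lines 123-127, the code unconditionally escapes every $ character in the replacement field of each relabel rule, whether it belongs to a regex capture group or to an environment variable reference: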

// Escape the $ characters in the regexes.
for i := range copyPromConfig.ScrapeConfigs {
    for j := range copyPromConfig.ScrapeConfigs[i].RelabelConfigs {
        rc := copyPromConfig.ScrapeConfigs[i].RelabelConfigs[j]
        rc.Replacement = strings.ReplaceAll(rc.Replacement, "$", "$$")  // ← Problem here
    }
    for j := range copyPromConfig.ScrapeConfigs[i].MetricRelabelConfigs {
        mrc := copyPromConfig.ScrapeConfigs[i].MetricRelabelConfigs[j]
        mrc.Replacement = strings.ReplaceAll(mrc.Replacement, "$", "$$")  // ← Problem here
    }
}
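A standalone sketch of the same escaping, applied to one capture-group reference and one environment reference, shows both the intended and the unintended effect. The helper name `escapeAll` is hypothetical and only mirrors the `strings.ReplaceAll` call above; it is not code from the repository.

```go
package main

import (
	"fmt"
	"strings"
)

// escapeAll mirrors the unconditional escaping: every "$" is doubled.
func escapeAll(s string) string {
	return strings.ReplaceAll(s, "$", "$$")
}

func main() {
	// "$1" is a capture-group reference (escaping is wanted);
	// "${env:ENVIRONMENT}" is an env reference (escaping is not wanted).
	for _, in := range []string{"$1", "${env:ENVIRONMENT}"} {
		fmt.Printf("%q -> %q\n", in, escapeAll(in))
	}
}
```

Both inputs come back with a doubled `$`, which is correct for `$1` but turns the environment reference into `$${env:ENVIRONMENT}`.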

Steps to Reproduce

  1. Create Ops Agent config with environment variable substitution:
metrics:
  receivers:
    prometheus:
      type: prometheus
      config:
        scrape_configs:
          - job_name: 'test'
            metric_relabel_configs:
              - action: replace
                replacement: ${env:TEST_VAR}
                target_label: test_label
  service:
    pipelines:
      custom_metrics_pipeline:
        receivers: [prometheus]
  2. Set environment variable: export TEST_VAR="expected_value"

  3. Generate OTel config: ./libexec/google_cloud_ops_agent_engine -service otel

  4. Check generated otel.yaml - you'll see:

replacement: $${env:TEST_VAR}

  5. Run the collector - labels will contain literal ${env:TEST_VAR} instead of expected_value

Proposed Solution

The escaping logic should distinguish between:

  1. Regex capture groups (e.g., $1, $2) - these should be escaped to $$1, $$2
  2. Environment variable substitution (e.g., ${env:VAR}) - these should NOT be escaped

Suggested fix:

// Escape only regex capture groups, not environment variables
for i := range copyPromConfig.ScrapeConfigs {
    for j := range copyPromConfig.ScrapeConfigs[i].RelabelConfigs {
        rc := copyPromConfig.ScrapeConfigs[i].RelabelConfigs[j]
        rc.Replacement = escapeRegexCaptureGroups(rc.Replacement)
    }
    for j := range copyPromConfig.ScrapeConfigs[i].MetricRelabelConfigs {
        mrc := copyPromConfig.ScrapeConfigs[i].MetricRelabelConfigs[j]
        mrc.Replacement = escapeRegexCaptureGroups(mrc.Replacement)
    }
}

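// captureGroupRe matches numbered capture group references such as $1 or $2.
var captureGroupRe = regexp.MustCompile(`\$(\d+)`)

func escapeRegexCaptureGroups(s string) string {
    // Escape $1, $2, etc. but not ${env:...}.
    // In a Go replacement template "$$" emits one literal "$", so "$$$$${1}"
    // produces "$$" followed by the captured digits.
    return captureGroupRe.ReplaceAllString(s, "$$$$${1}")
}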
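One caveat: Prometheus replacement strings are expanded with Go's regexp template syntax, which also accepts `${1}` and named groups such as `${name}`, and a digits-only pattern would leave those unescaped. A possible alternative, sketched below, inverts the rule: double every `$` except one that opens an `${env:...}` reference. The names `escapeExceptEnv` and `envOrDollarRe` are hypothetical, and the sketch assumes `env` is the only provider that needs to pass through.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// envOrDollarRe matches a whole ${env:NAME} reference, or else a single "$".
var envOrDollarRe = regexp.MustCompile(`\$\{env:[^}]*\}|\$`)

// escapeExceptEnv doubles every "$" except those that open an ${env:...} reference.
func escapeExceptEnv(s string) string {
	return envOrDollarRe.ReplaceAllStringFunc(s, func(m string) string {
		if strings.HasPrefix(m, "${env:") {
			return m // leave environment references untouched
		}
		return "$$" // escape any other dollar sign
	})
}

func main() {
	inputs := []string{"$1", "${1}-${env:TEST_VAR}", "${name}", "${env:TEST_VAR}"}
	for _, in := range inputs {
		fmt.Printf("%q -> %q\n", in, escapeExceptEnv(in))
	}
}
```

The trade-off is that a regex group literally named `env:...` can no longer be expressed, which seems unlikely to matter since `:` is not valid in a Go capture group name.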

Impact

  • Environment variable substitution does not work in relabel_configs or metric_relabel_configs replacement values in the Prometheus receiver
  • Users cannot dynamically configure labels based on environment
  • Workaround requires manual editing of generated otel.yaml files

Related Files

  • confgenerator/prometheus.go (lines 123-127)
  • Comment at lines 48-52 mentions the escaping but doesn't account for env vars

Additional Context

The original comment suggests this escaping was added for regex capture groups, but it is too broad and breaks legitimate environment variable usage that the OpenTelemetry collector supports.
