Bug: Query-frontend gossip connection failures with Istio #12876

@sirishkumar

What is the bug?

We are running Mimir version 2.17.1 on a Kubernetes cluster with Istio enabled. Everything works fine, but we have observed unusual traffic involving the query-frontend: other Mimir components are trying to gossip with it. However, https://grafana.com/docs/mimir/latest/configure/configure-hash-rings/ lists the Mimir components that maintain hash rings, and the query-frontend is not one of them. The other components log warnings like these:

mimir-compactor-0 compactor ts=2025-09-30T12:20:22.854761262Z caller=tcp_transport.go:496 level=warn component="memberlist TCPTransport" msg="WriteTo failed" addr=[2a03:1e84:1902:14:e9c5::2a]:7946 err="digest: write tcp [2a03:1e84:1902:14:9da3::14]:50202->[2a03:1e84:1902:14:e9c5::2a]:7946: write: connection reset by peer"
mimir-ingester-1 ingester ts=2025-09-30T12:20:25.259176423Z caller=tcp_transport.go:496 level=warn component="memberlist TCPTransport" msg="WriteTo failed" addr=[2a03:1e84:1902:14:6b73::1c]:7946 err="sending data: write tcp [2a03:1e84:1902:14:ed85::5]:43038->[2a03:1e84:1902:14:6b73::1c]:7946: write: broken pipe"
mimir-compactor-0 compactor ts=2025-09-30T12:20:25.45488119Z caller=tcp_transport.go:496 level=warn component="memberlist TCPTransport" msg="WriteTo failed" addr=[2a03:1e84:1902:14:6b73::1c]:7946 err="sending data: write tcp [2a03:1e84:1902:14:9da3::14]:41280->[2a03:1e84:1902:14:6b73::1c]:7946: write: broken pipe"
mimir-querier-87bdc8c8-xhgxj querier ts=2025-09-30T12:20:27.320626871Z caller=tcp_transport.go:496 level=warn component="memberlist TCPTransport" msg="WriteTo failed" addr=[2a03:1e84:1902:14:e9c5::2a]:7946 err="digest: write tcp [2a03:1e84:1902:13:f6d0::1a]:60706->[2a03:1e84:1902:14:e9c5::2a]:7946: write: broken pipe"
mimir-querier-87bdc8c8-xhgxj querier ts=2025-09-30T12:20:30.54214228Z caller=tcp_transport.go:496 level=warn component="memberlist TCPTransport" msg="WriteTo failed" addr=[2a03:1e84:1902:14:e9c5::2a]:7946 err="sending data: write tcp [2a03:1e84:1902:13:f6d0::1a]:60718->[2a03:1e84:1902:14:e9c5::2a]:7946: write: broken pipe"
mimir-store-gateway-0 store-gateway ts=2025-09-30T12:20:30.450756265Z caller=tcp_transport.go:496 level=warn component="memberlist TCPTransport" msg="WriteTo failed" addr=[2a03:1e84:1902:14:6b73::1c]:7946 err="sending data: write tcp [2a03:1e84:1902:14:a2f3::13]:57620->[2a03:1e84:1902:14:6b73::1c]:7946: write: broken pipe"
mimir-querier-87bdc8c8-xhgxj querier ts=2025-09-30T12:20:32.152130435Z caller=tcp_transport.go:496 level=warn component="memberlist TCPTransport" msg="WriteTo failed" addr=[2a03:1e84:1902:14:6b73::1c]:7946 err="sending data: write tcp [2a03:1e84:1902:13:f6d0::1a]:43060->[2a03:1e84:1902:14:6b73::1c]:7946: write: broken pipe"
mimir-ruler-9bcbc5b67-qrh6r ruler ts=2025-09-30T12:20:32.667270951Z caller=tcp_transport.go:496 level=warn component="memberlist TCPTransport" msg="WriteTo failed" addr=[2a03:1e84:1902:14:6b73::1c]:7946 err="sending data: write tcp [2a03:1e84:1902:14:6b73::70]:47418->[2a03:1e84:1902:14:6b73::1c]:7946: write: broken pipe"

How to reproduce it?

Deploy Mimir version 2.17.1 using the mimir-distributed Helm chart version 5.8.0 on a Kubernetes cluster with Istio enabled.

What did you think would happen?

A clarification in the Grafana Mimir Slack channel confirmed that the query-frontend does take part in gossip but does not maintain its own hash ring. From the Helm chart, I have the following observations:

  • Mimir components that are part of memberlist define the following port in their pod spec for gossip; the query-frontend does not define one.
    name: memberlist
    containerPort: {{ include "mimir.memberlistBindPort" . }}
    protocol: TCP
  • The query-frontend also lacks the following label, which the other Mimir components carry and which the gossip ring headless service uses as its selector label. Because of this, Istio blocks traffic to query-frontend:7946.
    app.kubernetes.io/part-of: memberlist

After applying both changes through local overrides, we no longer see the error logs.
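
For illustration, the two changes described above would show up in the query-frontend pod template roughly as in the fragment below. This is a sketch rather than the chart's actual rendered output: the port value 7946 is taken from the log lines and assumes the memberlist bind port has not been changed, and the container name and the surrounding structure are assumptions.

```yaml
# Fragment of a pod template (spec.template of the query-frontend Deployment).
# Only the label and the port definition come from the observations above;
# the container name and layout are illustrative assumptions.
template:
  metadata:
    labels:
      # Label matched by the gossip ring headless service selector.
      app.kubernetes.io/part-of: memberlist
  spec:
    containers:
      - name: query-frontend        # assumed container name
        ports:
          # Same port definition the other memberlist components declare.
          - name: memberlist
            containerPort: 7946     # assumed value of mimir.memberlistBindPort
            protocol: TCP
```

With the label in place, the pod is selected by the gossip ring headless service, and with the port declared, the pod spec exposes 7946 the same way the other memberlist components do.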

What was your environment?

Mimir Version: 2.17.1
Deployment Method: mimir-distributed Helm chart version 5.8.0
Platform: Kubernetes cluster with Istio enabled

Network: IPv6 enabled

Scheduler Discovery Mode: DNS (default when scheduler is enabled)

Any additional context to share?

No response
