Description
What is the bug?
We are running Mimir 2.17.1 on a Kubernetes cluster with Istio enabled. Everything works, but we have observed unusual traffic involving the query-frontend: other Mimir components are trying to gossip with it. However, https://grafana.com/docs/mimir/latest/configure/configure-hash-rings/ lists the Mimir components that maintain hash rings, and the query-frontend is not one of them.
mimir-compactor-0 compactor ts=2025-09-30T12:20:22.854761262Z caller=tcp_transport.go:496 level=warn component="memberlist TCPTransport" msg="WriteTo failed" addr=[2a03:1e84:1902:14:e9c5::2a]:7946 err="digest: write tcp [2a03:1e84:1902:14:9da3::14]:50202->[2a03:1e84:1902:14:e9c5::2a]:7946: write: connection reset by peer"
mimir-ingester-1 ingester ts=2025-09-30T12:20:25.259176423Z caller=tcp_transport.go:496 level=warn component="memberlist TCPTransport" msg="WriteTo failed" addr=[2a03:1e84:1902:14:6b73::1c]:7946 err="sending data: write tcp [2a03:1e84:1902:14:ed85::5]:43038->[2a03:1e84:1902:14:6b73::1c]:7946: write: broken pipe"
mimir-compactor-0 compactor ts=2025-09-30T12:20:25.45488119Z caller=tcp_transport.go:496 level=warn component="memberlist TCPTransport" msg="WriteTo failed" addr=[2a03:1e84:1902:14:6b73::1c]:7946 err="sending data: write tcp [2a03:1e84:1902:14:9da3::14]:41280->[2a03:1e84:1902:14:6b73::1c]:7946: write: broken pipe"
mimir-querier-87bdc8c8-xhgxj querier ts=2025-09-30T12:20:27.320626871Z caller=tcp_transport.go:496 level=warn component="memberlist TCPTransport" msg="WriteTo failed" addr=[2a03:1e84:1902:14:e9c5::2a]:7946 err="digest: write tcp [2a03:1e84:1902:13:f6d0::1a]:60706->[2a03:1e84:1902:14:e9c5::2a]:7946: write: broken pipe"
mimir-querier-87bdc8c8-xhgxj querier ts=2025-09-30T12:20:30.54214228Z caller=tcp_transport.go:496 level=warn component="memberlist TCPTransport" msg="WriteTo failed" addr=[2a03:1e84:1902:14:e9c5::2a]:7946 err="sending data: write tcp [2a03:1e84:1902:13:f6d0::1a]:60718->[2a03:1e84:1902:14:e9c5::2a]:7946: write: broken pipe"
mimir-store-gateway-0 store-gateway ts=2025-09-30T12:20:30.450756265Z caller=tcp_transport.go:496 level=warn component="memberlist TCPTransport" msg="WriteTo failed" addr=[2a03:1e84:1902:14:6b73::1c]:7946 err="sending data: write tcp [2a03:1e84:1902:14:a2f3::13]:57620->[2a03:1e84:1902:14:6b73::1c]:7946: write: broken pipe"
mimir-querier-87bdc8c8-xhgxj querier ts=2025-09-30T12:20:32.152130435Z caller=tcp_transport.go:496 level=warn component="memberlist TCPTransport" msg="WriteTo failed" addr=[2a03:1e84:1902:14:6b73::1c]:7946 err="sending data: write tcp [2a03:1e84:1902:13:f6d0::1a]:43060->[2a03:1e84:1902:14:6b73::1c]:7946: write: broken pipe"
mimir-ruler-9bcbc5b67-qrh6r ruler ts=2025-09-30T12:20:32.667270951Z caller=tcp_transport.go:496 level=warn component="memberlist TCPTransport" msg="WriteTo failed" addr=[2a03:1e84:1902:14:6b73::1c]:7946 err="sending data: write tcp [2a03:1e84:1902:14:6b73::70]:47418->[2a03:1e84:1902:14:6b73::1c]:7946: write: broken pipe"
How to reproduce it?
Deploy mimir version 2.17.1 using the mimir-distributed helm chart version 5.8.0 on a Kubernetes cluster with Istio enabled.
What did you think would happen?
It was clarified in the Grafana Mimir Slack channel that the query-frontend is indeed involved in gossip but does not maintain its own hash ring. From the Helm chart I have the following observations:
- Mimir components that are part of memberlist define the following port in their pod spec for gossip; the query-frontend does not define one:

    name: memberlist
    containerPort: {{ include "mimir.memberlistBindPort" . }}
    protocol: TCP

- Another issue we observed: the query-frontend does not define the following label, which the other Mimir components carry and which the gossip-ring headless service uses as its selector. Because of this, Istio blocks traffic to query-frontend:7946:

    app.kubernetes.io/part-of: memberlist
After applying both changes via local overrides, we no longer see the error logs.
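For anyone hitting the same problem, the overrides amount to the fragment below for the query-frontend Deployment's pod template. This is a sketch, not exact chart values: the container name and the bind port 7946 are assumptions based on the chart's defaults; the label and port fields themselves come straight from the observations above.

```yaml
# Sketch of the query-frontend pod template after the local overrides.
# Assumes the default memberlist bind port 7946; adapt names to your release.
spec:
  template:
    metadata:
      labels:
        # Matches the selector of the gossip-ring headless Service,
        # so Istio permits memberlist traffic to this pod.
        app.kubernetes.io/part-of: memberlist
    spec:
      containers:
        - name: query-frontend  # container name assumed
          ports:
            - name: memberlist
              containerPort: 7946  # mimir.memberlistBindPort default
              protocol: TCP
```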
What was your environment?
Mimir Version: 2.17.1
Deployment Method: mimir-distributed Helm chart version 5.8.0
Platform: Kubernetes cluster with Istio enabled
Network: IPv6 enabled
Scheduler Discovery Mode: DNS (default when scheduler is enabled)
Any additional context to share?
No response