Bug Report
A single worker node is always in an unhealthy state, with the unmet condition being udevd not healthy.
Description
I've recently migrated my cluster from k3s to Talos. Unfortunately, I've been encountering an odd issue where one worker node is always stuck in an unhealthy state, with udevd reported as unhealthy. This also pegs the node's CPU at nearly 100%.
My current cluster consists of 3 control plane nodes and 3 worker nodes running on vSphere. Rebooting the affected node clears the error, and CPU usage is normal after boot; however, this just 'moves' the issue to one of the other nodes. At any given time, exactly one node exhibits the issue. Once an affected node is rebooted, all nodes are healthy for a short time before the issue reappears.
The following message always precedes the unhealthy status and the CPU spike:
user: warning: [2025-10-24T22:34:30.775504439Z]: [talos] service[udevd](Running): Health check failed: exit status 1: Timed out while waiting for udev queue to empty.
I have rebuilt the cluster from scratch on new VMs with no change. All VMs are identically configured, apart from the CPU/RAM/disk allocations, which differ between control planes and workers.
Logs
I'm not sure exactly which logs to provide, but I'm happy to gather any given directions.
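For reference, here is roughly how I've been inspecting the affected node so far; a sketch using standard talosctl commands, with the node IP being a placeholder for the affected worker:

```shell
# Placeholder IP for the affected worker node
NODE=10.0.0.21

# Show service states, including the failing udevd health check
talosctl -n "$NODE" services

# Tail the udevd service logs around the time of the CPU spike
talosctl -n "$NODE" logs udevd

# Check kernel messages for device/udev-related churn
talosctl -n "$NODE" dmesg

# Collect a full support bundle to attach to this issue
talosctl -n "$NODE" support
```

If there are other logs or debug flags that would help, let me know and I'll attach the output.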
Environment
- Talos version: 1.11.1 - 1.11.3
- Kubernetes version: 1.34.1
- Platform: VMware vSphere