blathers-crl bot commented Oct 27, 2025

Backport 1/1 commits from #155640 on behalf of @iskettaneh.


Backport 1/1 commits from #155063.

/cc @cockroachdb/release


server: return draining progress if the server controller is draining

This commit fixes a bug where, while the server controller is being drained, the drain command returns a response without populating the IsDraining field. The CLI interprets such a response as meaning that draining has completed.
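The failure mode can be illustrated with a simplified, hypothetical model. In the sketch below, `drainResponse` only loosely mirrors the real drain RPC response (the field names `IsDraining`, `DrainRemainingIndicator`, and `DrainRemainingDescription` are assumptions here), and `serverController`, `drainBuggy`, `drainFixed`, and `clientThinksDone` are illustrative names, not CockroachDB code. It shows why a zero-value response reads as "done" to a client that stops once the server no longer reports that it is draining.

```go
package main

import "fmt"

// drainResponse is a simplified, hypothetical stand-in for the drain RPC
// response. Only the fields needed for this illustration are modeled.
type drainResponse struct {
	IsDraining                bool
	DrainRemainingIndicator   uint64
	DrainRemainingDescription string
}

// serverController is a hypothetical model of a node hosting several
// tenant (virtual cluster) servers that must shut down first.
type serverController struct {
	tenantsRunning int
}

// drainBuggy models the old behavior: while tenant servers are still
// shutting down, it returns a zero-value response, so IsDraining is
// false and the remaining count is 0.
func (s *serverController) drainBuggy() drainResponse {
	if s.tenantsRunning > 0 {
		return drainResponse{}
	}
	return drainResponse{IsDraining: true}
}

// drainFixed models the corrected behavior: progress is reported while
// tenant servers are still shutting down. The later system-level drain
// work (liveness, leases) is not modeled here.
func (s *serverController) drainFixed() drainResponse {
	if s.tenantsRunning > 0 {
		return drainResponse{
			IsDraining:                true,
			DrainRemainingIndicator:   uint64(s.tenantsRunning),
			DrainRemainingDescription: fmt.Sprintf("tenant servers: %d", s.tenantsRunning),
		}
	}
	return drainResponse{IsDraining: true}
}

// clientThinksDone models a client that treats a response without
// IsDraining set, or with nothing remaining, as a finished drain.
func clientThinksDone(r drainResponse) bool {
	return !r.IsDraining || r.DrainRemainingIndicator == 0
}

func main() {
	s := &serverController{tenantsRunning: 3}
	fmt.Println(clientThinksDone(s.drainBuggy())) // true: drain reported complete too early
	fmt.Println(clientThinksDone(s.drainFixed())) // false: 3 tenant servers still draining
}
```

The point of the model is that a zero value is indistinguishable from "nothing left to do", so the server has to populate the progress fields explicitly while the controller is still draining.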

Output example when running:

```
./cockroach node drain 3 --insecure --url {pgurl:3} --logtostderr=INFO

I251008 15:54:42.613200 15 2@rpc/peer.go:613  [rnode=?,raddr=10.142.1.228:26257,class=system,rpc] 1  connection is now healthy
node is draining... remaining: 3
I251008 15:54:42.622586 1 cli/rpc_node_shutdown.go:184  [-] 2  drain details: tenant servers: 3
node is draining... remaining: 3
I251008 15:54:42.824526 1 cli/rpc_node_shutdown.go:184  [-] 3  drain details: tenant servers: 3
node is draining... remaining: 1
I251008 15:54:43.026405 1 cli/rpc_node_shutdown.go:184  [-] 4  drain details: tenant servers: 1
node is draining... remaining: 1
I251008 15:54:43.228596 1 cli/rpc_node_shutdown.go:184  [-] 5  drain details: tenant servers: 1
node is draining... remaining: 243
I251008 15:54:44.580413 1 cli/rpc_node_shutdown.go:184  [-] 6  drain details: liveness record: 1, range lease iterations: 175, descriptor leases: 67
node is draining... remaining: 0 (complete)
drain ok
```
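The progress lines in this example can be reproduced with a small, hypothetical model. The final non-zero value, 243, equals the sum of the detail counts on the next log line (1 + 175 + 67). The sketch assumes, without confirmation from this PR, that the remaining indicator is such a sum and that the CLI stops polling at the first zero; `drainDetail`, `summarize`, and `pollUntilDrained` are illustrative names, not the actual `cli/rpc_node_shutdown.go` code.

```go
package main

import (
	"fmt"
	"strings"
)

// drainDetail is one named category of remaining drain work, such as
// "range lease iterations". Hypothetical type for illustration only.
type drainDetail struct {
	name  string
	count uint64
}

// summarize folds per-category counts into a single remaining total and
// a human-readable description, preserving the given order.
func summarize(details []drainDetail) (uint64, string) {
	var total uint64
	parts := make([]string, 0, len(details))
	for _, d := range details {
		total += d.count
		parts = append(parts, fmt.Sprintf("%s: %d", d.name, d.count))
	}
	return total, strings.Join(parts, ", ")
}

// pollUntilDrained prints one progress line per remaining value and stops
// at the first zero. It returns the number of polls performed.
func pollUntilDrained(remaining []uint64) int {
	for i, r := range remaining {
		if r == 0 {
			fmt.Println("node is draining... remaining: 0 (complete)")
			fmt.Println("drain ok")
			return i + 1
		}
		fmt.Printf("node is draining... remaining: %d\n", r)
	}
	return len(remaining)
}

func main() {
	total, desc := summarize([]drainDetail{
		{"liveness record", 1},
		{"range lease iterations", 175},
		{"descriptor leases", 67},
	})
	fmt.Println(total, desc) // 243 liveness record: 1, range lease iterations: 175, descriptor leases: 67
	// Tenant-server counts first (3, 3, 1, 1), then the system-level total.
	pollUntilDrained([]uint64{3, 3, 1, 1, total, 0})
}
```

Under this model, the pre-fix behavior corresponds to the first poll returning 0, which ends the loop before any tenant-server or lease work is reported.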

Release note (bug fix): Fixed a bug in the drain command where draining a node using virtual clusters (such as clusters running Physical Cluster Replication) could return before the drain was complete, possibly resulting in shutting down the node while it still had active SQL clients and range leases.

Epic: None

Release justification: backports a draining bug fix; previously, nodes using virtual clusters were not fully drained.



blathers-crl bot requested review from a team as code owners on October 27, 2025 21:20
blathers-crl bot force-pushed the blathers/backport-staging-v25.2.8-155640 branch from 2b7c4d5 to 93fbb91 on October 27, 2025 21:20
blathers-crl bot added the blathers-backport (This is a backport that Blathers created automatically.) and O-robot (Originated from a bot.) labels on Oct 27, 2025
blathers-crl bot commented Oct 27, 2025

Thanks for opening a backport.

Before merging, please confirm that it falls into one of the following categories (select one):

  • Non-production code changes. Includes test-only changes, build system changes, etc.
  • Fixes for serious issues. Defined in the policy as correctness, stability, or security issues, data corruption/loss, significant performance regressions, breaking working and widely used functionality, or an inability to detect and debug production issues.
  • Other approved changes. These changes must be gated behind a disabled-by-default feature flag unless there is a strong justification not to.

Add a brief release justification to the PR description explaining your selection.

Also, confirm that the change does not break backward compatibility and complies with all aspects of the backport policy.

All backports must be reviewed by the TL and EM for the owning area.

blathers-crl bot added the backport label (Label PRs that are backports to older release branches) on Oct 27, 2025

blathers-crl bot commented Oct 27, 2025

✅ PR #156312 is compliant with backport policy

Confidence: high
Critical bug criteria met: [Stability or security issues; Bugs that can cause the DB to return incorrect results or result in suboptimal performance]
Backward compatible: true
Explanation: The pull request complies with the CockroachDB backport policy. It targets a critical bug related to the server not correctly signaling the status of a draining node, which prevents the drain command from accurately reporting progress. This can lead to premature shutdowns while SQL clients and range leases are still active. According to the policy, fixes for critical bugs like this, which improve stability and prevent incorrect execution, do not require a feature flag for backporting. Moreover, the changes are accompanied by a clear Release justification in the PR body, which is a requirement for exceptions. The file 'pkg/server/drain.go', which is changed in this PR, is a production file, and the critical nature of the bug is highlighted by the detailed explanation provided in the PR body and change description. Additionally, the modifications include associated tests, ensuring that the changes are well-verified.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

celiala merged commit c6b5625 into staging-v25.2.8 on Oct 28, 2025 (16 checks passed)
celiala deleted the blathers/backport-staging-v25.2.8-155640 branch on October 28, 2025 13:11

Labels: backport, blathers-backport, O-robot, target-release-25.2.8