blathers-crl bot commented Oct 27, 2025

Backport 1/1 commits from #155642 on behalf of @iskettaneh.


Backport 1/1 commits from #155063.

/cc @cockroachdb/release



server: return draining progress if the server controller is draining

This commit fixes a bug where, if the server controller is being drained, the drain command would return a response without populating the `IsDraining` field. The CLI interprets such a response as meaning the drain has completed.
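To illustrate the failure mode, here is a minimal Go sketch of a client-side polling loop. It is not the actual CockroachDB CLI code: the `drainResponse` type, its `Remaining` field, and the `pollDrain` and `sequence` helpers are hypothetical, and only the `IsDraining` field name comes from the description above. It shows how a response that leaves `IsDraining` unset ends the loop after a single poll, while responses that report progress keep the client waiting until nothing remains.

```go
package main

import "fmt"

// drainResponse is a hypothetical, simplified stand-in for the server's
// drain response; only the fields relevant to this bug are modeled.
type drainResponse struct {
	IsDraining bool   // whether the server is reporting drain progress
	Remaining  uint64 // hypothetical name for the remaining-work indicator
}

// pollDrain calls next() until the client considers the drain finished and
// returns the number of polls made. A response with IsDraining unset is
// treated as "drain finished", which is how the bug surfaced in the CLI.
func pollDrain(next func() drainResponse) int {
	polls := 0
	for {
		resp := next()
		polls++
		if !resp.IsDraining {
			// Unpopulated IsDraining: the client cannot tell that work remains.
			fmt.Println("drain ok")
			return polls
		}
		fmt.Printf("node is draining... remaining: %d\n", resp.Remaining)
		if resp.Remaining == 0 {
			fmt.Println("drain ok")
			return polls
		}
	}
}

// sequence returns a function that yields the given responses in order.
func sequence(rs []drainResponse) func() drainResponse {
	i := 0
	return func() drainResponse {
		r := rs[i]
		i++
		return r
	}
}

func main() {
	// Buggy server: the server controller is still draining, but the
	// response is the zero value, so IsDraining is false.
	buggy := []drainResponse{{}}
	// Fixed server: progress is reported until remaining reaches 0.
	fixed := []drainResponse{
		{IsDraining: true, Remaining: 3},
		{IsDraining: true, Remaining: 1},
		{IsDraining: true, Remaining: 0},
	}
	fmt.Println("buggy polls:", pollDrain(sequence(buggy)))
	fmt.Println("fixed polls:", pollDrain(sequence(fixed)))
}
```

With the zero-value response the loop returns after one poll, whereas the progress-reporting sequence takes three polls before the client reports success.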

Output example when running:

```
./cockroach node drain 3 --insecure --url {pgurl:3} --logtostderr=INFO

I251008 15:54:42.613200 15 2@rpc/peer.go:613  [rnode=?,raddr=10.142.1.228:26257,class=system,rpc] 1  connection is now healthy
node is draining... remaining: 3
I251008 15:54:42.622586 1 cli/rpc_node_shutdown.go:184  [-] 2  drain details: tenant servers: 3
node is draining... remaining: 3
I251008 15:54:42.824526 1 cli/rpc_node_shutdown.go:184  [-] 3  drain details: tenant servers: 3
node is draining... remaining: 1
I251008 15:54:43.026405 1 cli/rpc_node_shutdown.go:184  [-] 4  drain details: tenant servers: 1
node is draining... remaining: 1
I251008 15:54:43.228596 1 cli/rpc_node_shutdown.go:184  [-] 5  drain details: tenant servers: 1
node is draining... remaining: 243
I251008 15:54:44.580413 1 cli/rpc_node_shutdown.go:184  [-] 6  drain details: liveness record: 1, range lease iterations: 175, descriptor leases: 67
node is draining... remaining: 0 (complete)
drain ok
```
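In the sample output, each `remaining` value agrees with the counts in the `drain details` line that follows it: 3 tenant servers, then 1, and finally 243 = 1 + 175 + 67 for the liveness record, range lease iterations, and descriptor leases. The Go sketch below reproduces that arithmetic. It assumes, based only on this sample output, that the remaining indicator is the sum of the per-category counts; the `drainComponent` type and `summarize` function are hypothetical and are not CockroachDB's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// drainComponent is a hypothetical (name, count) pair for one category of
// remaining drain work, e.g. "range lease iterations".
type drainComponent struct {
	name  string
	count uint64
}

// summarize returns the total remaining count and a description formatted
// like the "drain details:" log lines. Whether the server computes its
// indicator exactly this way is an assumption.
func summarize(parts []drainComponent) (uint64, string) {
	var total uint64
	descs := make([]string, 0, len(parts))
	for _, p := range parts {
		total += p.count
		descs = append(descs, fmt.Sprintf("%s: %d", p.name, p.count))
	}
	return total, strings.Join(descs, ", ")
}

func main() {
	total, desc := summarize([]drainComponent{
		{"liveness record", 1},
		{"range lease iterations", 175},
		{"descriptor leases", 67},
	})
	fmt.Printf("node is draining... remaining: %d\n", total)
	fmt.Println("drain details:", desc)
}
```

Calling `summarize` with a single `{"tenant servers", 3}` component likewise yields the earlier `remaining: 3` lines, reported while the tenant servers are still draining.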

Release note (bug fix): fixed a bug in the drain command where draining a node using virtual clusters (such as clusters running Physical Cluster Replication) could return before the drain was complete, possibly resulting in shutting down the node while it still had active SQL clients and range leases.

Epic: None

Release justification: need to backport a bug fix to draining where we used to not fully drain nodes when using virtual clusters.


Release justification:

@blathers-crl blathers-crl bot requested review from a team as code owners October 27, 2025 21:20
@blathers-crl blathers-crl bot added blathers-backport This is a backport that Blathers created automatically. O-robot Originated from a bot. labels Oct 27, 2025
blathers-crl bot commented Oct 27, 2025

Thanks for opening a backport.

Before merging, please confirm that it falls into one of the following categories (select one):

  • Non-production code changes. Includes test-only changes, build system changes, etc.
  • Fixes for serious issues. Defined in the policy as correctness, stability, or security issues, data corruption/loss, significant performance regressions, breaking working and widely used functionality, or an inability to detect and debug production issues.
  • Other approved changes. These changes must be gated behind a disabled-by-default feature flag unless there is a strong justification not to.

Add a brief release justification to the PR description explaining your selection.

Also, confirm that the change does not break backward compatibility and complies with all aspects of the backport policy.

All backports must be reviewed by the TL and EM for the owning area.

@blathers-crl blathers-crl bot added the backport Label PR's that are backports to older release branches label Oct 27, 2025

blathers-crl bot commented Oct 27, 2025

❌ PR #156311 does not comply with backport policy

Confidence: medium
Explanation: The PR involves changes to production code with a focus on fixing a bug related to node draining in CockroachDB. The changes in files like 'pkg/server/drain.go' and 'pkg/server/server_controller.go' are central to production and therefore do not meet the exemption criteria for non-production only changes. The PR does contain a 'Release justification:' line in the PR body, but it's incomplete as it only says 'need to backport a bug fix to draining where we used to not fully drain nodes when using virtual clusters.' However, it does not explicitly clarify whether the bug fits the critical bug criteria or if this change includes a feature flag as mandated for changes that are not critical bugs. Based on the provided details, the fix seems to improve the functionality of node draining, addressing potential suboptimal performance because nodes might not fully drain, potentially violating point 4 (Bugs that can cause...suboptimal performance) of the critical bug criteria. The PR, however, does not indicate if these changes are gated by feature flags if not deemed a 'critical bug'. This needs further clarification to determine full compliance.
Recommendation: Request further clarification on the criticality of the bug or ensure the feature is gated by a default-disabled feature flag.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@celiala celiala merged commit 303aedc into staging-v24.3.22 Oct 28, 2025
13 of 14 checks passed
@celiala celiala deleted the blathers/backport-staging-v24.3.22-155642 branch October 28, 2025 13:11

Labels

backport Label PR's that are backports to older release branches blathers-backport This is a backport that Blathers created automatically. O-robot Originated from a bot. target-release-24.3.22
