Conversation

@elias-dbx elias-dbx commented Sep 26, 2025

The current clientv3 backoff behavior is a flat backoff with jitter. A backoff wait time that is too low can amplify cascading failures, since client requests may be retried many times with little delay between attempts. Operators of large etcd clusters can increase the backoff wait time, but for large clusters that wait time must be quite large to safely protect the cluster from a large number of retrying clients. A very high backoff time, in turn, means retries during a non-cascading failure wait longer than necessary. A better way to handle cascading failures while keeping retry times low in non-cascading failures is to implement exponential backoff in the etcd clients.

This commit implements the mechanism for exponential backoff in clients with two new parameters:

  1. BackoffExponent: configures the exponential backoff factor. For example, a BackoffExponent of 2.0 doubles the backoff time between each retry. The default value of BackoffExponent is 1.0, which disables exponential backoff for backward compatibility.
  2. BackoffMaxWaitBetween: configures the max wait time when performing exponential backoff. The default value is 5 seconds.

Related to: #20717

@k8s-ci-robot

Hi @elias-dbx. Thanks for your PR.

I'm waiting for an etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

func TestBackoffExponent(t *testing.T) {
	backoffExponent := float64(2.0)
Member


Should we add another test case with backoffExponent = 1.0 to ensure that exponential backoff stays disabled for backward compatibility?

Author


sure, added a test case for backoffExponent = 1.0

defaultBackoffExponent = 1.0

// client-side retry backoff exponential max wait between requests.
defaultBackoffMaxWaitBetween = 5 * time.Second
Member


Could you please give some context on why 5 seconds?
I have seen defaultBackoffWaitBetween = 25 * time.Millisecond but I don't know how it relates to these 5 seconds. https://github.com/etcd-io/etcd/blob/main/client/v3/options.go#L53

Author

@elias-dbx elias-dbx Oct 1, 2025


I chose 5 seconds as it is roughly in line with other widely used client side libraries. For example the aws-sdk-go-v2 client libraries use a default max backoff of 20 seconds: https://github.com/aws/aws-sdk-go-v2/blob/main/aws/retry/standard.go#L31

Also, in my experience running distributed systems, a backoff in the range of 5-30 seconds is long enough to shed load but short enough that the system converges in a timely manner.

I tried to research if there are any white papers evaluating max backoff parameters, but I could not find any.

If someone has a strong reason why the default max backoff should be a different value I would be amenable to changing it.

{generation: 2, exponent: 1.0, minDelay: 100 * time.Millisecond, maxDelay: 500 * time.Millisecond, expectedBackoff: 100 * time.Millisecond},
{generation: 3, exponent: 1.0, minDelay: 100 * time.Millisecond, maxDelay: 500 * time.Millisecond, expectedBackoff: 100 * time.Millisecond},
{generation: math.MaxUint, exponent: 1.0, minDelay: 100 * time.Millisecond, maxDelay: 500 * time.Millisecond, expectedBackoff: 100 * time.Millisecond},
}
Member


I was also wondering whether it is worth having a test case for backoffExponent = 0 and/or exponent = 0:

Regarding this function:

http://github.com/etcd-io/etcd/pull/20731/files#diff-c4fbc528cdd146f7f307011a8ea0a5101017b892d3c1805679ef68143ad0bd8cR39-R42

func expBackoff(generation uint, exponent float64, minDelay, maxDelay time.Duration) time.Duration {
	delay := math.Min(math.Pow(exponent, float64(generation))*float64(minDelay), float64(maxDelay))
	return time.Duration(delay)
}

math.Pow(0, n) = 0 for n > 0, which means that the delay would always be 0 (or minDelay if there's a lower bound check)

Author

@elias-dbx elias-dbx Oct 1, 2025


I was also wondering whether it is worth having a test case for backoffExponent = 0 and/or exponent = 0

That is not a possible configuration since if the user passes in BackoffExponent=0 or BackoffMaxWaitBetween=0 the default values (1.0 and 5 seconds respectively) will be used.

Member


Makes sense.

Signed-off-by: Elias Carter <elias@dropbox.com>
@elias-dbx force-pushed the client_exponential_backoff branch from 8066e99 to f364725 on October 1, 2025 at 16:03
@elias-dbx
Author

@ronaldngounou thanks for the review, please see my comments and update.

@ronaldngounou
Member

/ok-to-test

@k8s-ci-robot

@elias-dbx: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                   Commit   Required  Rerun command
pull-etcd-verify            f364725  true      /test pull-etcd-verify
pull-etcd-robustness-amd64  f364725  true      /test pull-etcd-robustness-amd64
pull-etcd-coverage-report   f364725  true      /test pull-etcd-coverage-report

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@codecov

codecov bot commented Oct 3, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 69.19%. Comparing base (d3f136a) to head (f364725).
⚠️ Report is 27 commits behind head on main.

Additional details and impacted files
Files with missing lines Coverage Δ
client/v3/client.go 85.31% <100.00%> (+0.60%) ⬆️
client/v3/config.go 85.71% <ø> (ø)
client/v3/utils.go 100.00% <100.00%> (ø)

... and 28 files with indirect coverage changes

@@            Coverage Diff             @@
##             main   #20731      +/-   ##
==========================================
+ Coverage   69.16%   69.19%   +0.02%     
==========================================
  Files         420      422       +2     
  Lines       34817    34836      +19     
==========================================
+ Hits        24081    24104      +23     
+ Misses       9338     9335       -3     
+ Partials     1398     1397       -1     

Continue to review full report in Codecov by Sentry.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d3f136a...f364725. Read the comment docs.


@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: elias-dbx, ronaldngounou
Once this PR has been reviewed and has the lgtm label, please assign spzala for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

3 participants