Implement Request and Response Policy Based Routing in Cluster Mode #3422
base: master
Conversation
}
if result.cmd != nil && result.err == nil {
	// For MGET, extract individual values from the array result
	if strings.ToLower(cmd.Name()) == "mget" {
Do we actually need this special case?
cc @ofekshenawa
Force-pushed 6e3b627 to 1b2eaa6
Submitting partial review for the aggregators.
// For MGET without policy, use keyed aggregator
if cmdName == "mget" {
	return routing.NewDefaultAggregator(true)
}
Since we are already passing cmd.Name() to routing.NewResponseAggregator, this can be handled there. If the policy is nil for mget, maybe NewResponseAggregator could accept a policy and check for nil as well.
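A minimal sketch of what that could look like, assuming the routing package's existing names (CommandPolicy, ResponseAggregator, NewDefaultAggregator) and the standard strings package; the policy parameter and the nil-policy fallback are the suggested change, not the current signature:

```go
// Sketch only: fold the mget fallback into the constructor instead of the caller.
func NewResponseAggregator(policy *CommandPolicy, cmdName string) ResponseAggregator {
	if policy == nil {
		// No policy from COMMAND tips: keyed aggregation keeps per-key ordering
		// for multi-key reads like MGET; everything else gets the plain default.
		return NewDefaultAggregator(strings.ToLower(cmdName) == "mget")
	}
	// ... dispatch on policy.Response as today (agg_sum, all_succeeded, ...).
	return newAggregatorForPolicy(policy, cmdName) // hypothetical helper
}
```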
Submitting another partial review.
}

func (p *CommandPolicy) CanBeUsedInPipeline() bool {
	return p.Request != ReqAllNodes && p.Request != ReqAllShards && p.Request != ReqMultiShard
What about special? Can it be used in a pipeline?
My understanding is that special should be handled on a case-by-case basis
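If special ends up excluded from pipelining by default, one way to keep the case-by-case exceptions visible is an explicit switch. A hedged sketch (the ReqSpecial constant name follows the pattern of the other constants and may differ):

```go
// Sketch: fan-out policies and special are non-pipelineable by default;
// individual special commands can be allow-listed case by case elsewhere.
func (p *CommandPolicy) CanBeUsedInPipeline() bool {
	switch p.Request {
	case ReqAllNodes, ReqAllShards, ReqMultiShard, ReqSpecial:
		return false
	default:
		return true
	}
}
```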
// ShardPicker chooses "one arbitrary shard" when the request_policy is
// ReqDefault and the command has no keys.
type ShardPicker interface {
	Next(total int) int // returns an index in [0,total)
}
These are great. Can we implement a StaticShardPicker or StickyShardPicker that always returns the same shard? I do think this can be helpful for testing. This is not a blocker by any means.
Can we just add a TODO for those? I think they can be useful in the future.
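For reference, both pickers are tiny; a sketch of a round-robin picker plus the static picker suggested above, which makes arbitrary-shard routing deterministic in tests (type names are illustrative):

```go
import "sync/atomic"

// RoundRobinShardPicker cycles through shards and is safe for concurrent use.
type RoundRobinShardPicker struct {
	counter atomic.Uint64
}

func (p *RoundRobinShardPicker) Next(total int) int {
	if total <= 0 {
		return 0
	}
	return int((p.counter.Add(1) - 1) % uint64(total))
}

// StaticShardPicker always returns the same index.
type StaticShardPicker struct {
	Index int
}

func (p *StaticShardPicker) Next(total int) int {
	if total <= 0 {
		return 0
	}
	return p.Index % total
}
```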
if commandInfoTips != nil {
	if v, ok := commandInfoTips[requestPolicy]; ok {
		if p, err := routing.ParseRequestPolicy(v); err == nil {
			req = p
		}
	}
	if v, ok := commandInfoTips[responsePolicy]; ok {
		if p, err := routing.ParseResponsePolicy(v); err == nil {
			resp = p
		}
	}
}
tips := make(map[string]string, len(commandInfoTips))
for k, v := range commandInfoTips {
	if k == requestPolicy || k == responsePolicy {
		continue
	}
	tips[k] = v
}
can't we do both of those in a single range over commandInfoTips?
Not sure that I completely understand the question
	return nil
}

func (cmd *IntPointerSliceCmd) Clone() Cmder {
It's tricky here. Do we need to return the same pointer, or do we only want the value when cloning?
Still not sure where this type is used and whether we want the pointer or the value. cc @ofekshenawa, @htemelski-redis
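If the answer is "value only" (so a clone handed to the aggregator cannot alias the original's result), a hedged sketch of the deep copy; the val field is assumed to be []*int64 as in the existing command definition, and the baseCmd copy here is a plain struct copy:

```go
// Sketch: copy the slice and the pointed-to ints so the clone cannot
// observe or mutate the original command's result.
func (cmd *IntPointerSliceCmd) Clone() Cmder {
	clone := &IntPointerSliceCmd{baseCmd: cmd.baseCmd} // shallow-copies args/err
	if cmd.val != nil {
		clone.val = make([]*int64, len(cmd.val))
		for i, p := range cmd.val {
			if p != nil {
				v := *p
				clone.val[i] = &v
			}
		}
	}
	return clone
}
```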
Final part of initial review
Overview:
- Let's use atomics when possible.
- Left questions related to the node selection and setting of values.
Overall the design of the solution looks good, would have to do an additional pass over the test files once this review is addressed.
Thank you both @ofekshenawa and @htemelski-redis!
if c.hasKeys(cmd) {
	// execute on key based shard
	return node.Client.Process(ctx, cmd)
}
Do we know that this node serves the slot for the key?
Yes, the node should've been selected based on the slot; see osscluster.go:L1906, func (c *ClusterClient) cmdNode(...).
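For context, a conceptual sketch of that selection (helper names here are illustrative, not the exact code the comment points at): the first key is hashed to a slot, and the slot's current master is used.

```go
// Sketch: route a keyed command to the master that owns its hash slot.
func nodeForKeyedCommand(state *clusterState, cmd Cmder) (*clusterNode, error) {
	firstKey := firstKeyOf(cmd)       // hypothetical helper using COMMAND key-spec info
	slot := hashtag.Slot(firstKey)    // CRC16(key) % 16384, honouring {hash tags}
	return state.slotMasterNode(slot) // owner of that slot in the current topology
}
```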
	// execute on key based shard
	return node.Client.Process(ctx, cmd)
}
return c.executeOnArbitraryShard(ctx, cmd)
since it doesn't matter and there is already some node selected, why not use it?
We have two different ways of picking an arbitrary shard: either round-robin or a random one.
Yes, I understand that, but for some reason there is already a node selected here, which may have been chosen because of a MOVED redirect or normal key-based selection. Why do we have to reselect the node? Shouldn't this selection of an arbitrary node be done outside, so that we do the node selection only once and the node on line #52 is the one used for this command?
// Command executed successfully but value extraction failed
// This is common for complex commands like CLUSTER SLOTS
// The command already has its result set correctly, so just return
I do not understand that comment. Why did the value extraction return nil? Can we make sure the cmd has a value set at least? If it doesn't, we may return a cmd with a nil value and a nil error, which doesn't make sense.
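One hedged way to guarantee that last property, so a failed extraction can never surface as a command with neither a value nor an error (extractErr stands in for whatever the extraction step returns on this path):

```go
// Sketch: if extraction failed and the command carries no error of its own,
// record one instead of silently returning nil value + nil error.
if extractErr != nil && cmd.Err() == nil {
	cmd.SetErr(fmt.Errorf("redis: %s: failed to extract aggregated value: %w",
		cmd.Name(), extractErr))
}
```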
Force-pushed 727a799 to 14bd6e1
left some comments related to aggregators
The aggregators look good; there are some prints left in the code, as well as a bunch of unanswered questions. Let's resolve them before merging this. cc @ofekshenawa, @htemelski-redis
// AggMaxAggregator returns the maximum numeric value from all shards.
type AggMaxAggregator struct {
	err atomic.Value
	res *util.AtomicMax
}
General question: are those min/max aggregators only for ints?
That's a good question. The initial implementation was for ints only, but looking at the docs, we should support any numeric type.
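A hedged sketch of a max aggregator that accepts any numeric reply by normalizing to float64; the Add/Result shape is assumed from the surrounding discussion, and it uses a mutex rather than util.AtomicMax:

```go
import (
	"fmt"
	"strconv"
	"sync"
)

// NumericMaxAggregator keeps the maximum over float64 so integer,
// double, and numeric-string replies all aggregate correctly.
type NumericMaxAggregator struct {
	mu   sync.Mutex
	max  float64
	seen bool
	err  error
}

func (a *NumericMaxAggregator) Add(reply interface{}) {
	a.mu.Lock()
	defer a.mu.Unlock()
	v, ok := toFloat64(reply)
	if !ok {
		if a.err == nil {
			a.err = fmt.Errorf("redis: non-numeric reply %T in max aggregation", reply)
		}
		return
	}
	if !a.seen || v > a.max {
		a.max, a.seen = v, true
	}
}

func (a *NumericMaxAggregator) Result() (float64, error) {
	a.mu.Lock()
	defer a.mu.Unlock()
	return a.max, a.err
}

// toFloat64 normalizes the reply types a RESP client typically produces.
func toFloat64(v interface{}) (float64, bool) {
	switch x := v.(type) {
	case int64:
		return float64(x), true
	case float64:
		return x, true
	case string:
		f, err := strconv.ParseFloat(x, 64)
		return f, err == nil
	default:
		return 0, false
	}
}
```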
if commandInfoTips != nil {
	if v, ok := commandInfoTips[requestPolicy]; ok {
		if p, err := routing.ParseRequestPolicy(v); err == nil {
			req = p
		}
	}
	if v, ok := commandInfoTips[responsePolicy]; ok {
		if p, err := routing.ParseResponsePolicy(v); err == nil {
			resp = p
		}
	}
}
tips := make(map[string]string, len(commandInfoTips))
for k, v := range commandInfoTips {
	if k == requestPolicy || k == responsePolicy {
		continue
	}
	tips[k] = v
}
Suggested change (do both in a single range over commandInfoTips):

tips := make(map[string]string, len(commandInfoTips))
for k, v := range commandInfoTips {
	if k == requestPolicy {
		if p, err := routing.ParseRequestPolicy(v); err == nil {
			req = p
		}
		continue
	}
	if k == responsePolicy {
		if p, err := routing.ParseResponsePolicy(v); err == nil {
			resp = p
		}
		continue
	}
	tips[k] = v
}

defer func() {
	if r := recover(); r != nil {
		cmd.SetErr(fmt.Errorf("redis: failed to set command value: %v", r))
Why don't we return the error as well? As written it will return nil, but the err will be set on the cmd.
Not sure how good a practice it is to modify the return value from within recover.
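For reference, the usual Go pattern is a named return value, which a deferred recover can overwrite cleanly; a small self-contained example:

```go
package main

import "fmt"

// setValue shows a deferred recover assigning to the named return, so the
// caller sees the error as the return value (in addition to any state such
// as cmd.SetErr that the deferred function may also update).
func setValue(apply func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("redis: failed to set command value: %v", r)
		}
	}()
	apply()
	return nil
}

func main() {
	fmt.Println(setValue(func() { panic("type mismatch") }))
	// Output: redis: failed to set command value: type mismatch
}
```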
Force-pushed a4ac8df to 7181bcc
* feat: load the policy table in cluster client
* Remove comments

…or osscluster.go (#6)
* centralize cluster command routing in osscluster_router.go and refactor osscluster.go
* enable ci on all branches
* Add debug prints
* Add debug prints
* FIX: deal with nil policy
* FIX: fixing clusterClient process
* chore(osscluster): simplify switch case
* wip(command): ai generated clone method for commands
* feat: implement response aggregator for Redis cluster commands
* feat: implement response aggregator for Redis cluster commands
* fix: solve concurrency errors
* fix: solve concurrency errors
* return MaxRedirects settings
* remove locks from getCommandPolicy
* Handle MOVED errors more robustly, remove cluster reloading at executions, ensure better routing
* Fix: support Process hook test
* Fix: remove response aggregation for single shard commands
* Add more performant type conversion for Cmd type
* Add router logic into processPipeline

Co-authored-by: Nedyalko Dyakov <nedyalko.dyakov@gmail.com>
…ot be used in pipeline
Force-pushed 07963c2 to cd74db0
This PR introduces support for COMMAND-based request_policy and response_policy routing for Redis commands when used with the OSS Cluster client.
Key Additions:
- Command Policy Loader: parses and caches COMMAND metadata with routing/aggregation tips on first use.
- Routing Engine Enhancements: implements support for all request policies: default (keyless), default (hashslot), all_shards, all_nodes, multi_shard, and special.
- Response Aggregator: combines multi-shard replies based on response_policy (all_succeeded, one_succeeded, agg_sum, special, etc.), including custom handling for special commands like FT.CURSOR.
- Raw Command Support: policies are enforced on Client.Do(ctx, args...).
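From the caller's side the routing stays behind the existing API; a sketch (cluster addresses are placeholders, and DBSIZE is used because its COMMAND metadata declares request_policy:all_shards with response_policy:agg_sum):

```go
package main

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9"
)

func main() {
	ctx := context.Background()
	rdb := redis.NewClusterClient(&redis.ClusterOptions{
		Addrs: []string{":7000", ":7001", ":7002"}, // placeholder addresses
	})

	// With policy-based routing, the client fans DBSIZE out to every shard
	// and sums the replies per its response_policy.
	total, err := rdb.Do(ctx, "dbsize").Int64()
	if err != nil {
		panic(err)
	}
	fmt.Println("keys across the cluster:", total)
}
```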