[SERVER-14708] A mixed single node & replica set sharded cluster can wait for replication on the single node when moving chunks, if secondaryThrottle is enabled Created: 28/Jul/14  Updated: 10/Dec/14  Resolved: 06/Aug/14

Status: Closed
Project: Core Server
Component/s: Replication, Sharding
Affects Version/s: 2.4.3
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Alan Spencer Assignee: Randolph Tan
Resolution: Duplicate Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-14041 enhance secondaryThrottle parameter Closed
Related
related to SERVER-9407 Helpers::removeRange always waits for... Closed
related to SERVER-14041 enhance secondaryThrottle parameter Closed
is related to SERVER-14465 Default write concern based on config... Closed
Operating System: ALL
Participants:

 Description   

If you have a sharded cluster with two shards:
A - a single node
B - a replica set of 3 data-bearing nodes.

When performing a moveChunk, the secondaryThrottle flag is checked against the target shard and is applied only if the target shard has multiple nodes. The resulting setting is then used for both the move to the target shard and the remove from the source shard.

When moving from A->B

  • The target shard has multiple nodes, so secondaryThrottle is used. I.e. the remove from A is performed with {w:2} and a timeout of 60 seconds. Since A has only one node, {w:2} can never be satisfied, so each moveChunk waits the full 60 seconds.

Also, when moving from B->A

  • The target shard has only a single node, so no secondaryThrottle is used. I.e. the remove from B does not wait for replication, though that may be what is intended.

This can cause large delays, because every chunk moved from A to B incurs the 60-second wait.
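
The behavior described above can be modeled with a minimal Python sketch. This is not the server code: the `Shard` class, the function names, and the constants are hypothetical, chosen only to mirror the report (flag checked against the target shard, `{w:2}` and a 60-second timeout applied to the remove on the source shard).

```python
from dataclasses import dataclass

# Values taken from the report; the names are illustrative, not server identifiers.
WAIT_TIMEOUT_SECS = 60
REQUIRED_ACKS = 2  # corresponds to {w:2}


@dataclass
class Shard:
    name: str
    data_nodes: int  # number of data-bearing nodes in the shard


def throttle_applies(target: Shard, secondary_throttle: bool) -> bool:
    """Per the report, the flag is evaluated against the target shard only."""
    return secondary_throttle and target.data_nodes > 1


def unsatisfiable_wait_secs(source: Shard, target: Shard, secondary_throttle: bool) -> int:
    """Seconds the remove on the source shard stalls because {w:2} cannot be met.

    Returns 0 when no wait is requested or when the source shard has enough
    nodes to acknowledge; ordinary replication lag is not modeled here.
    """
    if not throttle_applies(target, secondary_throttle):
        return 0  # remove does not wait for replication at all
    if source.data_nodes < REQUIRED_ACKS:
        return WAIT_TIMEOUT_SECS  # single node can never reach w:2, so it times out
    return 0


shard_a = Shard("A", 1)
shard_b = Shard("B", 3)
print(unsatisfiable_wait_secs(shard_a, shard_b, True))  # A->B
print(unsatisfiable_wait_secs(shard_b, shard_a, True))  # B->A
```

Under this model, only the A->B direction stalls, and the stall is paid once per moveChunk, so the total delay grows linearly with the number of chunks migrated.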



 Comments   
Comment by Randolph Tan [ 06/Aug/14 ]

Fixed by more defensive checks and code refactor work done in SERVER-14041.

Comment by Asya Kamsky [ 28/Jul/14 ]

Work-arounds available:

1. Configure every shard with the same majority value (specifically, do not mix majority=1 on some shards with majority>1 on others).

2. Turn off the secondaryThrottle option cluster-wide if the mix described in 1. cannot be avoided.
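
As a rough sketch of work-around 2, the following Python assumes the 2.4-era convention in which the balancer reads a `_secondaryThrottle` field from the `balancer` document in `config.settings`; check this against the documentation for your server version before relying on it. The helper names and the connection URI are hypothetical, and `pymongo` is assumed to be installed for the second function.

```python
def balancer_throttle_update(enabled):
    """Build the (filter, update) documents for the balancer settings document.

    Assumes the setting lives at config.settings {_id: "balancer"} under the
    field name "_secondaryThrottle" (an assumption based on 2.4-era behavior).
    """
    return {"_id": "balancer"}, {"$set": {"_secondaryThrottle": bool(enabled)}}


def disable_secondary_throttle(mongos_uri):
    """Apply the update through a mongos; the URI is supplied by the caller."""
    # Imported lazily so the pure helper above works without pymongo installed.
    from pymongo import MongoClient

    client = MongoClient(mongos_uri)
    flt, upd = balancer_throttle_update(False)
    # upsert=True creates the balancer document if it does not exist yet.
    return client["config"]["settings"].update_one(flt, upd, upsert=True)
```

The update must be sent through a mongos rather than directly to a shard, since the setting is stored in the cluster's config database.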
