[SERVER-31837] Recipient shard should not wait for `orphanCleanupDelaySecs` Created: 06/Nov/17  Updated: 06/Feb/24

Status: Backlog
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Kaloian Manassiev Assignee: Tommaso Tocci
Resolution: Unresolved Votes: 4
Labels: car-investigation, oldshardingemea, sharding-emea-pm-review, shardingemea-qw
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-29344 Inform queries on a secondary node wh... Closed
Documented
Duplicate
is duplicated by SERVER-35009 Sharded cluster with small chunk size... Closed
is duplicated by SERVER-36834 chunk migration blocked on orphan cle... Closed
Related
is related to SERVER-29405 After move chunk out, pause for secon... Closed
Assigned Teams:
Catalog and Routing
Sprint: Sharding 2017-12-18, CAR Team 2024-01-22, CAR Team 2024-02-05, CAR Team 2024-02-19
Participants:
Case:
Story Points: 3

 Description   

The orphanCleanupDelaySecs parameter was added as part of the "secondaries chunk-aware" feature, so that a chunk migration on a shard primary doesn't immediately rip documents out from underneath queries running on the secondaries.

It essentially has the same effect as a cursor being held open for 15 minutes on the donating shard's primary. Because of this, receiving a chunk back can block for up to 15 minutes. This is a change in behaviour between 3.4 and 3.6: in the same situation, 3.4 would have wiped out the range from underneath the active queries.

We should preserve the 3.4 behaviour.
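For reference, orphanCleanupDelaySecs defaults to 900 seconds (15 minutes) and can be inspected or lowered at runtime via getParameter/setParameter. A minimal pymongo sketch, run against a shard primary (the host name is hypothetical):

    from pymongo import MongoClient

    # Connect directly to the donor shard's primary (requires admin privileges).
    client = MongoClient("mongodb://shard0-primary.example.net:27017")

    # Inspect the current delay; the default is 900 seconds (15 minutes).
    print(client.admin.command("getParameter", 1, orphanCleanupDelaySecs=1))

    # Lowering it restores 3.4-like prompt cleanup, at the cost of ripping
    # documents out from underneath secondary reads.
    client.admin.command("setParameter", 1, orphanCleanupDelaySecs=0)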



 Comments   
Comment by Josef Ahmad [ 11/Jul/19 ]

I think migrating the least-recently moved chunk only mitigates this issue marginally. In the rather common case of a monotonically changing shard key, the top chunk optimisation should bounce the top chunk around shards regardless of the balancer policy.

Comment by Kaloian Manassiev [ 10/Jul/19 ]

josef.ahmad, is it the max chunk that bounces back and forth, or the min? From tracing how the balancer distribution is constructed, I can see that the chunks-per-shard mapping is in increasing key order, and the selection then walks that list in the same order and picks the first chunk that matches. Therefore it should be picking the one with the lowest key (which has the same issue as the max; I am just double-checking here).

With respect to this, do you think it might be a slightly better selection policy if we changed that logic to instead select the chunk with the lowest chunk version, since that is the one which moved least recently? I am wondering whether this would have net negative consequences, for example around fragmentation or data hotness across the cluster, although I can't think of how it could be any worse than picking the lowest chunk.
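To make the comparison concrete, here is a toy sketch of the two selection policies; the Chunk type and integer keys are simplifications for illustration, not the server's actual ChunkType:

    from typing import List, NamedTuple, Optional

    class Chunk(NamedTuple):
        min_key: int        # simplified: shard key bound as an integer
        major_version: int  # bumped every time the chunk migrates

    def pick_lowest_key(chunks: List[Chunk]) -> Optional[Chunk]:
        # Current behaviour per the trace above: the per-shard chunk list is
        # in increasing key order and the first matching chunk is selected.
        return min(chunks, key=lambda c: c.min_key) if chunks else None

    def pick_least_recently_moved(chunks: List[Chunk]) -> Optional[Chunk]:
        # Proposed alternative: the lowest chunk version identifies the chunk
        # that migrated least recently, so a chunk that just left a shard is
        # not immediately pulled back in behind orphanCleanupDelaySecs.
        return min(chunks, key=lambda c: c.major_version) if chunks else None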

Comment by Andy Schwerin [ 06/Nov/17 ]

This change would allow even short-running cursors on secondaries to miss documents if they ran long enough for a chunk to migrate away and begin to migrate back. Before removing the 15-minute wait, I think we should implement killing cursors on secondaries that have a read concern stronger than "available" before range deletion occurs.

Comment by Kaloian Manassiev [ 06/Nov/17 ]

The effect in 3.4 is not restricted to secondaries: the chunk's contents will be ripped out from underneath cursors on the primary as well, due to the waitForOpenCursors = false flag that the receive-chunk code uses.

Comment by Andy Schwerin [ 06/Nov/17 ]

But in 3.4 we effectively offered only "available" read concern on secondaries. This change would allow "majority" or "local" cursors on secondaries to behave like "available" cursors. The 15-minute delay insulated most clients from seeing that behavior, reducing the urgency of implementing the general solution.
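To make the distinction concrete, a pymongo sketch of the two kinds of secondary cursors (host, database, and collection names are hypothetical):

    from pymongo import MongoClient, ReadPreference
    from pymongo.read_concern import ReadConcern

    client = MongoClient("mongodb://mongos.example.net:27017")

    # "available" secondary reads opt out of the stronger guarantees and may
    # observe orphaned documents from in-flight migrations.
    available = client.get_database(
        "test", read_concern=ReadConcern("available")
    ).coll.with_options(read_preference=ReadPreference.SECONDARY)

    # "local" (or "majority") secondary reads are what the 15-minute delay
    # protects: without it, range deletion for a returning chunk could remove
    # documents mid-cursor, making these behave like "available" reads.
    local = client.get_database(
        "test", read_concern=ReadConcern("local")
    ).coll.with_options(read_preference=ReadPreference.SECONDARY)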

Comment by Kaloian Manassiev [ 06/Nov/17 ]

No, because we only partially implemented the feature to kill cursors on the secondaries (SERVER-29342).

However, my argument for fixing this bug is that in 3.4 and earlier we do not wait for queries to complete in this situation either.

Note that this is the case where a chunk moves out of a shard and then part of it moves back in.

Comment by Andy Schwerin [ 06/Nov/17 ]

Can you confirm that the cursors on the secondaries with read concern greater than "available" get killed before the orphan cleanup begins?
