[SERVER-47003] MaxTimeMSExceeded on _configsvrMoveChunk can lead to blocking future migrations for that chunk Created: 19/Mar/20  Updated: 29/Oct/23  Resolved: 30/Dec/22

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 5.2.0

Type: Bug Priority: Major - P3
Reporter: Matthew Saltz (Inactive) Assignee: Silvia Surroca
Resolution: Fixed Votes: 0
Labels: sharding-common-backlog, sharding-wfbf-day
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-48153 Chunk migration can still be running ... Backlog
is related to SERVER-60922 Use `config.migrations` to persist th... Closed
Assigned Teams:
Sharding EMEA
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Sharding EMEA 2022-12-26
Participants:

 Description   

If moveChunk is sent with maxTimeMS, the following scenario can occur:
1. The config server sends moveChunk to the shard. The shard starts running it.
2. The config server hits maxTimeMS, which interrupts its OperationContext. Because of the interruption, the config server fails to delete the relevant config.migrations doc; that deletion happens in the destructor of the object created here.

If the config server then gets another moveChunk attempt for that range, the attempt fails with DuplicateKeyError on config.migrations. Because config.migrations is keyed on the namespace and the min value of the range being moved, this failure repeats indefinitely for any chunk with the same min value.
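
The scenario above can be illustrated with a small toy model. This is only a sketch, not the server's code: `MigrationsCollection`, `move_chunk`, and the `exceeds_max_time` flag are hypothetical names invented for illustration, and the collection is reduced to an in-memory dict with a unique key on (namespace, min). It assumes, as described above, that cleanup attempted on an interrupted operation does not succeed.

```python
class DuplicateKeyError(Exception):
    """Raised when a document with the same unique key already exists."""


class MigrationsCollection:
    """Toy stand-in for config.migrations, keyed on (namespace, min)."""

    def __init__(self):
        self._docs = {}

    def insert(self, ns, min_key, doc):
        key = (ns, min_key)
        if key in self._docs:
            raise DuplicateKeyError(f"duplicate key: {key}")
        self._docs[key] = doc

    def remove(self, ns, min_key, op_interrupted):
        # Assumption modelling the bug: a delete attempted on an
        # interrupted operation fails, so the document is left behind.
        if op_interrupted:
            return False
        self._docs.pop((ns, min_key), None)
        return True


def move_chunk(coll, ns, min_key, exceeds_max_time):
    """Hypothetical config-server-side flow: insert, run, clean up."""
    coll.insert(ns, min_key, {"state": "in-progress"})
    try:
        # exceeds_max_time models the operation hitting maxTimeMS while
        # the shard is still running the migration.
        if exceeds_max_time:
            raise TimeoutError("MaxTimeMSExceeded")
    finally:
        # Plays the role of the destructor that deletes the document.
        coll.remove(ns, min_key, op_interrupted=exceeds_max_time)
    return "ok"


if __name__ == "__main__":
    coll = MigrationsCollection()
    try:
        move_chunk(coll, "test.coll", 0, exceeds_max_time=True)
    except TimeoutError as exc:
        print("first attempt:", exc)
    try:
        move_chunk(coll, "test.coll", 0, exceeds_max_time=False)
    except DuplicateKeyError as exc:
        print("second attempt:", exc)
```

In this model the first call leaves its document behind, so every later call with the same namespace and min value fails at the insert step, while a chunk with a different min value is unaffected.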



 Comments   
Comment by Silvia Surroca [ 30/Dec/22 ]

This bug was fixed by SERVER-60922 in v5.2.

The existence of a duplicate entry in config.migrations is handled here.
