- Type: Bug
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Cluster Scalability
- ALL
- (copied to CRM)
Each resharding recipient records the resume token for the $natural scan on the source collection in the same transaction that writes the latest batch returned by the aggregation.
However, we obtain these batches by running getMore commands, and a getMore only returns once the batch size is reached.
If the database's primary shard was added as a recipient even though it owns no data (see SERVER-54279 for why we do this), that recipient's getMore never finds a matching document, never returns, and therefore never records any progress. If a failover then occurs on that recipient, or one of the donors it is reading from is restarted, the aggregation fails and must be restarted from the beginning. This can block resharding from making forward progress, especially when the source collection is very large and the collection scan takes a long time.
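The failure mode described above can be illustrated with a small toy model. This is only a sketch in plain Python, not server code: `Recipient`, `scan_batches`, `copy_with_one_failure`, the `owner` field, and the `fail_at` parameter are all hypothetical names invented for the illustration. It assumes that a batch is handed back only once it is full (or the scan ends) and that the resume token is saved only together with a batch.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional, Tuple


@dataclass
class Recipient:
    """Toy recipient: the resume token is saved only together with a batch."""
    resume_token: Optional[int] = None  # scan position to resume from
    docs_written: int = 0

    def write_batch(self, batch: List[dict], token: int) -> None:
        # Stand-in for the transactional write: the batch and the token
        # are recorded together.
        self.docs_written += len(batch)
        self.resume_token = token


def scan_batches(
    source: List[dict],
    matches: Callable[[dict], bool],
    batch_size: int,
    start: int = 0,
    fail_at: Optional[int] = None,
) -> Iterator[Tuple[List[dict], int]]:
    """Scan in natural order, yielding only when a batch is full or the scan ends."""
    batch: List[dict] = []
    for position in range(start, len(source)):
        if fail_at is not None and position == fail_at:
            # Hypothetical stand-in for a failover or donor restart.
            raise ConnectionError("simulated failover")
        if matches(source[position]):
            batch.append(source[position])
            if len(batch) == batch_size:
                yield batch, position + 1
                batch = []
    yield batch, len(source)  # final, possibly empty, batch


def copy_with_one_failure(
    source: List[dict],
    matches: Callable[[dict], bool],
    batch_size: int,
    fail_at: int,
) -> Recipient:
    """Run the scan until the simulated failure and return the recipient state."""
    recipient = Recipient()
    try:
        for batch, token in scan_batches(source, matches, batch_size, fail_at=fail_at):
            recipient.write_batch(batch, token)
    except ConnectionError:
        pass  # the scan died; only already-recorded progress survives
    return recipient


source = [{"_id": i, "owner": "A" if i % 2 == 0 else "B"} for i in range(1000)]

# A recipient that owns half of the documents records progress as it goes.
with_data = copy_with_one_failure(source, lambda d: d["owner"] == "A", 10, fail_at=900)

# A recipient that owns nothing never fills a batch, so it never records a token.
no_data = copy_with_one_failure(source, lambda d: d["owner"] == "C", 10, fail_at=900)

print("with data resumes from:", with_data.resume_token)
print("no data resumes from:", no_data.resume_token)
```

In this model the recipient with matching documents can resume close to the point of failure, while the recipient with no matching documents has no token at all and would have to rescan the source from position 0.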
- is related to
  - SERVER-54279 Primary shard may end up with inconsistent collection catalog entry after resharding (Closed)
  - SERVER-91109 Optimize resharding when primary shard owns zero chunks for resharded collection (Closed)