- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Fully Compatible
- ALL
- v6.0, v5.0, v4.4
- Sharding EMEA 2022-07-11, Sharding EMEA 2022-07-25, Sharding EMEA 2022-08-08
- 5
The pattern that we have for these operations is always the same:
1. We take the kChunks lock.
2. We validate that the requested operation can be applied.
3. We compute the new CollectionVersions.
4. Finally, through a transaction (applyOps in the past, nowadays internal transactions), we modify one or more documents in config.chunks.
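The pattern above can be sketched as a small in-memory model. This is only an illustration, not the server code: `ConfigChunks`, `CollectionVersion`, and `commit_split` are hypothetical names, the version is reduced to a (major, minor) pair, and the transaction is stood in for by swapping a dictionary.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class CollectionVersion:
    # Simplified (major, minor) pair; an assumption made for illustration.
    major: int
    minor: int


class ConfigChunks:
    """Hypothetical in-memory stand-in for config.chunks."""

    def __init__(self):
        self.k_chunks_lock = threading.Lock()
        self.chunks = {"chunk-0": CollectionVersion(1, 0)}

    def current_version(self):
        # The collection version is the highest version among its chunks.
        return max(self.chunks.values())

    def commit_split(self, chunk_id, new_chunk_id):
        # Take the kChunks lock.
        with self.k_chunks_lock:
            # Validate that the requested operation can be applied.
            if chunk_id not in self.chunks:
                raise ValueError(f"unknown chunk {chunk_id}")
            if new_chunk_id in self.chunks:
                raise ValueError(f"chunk {new_chunk_id} already exists")

            # Compute the new CollectionVersions.
            base = self.current_version()
            left = CollectionVersion(base.major, base.minor + 1)
            right = CollectionVersion(base.major, base.minor + 2)

            # Modify the documents "transactionally": build the new state
            # and install it in a single assignment.
            updated = dict(self.chunks)
            updated[chunk_id] = left
            updated[new_chunk_id] = right
            self.chunks = updated
            return right
```

In this model the lock only protects against concurrent threads on the same node. Nothing ties the versions computed before the write to the state that exists when the write is finally applied.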
Let's say that a thread on the primary node of the CSRS is blocked just after step 3 (after computing the new CollectionVersions) and the node steps down. Another node steps up and performs some changes to the chunks that are unrelated to the blocked operation. Finally, the old primary steps up again and commits the migration, but it installs the CollectionVersion it computed before stepping down, which is now stale.
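The interleaving can be reproduced with a toy simulation. Everything here is an assumption made for illustration: `Catalog`, `compute_next_major`, `commit`, and `run_scenario` are hypothetical names, and the term check is just one possible guard, not necessarily the fix that was applied for this ticket.

```python
from dataclasses import dataclass


class StaleVersionError(Exception):
    pass


@dataclass
class Catalog:
    # Hypothetical replicated state: the replica set term and the
    # collection version as a (major, minor) tuple.
    term: int = 1
    version: tuple = (5, 0)


def compute_next_major(catalog):
    """Return the next version and the term in which it was computed."""
    major, _ = catalog.version
    return (major + 1, 0), catalog.term


def commit(catalog, new_version, computed_in_term, check_term):
    """Install the version, optionally rejecting one from an older term."""
    if check_term and computed_in_term != catalog.term:
        raise StaleVersionError(
            f"version computed in term {computed_in_term}, "
            f"current term is {catalog.term}"
        )
    catalog.version = new_version


def run_scenario(check_term):
    catalog = Catalog()

    # Old primary computes the new version, then blocks before committing.
    pending, pending_term = compute_next_major(catalog)

    # The node steps down; another node steps up and commits two
    # unrelated chunk operations.
    catalog.term += 1
    for _ in range(2):
        version, term = compute_next_major(catalog)
        commit(catalog, version, term, check_term)

    # The old primary steps up again and the blocked thread resumes.
    catalog.term += 1
    commit(catalog, pending, pending_term, check_term)
    return catalog.version
```

Without the guard, `run_scenario(False)` ends with version `(6, 0)` even though `(7, 0)` had already been committed, so the collection version moves backwards. With the guard, `run_scenario(True)` raises `StaleVersionError` and the stale commit is rejected.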
This problem affects the commit of split, merge, and moveChunk. I would also double-check what happens for refineShardKey.
We might need to backport this fix to older versions.