- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: 4.0.4, 4.1.6
- Component/s: Sharding
- Fully Compatible
- ALL
- v4.0
- Sharding 2018-12-31, Sharding 2019-01-14
- 71
- A config server calls shardCollection
- The shard begins shardCollection
- The shard writes chunks and the metadata entry for the collection.
- The config server steps down, cancelling its shardCollection command.
- A new config server steps up.
- The new config server retries the shardCollection command.
- The new config server sees that the metadata entry for the collection has been written and erroneously assumes that the existence of this entry means the shard has finished its shardCollection command. It therefore returns early and releases the distributed lock, which allows chunk migrations and splits to get in.
- A subsequent moveChunk operation can then acquire the collection distributed lock and attempt to acquire the critical section, which currently crashes the server.
A config server should not be able to return early if the shard's shardCollection command has not completed.
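
The sequence above can be illustrated with a small toy model. This is only a sketch of the race, not server code: the `Shard` and `DistLock` classes, the `command_finished` flag, and the functions `retry_buggy`, `retry_guarded`, and `move_chunk_can_start` are all hypothetical names made up for illustration, and the real fix may track completion differently.

```python
from dataclasses import dataclass


@dataclass
class Shard:
    """Toy stand-in for the shard running shardCollection (hypothetical)."""
    metadata_written: bool = False   # chunks and the collection metadata entry exist
    command_finished: bool = False   # the shard's shardCollection has fully completed

    def begin_shard_collection(self):
        # The shard writes chunks and the metadata entry but has not finished yet.
        self.metadata_written = True

    def finish_shard_collection(self):
        self.command_finished = True


@dataclass
class DistLock:
    """Toy distributed lock for one collection; starts out held."""
    held: bool = True


def retry_buggy(shard, lock):
    """Retry on the new config server: treats the metadata entry as proof of completion."""
    if shard.metadata_written:
        lock.held = False  # lock released while the shard is still working
        return "early return"
    return "ran shardCollection"


def retry_guarded(shard, lock):
    """Retry that returns early only once the shard's command has completed."""
    if shard.metadata_written and shard.command_finished:
        lock.held = False
        return "early return"
    return "waiting for shard"


def move_chunk_can_start(lock):
    # In this model a migration can proceed only if the dist lock is free.
    return not lock.held
```

In this model, calling `retry_buggy` after `begin_shard_collection()` but before `finish_shard_collection()` frees the lock, so `move_chunk_can_start` returns true while the shard is still mid-command. With `retry_guarded`, the lock stays held until the shard has finished.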