-
Type: Bug
-
Resolution: Duplicate
-
Priority: Major - P3
-
Affects Version/s: 3.3.1
-
Component/s: Sharding
-
Fully Compatible
-
ALL
-
Because mongos targets shards for multiUpdate and multiRemove, and shards check the shard version info on these requests before beginning to execute the updates and removes, if one shard returns StaleShardVersion to the mongos, the mongos will refresh its metadata, re-target and re-send the request to all relevant shards.
Therefore, a shard that did not return StaleShardVersion will re-apply the multiUpdate or multiRemove. This is harmless for multiRemoves (since removes are idempotent), but for non-idempotent multi-updates it can cause unexpected behavior: the write can be applied more than once to the same document. It's worth noting that the semantics of multiUpdate and multiRemove already allow for situations like this even on a single mongod; this issue just increases the likelihood of seeing such behavior.
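To make the idempotency point concrete, here is a minimal sketch in plain Python (no MongoDB involved). The helper names `multi_remove` and `multi_inc` and the sample documents are hypothetical; they only model what happens when the same multi-write is applied to a shard's documents a second time.

```python
# Toy model: a "shard" is just a list of documents (dicts).
# All names here are illustrative, not MongoDB APIs.

def multi_remove(docs, predicate):
    """Return the documents left after removing every match."""
    return [d for d in docs if not predicate(d)]

def multi_inc(docs, predicate, field, amount):
    """Return the documents after incrementing `field` on every match."""
    return [
        {**d, field: d[field] + amount} if predicate(d) else d
        for d in docs
    ]

docs = [{"_id": 1, "qty": 5}, {"_id": 2, "qty": 50}]

def is_low(d):
    return d["qty"] < 10

# A remove applied twice leaves the same result as applying it once.
removed_once = multi_remove(docs, is_low)
removed_twice = multi_remove(removed_once, is_low)

# An increment applied twice does not: the matching document moves again.
inc_once = multi_inc(docs, is_low, "qty", 1)
inc_twice = multi_inc(inc_once, is_low, "qty", 1)
```

In this sketch the second remove is a no-op, while the second increment changes `qty` on document 1 again, which is the kind of double application described above.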
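Right now, the mongos targets shards for multiUpdate and multiRemove, and the shards check versioning. A full fix requires both mongos and mongod to do the opposite of what they currently do: the mongos would send the request to all shards, and the shards would ignore versioning (option 4 below).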
The four options are:
1) mongos targets shards, shards check version (what we have now)
--> multiUpdates are re-applied if mongos was stale relative to the targeted shards
2) mongos sends request to all shards, shards check version
--> multiUpdates are re-applied if mongos was stale relative to ANY shard (even worse than option 1)
3) mongos targets shards, shards ignore versioning
--> a stale mongos will target the wrong shards, so both multiUpdates and multiRemoves can be lost
4) mongos sends request to all shards, shards ignore versioning (what we should be doing)
--> a stale mongos will apply the update/remove to all shards, and will not re-apply anything since the shards will not return StaleShardVersion. Though this works from a correctness perspective, the mongos will remain stale in this case.
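The four options can be compared with a small simulation. This is a toy sketch under stated assumptions, not the actual mongos/mongod protocol: the three shards, their integer versions, the hypothetical chunk migration from C to B, and the function `run` are all invented for illustration. The write modeled is a non-idempotent increment of `n` on every document.

```python
from dataclasses import dataclass, field

@dataclass
class Shard:
    version: int
    docs: list = field(default_factory=list)

def run(broadcast, check_version):
    """Simulate one multi-update {$inc: {n: 1}} sent by a stale router."""
    # Assumed ground truth: a chunk has moved from C to B, bumping both versions.
    shards = {
        "A": Shard(version=1, docs=[{"n": 0}]),
        "B": Shard(version=2, docs=[{"n": 0}]),
        "C": Shard(version=2, docs=[]),
    }
    # Stale router cache: pre-migration versions, data believed to be on A and C.
    versions = {"A": 1, "B": 1, "C": 1}
    owners = ["A", "C"]
    while True:
        targets = list(shards) if broadcast else list(owners)
        saw_stale = False
        for name in targets:
            shard = shards[name]
            if check_version and versions[name] != shard.version:
                saw_stale = True  # shard rejects the request before writing
                continue
            for doc in shard.docs:
                doc["n"] += 1     # shard applies the update
        if not saw_stale:
            break
        # Router refreshes its metadata, then re-sends the whole request.
        versions = {name: s.version for name, s in shards.items()}
        owners = [name for name, s in shards.items() if s.docs]
    return {name: [d["n"] for d in s.docs] for name, s in shards.items()}

option1 = run(broadcast=False, check_version=True)
option2 = run(broadcast=True, check_version=True)
option3 = run(broadcast=False, check_version=False)
option4 = run(broadcast=True, check_version=False)
```

In this scenario options 1 and 2 both increment the document on shard A twice, option 3 never touches the document on shard B, and only option 4 applies the update exactly once everywhere. The router's cache is never refreshed in options 3 and 4, which matches the note that the mongos remains stale.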
duplicates
- SERVER-20361 Improve the behaviour of multi-update/delete against a sharded collection (Backlog)
is related to
- SERVER-17825 Remove setShardVersion from shard version protocols (Closed)
- SERVER-22203 remove shardVersion information and usages from BatchedCommandRequest (Closed)