[SERVER-54711] Invariant on shardVersion, atClusterTime relationship for sharded write transactions Created: 22/Feb/21 Updated: 29/Oct/23 Resolved: 08/Mar/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 4.9.0 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Daniel Gottlieb (Inactive) | Assignee: | Sergi Mateo Bellido |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | sharding-wfbf-day | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||
| Backwards Compatibility: | Fully Compatible | ||||
| Participants: | |||||
| Linked BF Score: | 59 | ||||
| Description |
|
When a moveChunk on a collection commits, the shardVersion of the collection is advanced and a timestamp is chosen such that:
Also note that a donor does not eagerly delete or mark its orphans in a way that would generate a storage engine write conflict. The orphans are eventually deleted at some arbitrary point in the future. Updating documents that are concurrently participating in a moveChunk result in those updates being transferred ("xfer mods") from the donor -> recipient. Those updates must be applied in a critical section before the moveChunk can be committed. Sharded transactions, however operate against a snapshot defined by the atClusterTime attached by a mongoS. Performing a transactional sharded update on a document that's concurrently being moved with a stale atClusterTime can result in write skew and lost data. This correctness problem is protected by the use of a shard version. But first consider this scenario that uses a stale read timestamp without a shard versioning protocol:
In this example, a client inserts and deletes a document and received a success for both requests. However the document still remains as it wasn't deleted from the proper shard. In this example, the move chunk committing generates a new shard version which we'll call SV. If the client request is operating with an atClusterTime: TS(15), but also claims to be using the fresh shard version: SV, the same bug would manifest. Thus we can conclude there is a relationship that must be maintained for correctness of the system with chunk migrations and transactions that do updates: |