Type: Task
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 4.9.0
Affects Version/s: None
Component/s: Sharding
Labels:
- sharding-wfbf-day

Backwards Compatibility:
Fully Compatible
Linked BF Score:
59
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

When a moveChunk on a collection commits, the shardVersion of the collection is advanced and a timestamp is chosen such that:

- Reads of moved documents after that timestamp on the recipient are considered "owned"
- Reads of moved documents after that timestamp on the donor are considered "orphaned"
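
The ownership rule above can be sketched as a small classifier. This is only an illustration, not server code: the function name `classify_read` and the role strings are hypothetical, and treating a read exactly at the commit timestamp as "after" it is an assumption, as is the donor owning the documents before that timestamp.

```python
def classify_read(role: str, read_ts: int, valid_after: int) -> str:
    """Classify a read of a moved document on one shard.

    role        -- "donor" or "recipient" (hypothetical labels)
    read_ts     -- timestamp the read is performed at
    valid_after -- timestamp chosen when the moveChunk committed
    """
    if role not in ("donor", "recipient"):
        raise ValueError(f"unknown role: {role}")

    # Assumption: a read at exactly valid_after counts as "after" the commit.
    after_commit = read_ts >= valid_after

    if role == "recipient":
        return "owned" if after_commit else "not yet owned"
    # Assumption: before the commit, the donor still owns the documents.
    return "orphaned" if after_commit else "owned"
```

With a commit timestamp of 30, a read at 35 would be classified as "owned" on the recipient and "orphaned" on the donor.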

Also note that a donor does not eagerly delete or mark its orphans in a way that would generate a storage engine write conflict. The orphans are eventually deleted at some arbitrary point in the future.

Updates to documents that are concurrently participating in a moveChunk are transferred ("xfer mods") from the donor to the recipient. Those updates must be applied in a critical section before the moveChunk can be committed.

Sharded transactions, however, operate against a snapshot defined by the atClusterTime attached by a mongoS. Performing a transactional sharded update with a stale atClusterTime on a document that's concurrently being moved can result in write skew and lost data.

The system is protected against this correctness problem by the use of a shard version. But first, consider this scenario, which uses a stale read timestamp without a shard versioning protocol:

| Client                                                 | Config                              | Shard0            | Shard1            |
|--------------------------------------------------------+-------------------------------------+-------------------+-------------------|
| Insert A                                               |                                     | Insert A @ TS(10) |                   |
|                                                        | MoveChunkStarts                     |                   |                   |
|                                                        |                                     |                   | Insert A @ TS(20) |
|                                                        | MoveChunkCommits ValidAfter: TS(30) |                   |                   |
| Delete A (broadcast transaction) atClusterTime: TS(15) |                                     | Delete A @ TS(40) | `A` Not Found     |

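In this example, a client inserts and then deletes a document and receives a success response for both requests. However, the document still exists on Shard1: the delete ran against the orphaned copy on Shard0, while Shard1's snapshot at TS(15) predates its copy of `A`, so the document was never deleted from the shard that owns it.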
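
The scenario in the table can be replayed with a toy multi-version store. This is a minimal sketch under stated assumptions: `VersionedShard` and its methods are hypothetical names invented for illustration, timestamps are plain integers, and nothing here models the real storage engine or routing layer.

```python
class VersionedShard:
    """Toy shard keeping a timestamped history per key (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.history = {}  # key -> list of (timestamp, exists)

    def write(self, key, ts, exists):
        self.history.setdefault(key, []).append((ts, exists))

    def visible(self, key, ts):
        # A key is visible at ts if the latest write at or before ts created it.
        state = False
        for write_ts, exists in sorted(self.history.get(key, [])):
            if write_ts <= ts:
                state = exists
        return state

    def delete_at_snapshot(self, key, read_ts, commit_ts):
        # The delete only sees what the snapshot at read_ts sees.
        if not self.visible(key, read_ts):
            return False  # "`A` Not Found"
        self.write(key, commit_ts, False)
        return True


shard0 = VersionedShard("Shard0")  # donor
shard1 = VersionedShard("Shard1")  # recipient

shard0.write("A", 10, True)   # client insert lands on the donor at TS(10)
shard1.write("A", 20, True)   # migration clones A onto the recipient at TS(20)
# moveChunk commits with validAfter TS(30); the donor's copy is now an orphan.

# Broadcast transactional delete with a stale atClusterTime of TS(15).
deleted_on_shard0 = shard0.delete_at_snapshot("A", read_ts=15, commit_ts=40)
deleted_on_shard1 = shard1.delete_at_snapshot("A", read_ts=15, commit_ts=40)

# The owning shard still has A after the "successful" delete.
still_on_owner = shard1.visible("A", 50)
print(deleted_on_shard0, deleted_on_shard1, still_on_owner)
```

In this model the delete succeeds only on the donor's orphaned copy, and the recipient keeps the document, which mirrors the last row of the table.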

Here, the moveChunk commit generates a new shard version, which we'll call SV. If the client request operates with atClusterTime: TS(15) but also claims to use the fresh shard version SV, the same bug manifests.

Thus we can conclude that the following relationship must hold for the system to remain correct under chunk migrations and transactions that perform updates:
For a shard receiving a startTransaction request with a shardVersion SV, the request's atClusterTime must be at least the validAfter timestamp associated with SV.
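
The relationship above could be enforced with a check along these lines. This is a sketch only: `check_start_transaction`, `StaleSnapshotError`, and the `valid_after_by_version` mapping are hypothetical names, and the ticket does not say how the server actually implements or reports this check.

```python
class StaleSnapshotError(Exception):
    """Hypothetical error for a snapshot older than the shard version allows."""


def check_start_transaction(at_cluster_time, shard_version, valid_after_by_version):
    """Reject a startTransaction whose snapshot predates its shard version.

    valid_after_by_version maps a shard version to the validAfter timestamp
    chosen when that version was created (an assumed lookup structure).
    """
    valid_after = valid_after_by_version[shard_version]
    if at_cluster_time < valid_after:
        raise StaleSnapshotError(
            f"atClusterTime {at_cluster_time} is older than validAfter "
            f"{valid_after} for shard version {shard_version}"
        )


# Values from the scenario: SV becomes valid at TS(30).
valid_after_by_version = {"SV": 30}

check_start_transaction(35, "SV", valid_after_by_version)  # accepted

try:
    check_start_transaction(15, "SV", valid_after_by_version)
    rejected = False
except StaleSnapshotError:
    rejected = True  # the stale delete from the table would be refused
```

Under this check, the broadcast delete at TS(15) in the table would be refused rather than applied to the orphan.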

Assignee: Sergi Mateo Bellido
Reporter: Daniel Gottlieb (Inactive)
Participants: Daniel Gottlieb, Sergi Mateo Bellido
Votes: 0
Watchers: 5

Created: Feb 22 2021 07:52:03 PM UTC
Updated: Oct 29 2023 09:57:10 PM UTC
Resolved: Mar 08 2021 03:45:33 PM UTC
