[SERVER-61147] Ensure safe deletion of TenantMigrationRecipientAccessBlocker for Shard Merge & MTM. Created: 30/Oct/21 Updated: 16/Oct/23 |
|
| Status: | Open |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | A. Jesse Jiryu Davis | Assignee: | Backlog - Service Architecture |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Assigned Teams: | Service Arch |
| Participants: |
| Description |
|
EDIT: The in-memory RTAB (TenantMigrationRecipientAccessBlocker) rejects user snapshot reads of donor data earlier than rejectReadsBeforeTimestamp after shard merge commit, because the first version of shard merge doesn't preserve or copy donor history. The in-memory RTAB is deleted once the recipient state document is deleted. Currently, the state document is deleted by TTL after shard merge completion, with a GC delay of tenantMigrationGarbageCollectionDelayMS (default 15 minutes). This is risky because there is no guarantee that recipient R's oldest timestamp is >= rejectReadsBeforeTimestamp once the GC delay has elapsed. So, after merge commit and after the GC delay, readers can still attempt donor data reads at a snapshot earlier than rejectReadsBeforeTimestamp, and nothing will prevent them from reading inconsistent data, violating snapshot read guarantees. (The same problem exists in the MTM protocol as well.) |
| Comments |
| Comment by Suganthi Mani [ 12/Dec/22 ] |
|
Andy, Esha, and I agreed on a good-enough solution, i.e., making the GC window > > minSnapshotHistoryWindow, which is a very small effort. So we don't need to bring this onto product's radar. |
| Comment by Suganthi Mani [ 23/Nov/22 ] |
|
Per an offline discussion with esha.maharishi@mongodb.com: since this is a bug in both MTM and shard merge, I'm moving this ticket out of the Shard Merge project and into the serverless backlog. |
| Comment by Suganthi Mani [ 04/Apr/22 ] |
|
Making the GC delay > snapshot window can still be racy and return incorrect data for snapshot reads (see here for more discussion on it).
|
| Comment by A. Jesse Jiryu Davis [ 27/Mar/22 ] |
|
The existing tests cover the new Shard Merge functionality. However, we're planning to enable snapshot reads in Serverless, and we must ensure access blockers remain for at least minSnapshotHistoryWindowInSeconds. If minSnapshotHistoryWindowInSeconds > tenantMigrationGarbageCollectionDelayMS: 1. Client starts a snapshot session on donor D. The default tenantMigrationGarbageCollectionDelayMS is 15 minutes and the default minSnapshotHistoryWindowInSeconds is 5 minutes, so this bug won't usually be possible. But let's either enforce this relationship or set the GC delay to max(tenant GC delay, snapshot window) to be sure. |
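A minimal sketch of the "max(tenant GC delay, snapshot window)" idea from the comment above, in Python for illustration only. The function name and the plain-integer arguments are hypothetical and are not the server's actual code. The only facts taken from the ticket are the two parameter names, their units (milliseconds versus seconds), and the quoted defaults.

```python
# Hypothetical helper: not part of the server code base.
MS_PER_SECOND = 1000

# Defaults as quoted in this ticket.
DEFAULT_TENANT_GC_DELAY_MS = 15 * 60 * 1000        # tenantMigrationGarbageCollectionDelayMS
DEFAULT_MIN_SNAPSHOT_WINDOW_SECS = 5 * 60          # minSnapshotHistoryWindowInSeconds


def effective_gc_delay_ms(tenant_gc_delay_ms: int,
                          min_snapshot_window_secs: int) -> int:
    """Return a GC delay in ms that is never shorter than the snapshot window."""
    # The two settings use different units, so convert before comparing.
    snapshot_window_ms = min_snapshot_window_secs * MS_PER_SECOND
    return max(tenant_gc_delay_ms, snapshot_window_ms)
```

With the defaults, the configured GC delay already dominates, so the result is unchanged. If an operator raises the snapshot window above the GC delay, the window wins. As a later comment in this ticket notes, a GC delay merely greater than the snapshot window can still be racy, so this narrows the bug rather than eliminating it.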
| Comment by A. Jesse Jiryu Davis [ 13/Jan/22 ] |
|
Ensure that the TenantMigrationRecipientAccessBlocker rejects such reads with "SnapshotTooOld" if the readConcern's "atClusterTime" predates the merge. Ensure access blockers remain for at least minSnapshotHistoryWindowInSeconds. |
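The rejection rule described above can be sketched as follows. This is an illustrative Python model under stated assumptions, not the server's C++ implementation: timestamps are modeled as plain integers, and the class, method, and exception names here are hypothetical stand-ins for the access blocker and the "SnapshotTooOld" error.

```python
from dataclasses import dataclass
from typing import Optional


class SnapshotTooOld(Exception):
    """Hypothetical stand-in for the server's SnapshotTooOld error."""


@dataclass
class RecipientAccessBlocker:
    # Reads at a snapshot earlier than this timestamp must be rejected.
    reject_reads_before_timestamp: int

    def check_read(self, at_cluster_time: Optional[int]) -> None:
        """Raise SnapshotTooOld if the read's atClusterTime predates the cutoff."""
        if at_cluster_time is None:
            # No atClusterTime was supplied, so this check does not apply.
            return
        if at_cluster_time < self.reject_reads_before_timestamp:
            raise SnapshotTooOld(
                f"atClusterTime {at_cluster_time} is earlier than "
                f"rejectReadsBeforeTimestamp {self.reject_reads_before_timestamp}"
            )
```

The check only protects readers while the blocker object exists, which is why the ticket asks that access blockers remain for at least minSnapshotHistoryWindowInSeconds.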