[SERVER-61666] Tenant migration fails to fetch all txn oplog entries for a txn with commit opTime equal to startFetchingDonorOpTime Created: 19/Nov/21 Updated: 29/Oct/23 Resolved: 18/Dec/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 5.3.0-rc0 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Janna Golden | Assignee: | A. Jesse Jiryu Davis |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible | ||||||||||||
| Backport Requested: | v5.2, v5.1 |
| Participants: | |||||||||||||
| Linked BF Score: | 183 | ||||||||||||
| Description |
|
When the recipient gets the start fetch timestamp from a donor during a tenant migration, if the last oplog entry on the donor is a commit transaction entry and that transaction spanned multiple oplog entries, the recipient will not fetch the transaction's previous oplog entries. Currently, if a transaction commits before the start fetch timestamp, we walk its oplog chain to fetch all oplog entries in the transaction. For any transaction that hasn't yet committed, we move the start fetch timestamp back to the transaction's startOpTime, so that we fetch all oplog entries for every open transaction. However, if a transaction commits at exactly the start fetch timestamp (startFetchingDonorOpTime), we skip fetching its chain entirely. When applying oplog entries, the recipient then attempts to apply the commit transaction entry without having fetched any previous oplog entries for the transaction, and fails with NoSuchKey, aborting the transaction. A repro is attached. |
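The boundary condition described above can be illustrated with a small, self-contained Python model. This is only a sketch under stated assumptions: `OplogEntry`, `walk_chain`, `prefetched_chains`, and the `inclusive` flag are hypothetical names invented for illustration, opTimes are reduced to plain integers, and the inclusive comparison is shown only to make the off-by-one visible. It is not the server's code and not necessarily the fix that was committed.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class OplogEntry:
    ts: int                 # simplified opTime (a single integer, an assumption)
    txn: str                # hypothetical transaction identifier
    prev_ts: Optional[int]  # previous entry in the same transaction chain
    is_commit: bool = False


def walk_chain(by_ts: Dict[int, OplogEntry], commit: OplogEntry) -> List[int]:
    """Return timestamps of the entries preceding `commit` in its chain, oldest first."""
    chain: List[int] = []
    ts = commit.prev_ts
    while ts is not None:
        chain.append(ts)
        ts = by_ts[ts].prev_ts
    return chain[::-1]


def prefetched_chains(oplog: List[OplogEntry], start_fetch_ts: int,
                      inclusive: bool) -> Dict[str, List[int]]:
    """Map each committed transaction whose chain gets walked to its earlier entries.

    inclusive=False models the behavior in the description: only commits strictly
    before the start fetch timestamp have their chains walked.
    """
    by_ts = {e.ts: e for e in oplog}
    chains: Dict[str, List[int]] = {}
    for e in oplog:
        if not e.is_commit:
            continue
        covered = e.ts <= start_fetch_ts if inclusive else e.ts < start_fetch_ts
        if covered:
            chains[e.txn] = walk_chain(by_ts, e)
    return chains


# Transaction "A" spans three entries and commits at exactly the start fetch timestamp.
oplog = [
    OplogEntry(ts=1, txn="A", prev_ts=None),
    OplogEntry(ts=2, txn="A", prev_ts=1),
    OplogEntry(ts=3, txn="A", prev_ts=2, is_commit=True),
]
strict = prefetched_chains(oplog, start_fetch_ts=3, inclusive=False)
boundary = prefetched_chains(oplog, start_fetch_ts=3, inclusive=True)
print(strict)    # {}  (entries 1 and 2 are never fetched)
print(boundary)  # {'A': [1, 2]}
```

With the strict comparison, the commit at timestamp 3 is neither treated as "committed before" nor as an open transaction, so nothing pulls in entries 1 and 2, which mirrors the missing-entries failure reported here.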
| Comments |
| Comment by Githook User [ 23/Dec/21 ] |
|
Author: {'name': 'A. Jesse Jiryu Davis', 'email': 'jesse@mongodb.com', 'username': 'ajdavis'}Message: |
| Comment by Githook User [ 18/Dec/21 ] |
|
Author: {'name': 'A. Jesse Jiryu Davis', 'email': 'jesse@mongodb.com', 'username': 'ajdavis'}Message: |
| Comment by A. Jesse Jiryu Davis [ 17/Dec/21 ] |
|
Previous attempt was reverted because I forgot to add the new test to backports_required_for_multiversion_tests.yml. |
| Comment by Githook User [ 15/Dec/21 ] |
|
Author: {'name': 'A. Jesse Jiryu Davis', 'email': 'jesse@mongodb.com', 'username': 'ajdavis'}Message: Revert " This reverts commit ea6a59377c01ed48157557aaaae0bd8191b7fa4e. |
| Comment by Githook User [ 06/Dec/21 ] |
|
Author: {'name': 'A. Jesse Jiryu Davis', 'email': 'jesse@mongodb.com', 'username': 'ajdavis'}Message: |
| Comment by Esha Maharishi (Inactive) [ 30/Nov/21 ] |
|
jesse, I believe we said that if the fix is involved, merge would not have this problem, and the bug only causes a tenant migration to abort (it does not cause data corruption), then we could do a quick fix to prevent the BF. However, I'm not sure how we could do a quick fix, since the issue can happen in any test in tenant_migration_multi_stmt_txn_jscore_passthrough. Therefore, I think we need to implement an actual fix. |
| Comment by A. Jesse Jiryu Davis [ 29/Nov/21 ] |
|
suganthi.mani could you please attach your repro for initial sync? It would be useful to test that we don't break initial sync while fixing tenant migrations. esha.maharishi can you please remind me what we decided about this ticket? We considered waiting to see if PM-2353 would fix it without effort, but I think we decided we must fix this now, because .... ? |
| Comment by Suganthi Mani [ 29/Nov/21 ] |
|
Just adding more insight for future reference: we use the same logic in initial sync to calculate the begin (start) fetching and begin (start) applying timestamps, so this should be a problem for initial sync as well. I wrote a quick repro simulating the scenario described here for initial sync, but initial sync didn't hit the problem. Looking closely at the initial sync code revealed a check that prevents initial sync from hitting the NoSuchKey error. The rule for both initial sync and tenant migration is that we should start replaying oplog entries from Timestamp > begin (start) applying Timestamp. In initial sync, the oplog applier expands unprepared commit transaction oplog entries after the apply timestamp check. In tenant migration, however, the apply timestamp check is in the tenant oplog applier, but the expansion happens much earlier, during the oplog batching stage, and that's not correct. (Note that we can end up with a scenario like this: Txn1 start opTime < startFetchingOpTime < Txn1 (unprepared) commit opTime < startApplyingOpTime, and we would hit the same problem.) So I feel the tenant oplog batcher should only expand (unprepared commit) oplog entries with opTime > StartApplyingDonorOpTime. |
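The rule proposed in this comment can be sketched as a small Python model. Everything here is a hypothetical illustration, not the tenant oplog batcher's actual code: `split_batch`, the dictionary keys, and the integer timestamps are assumptions chosen to mirror the scenario Txn1 start opTime < startFetchingOpTime < Txn1 commit opTime < startApplyingOpTime.

```python
from typing import Dict, List, Tuple


def split_batch(batch: List[Dict], start_applying_ts: int) -> Tuple[List[int], List[int]]:
    """Split a batch into entries to expand and entries to leave unexpanded.

    Only unprepared commit entries with ts > start_applying_ts are expanded,
    which is the rule suggested in the comment above.
    """
    expand: List[int] = []
    unexpanded: List[int] = []
    for entry in batch:
        if entry["is_unprepared_commit"] and entry["ts"] > start_applying_ts:
            expand.append(entry["ts"])
        else:
            unexpanded.append(entry["ts"])
    return expand, unexpanded


# Hypothetical timeline: Txn1 starts at 5, startFetching = 10,
# Txn1 commits at 15, startApplying = 20, Txn2 commits at 25.
start_applying_ts = 20
batch = [
    {"ts": 15, "txn": "Txn1", "is_unprepared_commit": True},
    {"ts": 25, "txn": "Txn2", "is_unprepared_commit": True},
]
expand, unexpanded = split_batch(batch, start_applying_ts)
print(expand)      # [25]
print(unexpanded)  # [15]
```

Under this rule the Txn1 commit at timestamp 15 is never expanded, so the batcher does not need Txn1's earlier entries (which start before startFetchingOpTime and were not fetched), while the Txn2 commit after the start applying timestamp is expanded as usual.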