[SERVER-41113] Ignore NamespaceNotFound exceptions when applying transactions during initial sync and recovery Created: 13/May/19 Updated: 29/Oct/23 Resolved: 30/May/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | 4.1.14 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Judah Schvimer | Assignee: | Matthew Russotto |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: |
| Issue Links: |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Sprint: | Repl 2019-06-03 |
| Participants: |
| Linked BF Score: | 0 |
| Description |
|
The collection may have been dropped by the time we apply an operation, since we are applying operations out of order. Normal oplog application ignores NamespaceNotFound errors, but the code path that applies transaction operations does not. The applyOps command doesn't seem to ignore these errors either, so wherever we use that code path to apply transaction operations we likely need to do the same. This was found by code inspection and I haven't created a repro, so it's possible something prevents this from causing a problem. It'll be interesting to see if the initial sync fuzzer catches this once it supports transactions, max.hirschhorn. |
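A minimal sketch of the behavior requested above, assuming that a missing namespace should be skipped only in initial sync and recovery and should still fail in steady-state replication. Every name here (`ApplyMode`, `NamespaceNotFound`, `Op`, `applyTransactionOps`) is a hypothetical stand-in for illustration and is not taken from the server source.

```cpp
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins; none of these names come from the MongoDB source.
enum class ApplyMode { kSecondary, kInitialSync, kRecovering };

// Models the error raised when an operation targets a dropped collection.
struct NamespaceNotFound : std::runtime_error {
    using std::runtime_error::runtime_error;
};

struct Op {
    std::string ns;       // target namespace, e.g. "test.coll"
    std::string payload;  // opaque operation body
};

using ApplyFn = std::function<void(const Op&)>;

// Applies the operations of one transaction in order.
// In initial sync and recovery a missing namespace is skipped, on the
// assumption that a later drop makes the operation irrelevant.
// In steady-state mode the error is rethrown to the caller.
// Returns the number of operations that were skipped.
inline int applyTransactionOps(const std::vector<Op>& ops,
                               ApplyMode mode,
                               const ApplyFn& applyOne) {
    int skipped = 0;
    for (const Op& op : ops) {
        try {
            applyOne(op);
        } catch (const NamespaceNotFound&) {
            if (mode == ApplyMode::kSecondary) {
                throw;  // unexpected outside initial sync and recovery
            }
            ++skipped;
        }
    }
    return skipped;
}
```

The mode check is kept inside the loop so that the remaining operations of the transaction are still applied after one of them hits a dropped collection.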
| Comments |
| Comment by Githook User [ 30/May/19 ] |
|
Author: {'email': 'benety@mongodb.com', 'name': 'Benety Goh', 'username': 'benety'} Message: |
| Comment by Githook User [ 30/May/19 ] |
|
Author: {'email': 'matthew.russotto@10gen.com', 'name': 'Matthew Russotto', 'username': 'mtrussotto'} Message: |
| Comment by William Schultz (Inactive) [ 22/May/19 ] |
|
matthew.russotto, I am adding a regression test for this bug to the idempotency test suite in sync_tail_test.cpp, which you can enable when you implement the fix. Here is the TODO. |
| Comment by Samyukta Lanka [ 20/May/19 ] |
|
The initial sync fuzzer did catch this. I attached the generated test for reference where you can see that initial sync fails its first attempt. |
| Comment by Judah Schvimer [ 17/May/19 ] |
|
By "out of order" I meant that initial sync applies operations that have already been cloned, so effectively out of order. I'm just referring to traditional initial sync idempotency concerns. Both initial sync and recovery apply oplog operations in parallel now as far as I understand. |
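The idempotency concern described in this comment can be sketched with a toy model, assuming the clone is taken after a collection has already been dropped and the oplog is then replayed from an earlier point. The `Catalog`, `OplogEntry`, `applyEntry`, and `replayTolerant` names are hypothetical and do not reflect MongoDB's catalog or oplog format.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory model: namespace -> documents.
using Catalog = std::map<std::string, std::vector<std::string>>;

enum class OpType { kInsert, kDrop };

struct OplogEntry {
    OpType type;
    std::string ns;
    std::string doc;  // used by inserts only
};

// Applies one entry. Returns false when the namespace does not exist,
// which models a NamespaceNotFound error.
inline bool applyEntry(Catalog& catalog, const OplogEntry& entry) {
    auto it = catalog.find(entry.ns);
    if (it == catalog.end()) {
        return false;
    }
    if (entry.type == OpType::kInsert) {
        it->second.push_back(entry.doc);
    } else {
        catalog.erase(it);
    }
    return true;
}

// Replays entries over a cloned catalog, tolerating missing namespaces.
// Returns how many entries were ignored for that reason.
inline int replayTolerant(Catalog& catalog, const std::vector<OplogEntry>& entries) {
    int ignored = 0;
    for (const OplogEntry& entry : entries) {
        if (!applyEntry(catalog, entry)) {
            ++ignored;
        }
    }
    return ignored;
}
```

If the source inserts into `test.c` and then drops it, a clone taken after the drop has no `test.c`. Replaying both entries over that clone hits the missing namespace twice, and ignoring those errors leaves the clone identical to the source.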
| Comment by William Schultz (Inactive) [ 16/May/19 ] |
|
judah.schvimer What do you mean exactly when you say we might be applying operations "out of order"? I am not clear on whether we apply oplog operations during initial sync/recovery in parallel or serially. |
| Comment by Judah Schvimer [ 13/May/19 ] |
|
We should see if we can trigger this with an IdempotencyTest. |
| Comment by Judah Schvimer [ 13/May/19 ] |
|
I feel like transaction oplog application should go through the normal oplog application path to the extent that it's possible. |