[SERVER-41113] Ignore NamespaceNotFound exceptions when applying transactions during initial sync and recovery Created: 13/May/19  Updated: 29/Oct/23  Resolved: 30/May/19

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.1.14

Type: Bug Priority: Major - P3
Reporter: Judah Schvimer Assignee: Matthew Russotto
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File initialSyncBug.js    
Issue Links:
Depends
is depended on by SERVER-39993 Add kill and terminate versions of co... Closed
Related
related to SERVER-41284 Add failpoint to surface idempotency ... Closed
related to SERVER-39804 Extend the oplog idempotency test for... Closed
is related to SERVER-41163 During initial sync, failing to apply... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Repl 2019-06-03
Participants:
Linked BF Score: 0

 Description   

The collection may already have been dropped, because during initial sync and recovery we may be applying operations out of order.

Normal oplog application ignores NamespaceNotFound errors in this case.

But the code path that applies transaction operations does not. The applyOps command doesn't appear to ignore these errors either, so wherever we use that code path to apply transaction operations, we likely need to ignore them as well.
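
A minimal sketch of the intended behavior, assuming a simplified in-memory model. The names `ApplyMode`, `ErrorCode`, `Catalog`, `applyOperation`, and `applyOplogEntry` are hypothetical and are not the server's actual API. A missing collection is treated as expected, and the operation is skipped, only when applying in initial sync or recovery mode, where an operation may be replayed against data that already reflects a later drop.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical application modes; only the first two tolerate missing namespaces.
enum class ApplyMode { kInitialSync, kRecovering, kSecondary };

// Hypothetical error codes standing in for the server's status codes.
enum class ErrorCode { kOK, kNamespaceNotFound };

// Simplified storage model (assumption): namespace -> set of document _ids.
using Catalog = std::map<std::string, std::set<int>>;

// Applies a single insert; fails if the target collection does not exist.
ErrorCode applyOperation(Catalog& catalog, const std::string& ns, int id) {
    auto it = catalog.find(ns);
    if (it == catalog.end()) {
        return ErrorCode::kNamespaceNotFound;
    }
    it->second.insert(id);
    return ErrorCode::kOK;
}

// Wraps applyOperation with mode-dependent NamespaceNotFound handling.
ErrorCode applyOplogEntry(Catalog& catalog, const std::string& ns, int id,
                          ApplyMode mode) {
    ErrorCode code = applyOperation(catalog, ns, id);
    bool tolerant = mode == ApplyMode::kInitialSync ||
                    mode == ApplyMode::kRecovering;
    if (code == ErrorCode::kNamespaceNotFound && tolerant) {
        // The collection was presumably dropped later in the oplog; skip the op.
        return ErrorCode::kOK;
    }
    return code;
}
```

In this sketch, an insert into a missing collection returns `kOK` in the two tolerant modes and `kNamespaceNotFound` in steady-state secondary mode, and in no mode does it create the collection.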

I found this by code inspection and haven't created a repro, so it's possible that something prevents this from causing a problem.

max.hirschhorn, it'll be interesting to see whether the initial sync fuzzer catches this once it supports transactions.



 Comments   
Comment by Githook User [ 30/May/19 ]

Author:

{'email': 'benety@mongodb.com', 'name': 'Benety Goh', 'username': 'benety'}

Message: SERVER-41113 fix windows compile
Branch: master
https://github.com/mongodb/mongo/commit/a7d8870e79638583d585ac82ef0c35cd75d9fab7

Comment by Githook User [ 30/May/19 ]

Author:

{'email': 'matthew.russotto@10gen.com', 'name': 'Matthew Russotto', 'username': 'mtrussotto'}

Message: SERVER-41113 Ignore NamespaceNotFound exceptions when applying transactions during initial sync and recovery.
Branch: master
https://github.com/mongodb/mongo/commit/a1c1cd1e00a43eb470df756304cea0642a3ca4dc

Comment by William Schultz (Inactive) [ 22/May/19 ]

matthew.russotto, I am adding a regression test for this bug to the idempotency test suite in sync_tail_test.cpp, which you can enable when you implement the fix. Here is the TODO.

Comment by Samyukta Lanka [ 20/May/19 ]

The initial sync fuzzer did catch this. I attached the generated test for reference; in it, you can see that initial sync fails on its first attempt.

Comment by Judah Schvimer [ 17/May/19 ]

By "out of order" I meant that initial sync may apply operations whose effects have already been cloned, so they are effectively applied out of order. I'm just referring to the usual initial sync idempotency concerns. As far as I understand, both initial sync and recovery now apply oplog operations in parallel.
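
The idempotency concern can be made concrete with a small simulation under assumed, simplified semantics (all names are hypothetical, not server code). Suppose `test.c` exists on the sync source when initial sync starts, and the fetched oplog then contains an insert into `test.c` followed by a drop of `test.c`. If the cloner copies the data after the drop, `test.c` is absent locally, so replaying the insert hits NamespaceNotFound even though the history is valid.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified model: namespace -> set of document _ids.
using Catalog = std::map<std::string, std::set<int>>;

enum class OpKind { kInsert, kDrop };

struct Op {
    OpKind kind;
    std::string ns;
    int id;  // unused for drops
};

// Applies one op; returns false (NamespaceNotFound) if the namespace is absent.
bool applyOne(Catalog& catalog, const Op& op) {
    auto it = catalog.find(op.ns);
    if (it == catalog.end()) {
        return false;
    }
    if (op.kind == OpKind::kInsert) {
        it->second.insert(op.id);
    } else {
        catalog.erase(it);
    }
    return true;
}

// Replays fetched oplog entries on top of cloned data. With ignoreMissing,
// NamespaceNotFound is skipped; otherwise replay stops and reports failure.
bool replay(Catalog& catalog, const std::vector<Op>& ops, bool ignoreMissing) {
    for (const Op& op : ops) {
        if (!applyOne(catalog, op) && !ignoreMissing) {
            return false;
        }
    }
    return true;
}

// Assumed source history during the clone: an insert into test.c, then a drop.
std::vector<Op> fetchedOplog() {
    return {{OpKind::kInsert, "test.c", 1}, {OpKind::kDrop, "test.c", 0}};
}
```

Replaying on the source's own starting state succeeds; replaying on a clone taken after the drop fails unless missing namespaces are ignored, and ignoring them yields the same final state as the source.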

Comment by William Schultz (Inactive) [ 16/May/19 ]

judah.schvimer, what exactly do you mean when you say we might be applying operations "out of order"? I'm not clear on whether we apply oplog operations in parallel or serially during initial sync/recovery.

Comment by Judah Schvimer [ 13/May/19 ]

We should see if we can trigger this with an IdempotencyTest.
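
An IdempotencyTest-style check can be sketched under a similarly simplified model. All names here are hypothetical; this is not the server's `IdempotencyTest` fixture. For every prefix of an operation sequence, start from the state that prefix produces (standing in for data the cloner may already have copied), replay the entire sequence with NamespaceNotFound ignored, and require the result to match applying the sequence once from an empty catalog. As a stated assumption, a create is a no-op when the collection already exists.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified model: namespace -> set of document _ids.
using Catalog = std::map<std::string, std::set<int>>;

enum class OpKind { kCreate, kInsert, kDrop };

struct Op {
    OpKind kind;
    std::string ns;
    int id;  // only meaningful for inserts
};

// Applies one op, skipping it if the namespace is missing (behavior under test).
void applyIgnoringMissing(Catalog& catalog, const Op& op) {
    if (op.kind == OpKind::kCreate) {
        catalog[op.ns];  // creates the collection only if it is absent
        return;
    }
    auto it = catalog.find(op.ns);
    if (it == catalog.end()) {
        return;  // NamespaceNotFound: skip
    }
    if (op.kind == OpKind::kInsert) {
        it->second.insert(op.id);
    } else {
        catalog.erase(it);
    }
}

// Applies the first `count` ops to a copy of `catalog` and returns the result.
Catalog applyAll(Catalog catalog, const std::vector<Op>& ops, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        applyIgnoringMissing(catalog, ops[i]);
    }
    return catalog;
}

// True if replaying all ops from every intermediate state converges.
bool isIdempotent(const std::vector<Op>& ops) {
    const Catalog expected = applyAll({}, ops, ops.size());
    for (std::size_t k = 0; k <= ops.size(); ++k) {
        Catalog start = applyAll({}, ops, k);
        if (applyAll(start, ops, ops.size()) != expected) {
            return false;
        }
    }
    return true;
}
```

A sequence that creates, fills, drops, and recreates a collection converges from every starting prefix in this model only because missing namespaces are skipped.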

Comment by Judah Schvimer [ 13/May/19 ]

I feel like transaction oplog application should go through the normal oplog application path to the extent that it's possible.
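
One way to read this suggestion, sketched with hypothetical names (not the server's API): apply each operation inside a transaction through the same per-entry function used for normal oplog application, so transaction operations inherit its mode-dependent NamespaceNotFound handling instead of relying on a separate applyOps-style loop.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified model: namespace -> set of document _ids.
using Catalog = std::map<std::string, std::set<int>>;

enum class ApplyMode { kInitialSync, kRecovering, kSecondary };
enum class ErrorCode { kOK, kNamespaceNotFound };

struct InsertOp {
    std::string ns;
    int id;
};

// The single per-entry function used by normal oplog application.
ErrorCode applyOplogEntry(Catalog& catalog, const InsertOp& op, ApplyMode mode) {
    auto it = catalog.find(op.ns);
    if (it == catalog.end()) {
        bool tolerant = mode == ApplyMode::kInitialSync ||
                        mode == ApplyMode::kRecovering;
        return tolerant ? ErrorCode::kOK : ErrorCode::kNamespaceNotFound;
    }
    it->second.insert(op.id);
    return ErrorCode::kOK;
}

// Transaction application routed through the same per-entry function, so it
// gets the same NamespaceNotFound handling as non-transactional operations.
ErrorCode applyTransactionOps(Catalog& catalog, const std::vector<InsertOp>& ops,
                              ApplyMode mode) {
    for (const InsertOp& op : ops) {
        ErrorCode code = applyOplogEntry(catalog, op, mode);
        if (code != ErrorCode::kOK) {
            return code;
        }
    }
    return ErrorCode::kOK;
}
```

The design choice is that error handling lives in one place: changing how missing namespaces are treated in `applyOplogEntry` changes it for both transactional and non-transactional operations.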

Generated at Thu Feb 08 04:56:51 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.