[SERVER-43875] Initial sync may crash due to missing oplog entries of running transactions Created: 07/Oct/19  Updated: 29/Oct/23  Resolved: 05/Nov/19

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 4.2.0
Fix Version/s: 4.3.1, 4.2.3

Type: Bug Priority: Major - P3
Reporter: Siyuan Zhou Assignee: Samyukta Lanka
Resolution: Fixed Votes: 0
Labels: KP42
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Tested
tested by SERVER-44014 Add missing synchronization points to... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.2
Sprint: Repl 2019-11-04, Repl 2019-11-18
Linked BF Score: 9

 Description   

Initial sync fetches the oldest active transaction timestamp (OAT) before it fetches the top of the oplog. In the following case, it misses oplog entries that oplog application needs.

  1. Initial sync fetches the oldest active transaction timestamp, seeing no running transaction.
  2. A transaction starts and gets prepared, writing the prepare command [P] into the oplog.
  3. The transaction commits, writing the commit command [C] into the oplog.
  4. Initial sync fetches the top of the oplog, which is now [C].
  5. Because there was no running transaction, initial sync starts applying operations from [C], and it fails to apply [C] because the [P] it depends on is missing.
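The five steps above can be replayed in a toy model. This is a minimal sketch, not the server's code: timestamps are plain integers, oplog entries are hypothetical `(timestamp, kind)` tuples, and `apply_ops` / `buggy_initial_sync` are illustrative names. It shows that when fetching starts at the top of the oplog because no OAT was seen, the commit [C] is applied without its prepare [P].

```python
from typing import List, Optional, Tuple

# Toy model only: (timestamp, kind), where kind is "noop", "prepare" or "commit".
Entry = Tuple[int, str]


def apply_ops(ops: List[Entry]) -> None:
    """Apply entries in order; a commit needs its prepare to have been applied."""
    prepared = False
    for ts, kind in ops:
        if kind == "prepare":
            prepared = True
        elif kind == "commit" and not prepared:
            raise RuntimeError(f"commit at {ts} has no matching prepare")


def buggy_initial_sync() -> None:
    oplog: List[Entry] = [(1, "noop")]
    # 1. Read the OAT first: no transaction is running yet.
    oat: Optional[int] = None
    # 2-3. A transaction prepares and then commits concurrently.
    oplog.append((2, "prepare"))
    oplog.append((3, "commit"))
    # 4. Read the top of the oplog, which is now the commit [C].
    top = oplog[-1][0]
    # 5. With no OAT, fetching starts at the top, so [P] is never fetched.
    start = top if oat is None else oat
    apply_ops([e for e in oplog if e[0] >= start])
```

Calling `buggy_initial_sync()` raises `RuntimeError("commit at 3 has no matching prepare")`, the toy analogue of the crash described here.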


 Comments   
Comment by Githook User [ 13/Dec/19 ]

Author:

Samyukta Lanka (lankas) <samy.lanka@mongodb.com>

Message: SERVER-43875 Start initial sync oplog fetching from an earlier point to fetch all oplog entries associated with active transactions

(cherry picked from commit 853bdc4b34d9c3505e2af1f443ad7a99a619adea)
(cherry picked from commit 16757ee6065f0a1ded211ebfe33c6cd593f34b27)
Branch: v4.2
https://github.com/mongodb/mongo/commit/b7186ebc02a6997e4a3070da9616de81ca58e21d

Comment by Githook User [ 06/Nov/19 ]

Author:

Benety Goh (benety) <benety@mongodb.com>

Message: SERVER-43875 fix test
Branch: master
https://github.com/mongodb/mongo/commit/16757ee6065f0a1ded211ebfe33c6cd593f34b27

Comment by Githook User [ 05/Nov/19 ]

Author:

Samyukta Lanka (lankas) <samy.lanka@mongodb.com>

Message: SERVER-43875 Start initial sync oplog fetching from an earlier point to fetch all oplog entries associated with active transactions
Branch: master
https://github.com/mongodb/mongo/commit/853bdc4b34d9c3505e2af1f443ad7a99a619adea

Comment by Siyuan Zhou [ 17/Oct/19 ]

I like judah.schvimer's proposal. The only concern is that if the transaction command corresponding to the real OAT is still an oplog hole when T1 and the OAT are read, T1 may be greater than the real OAT even though an empty OAT is returned. To fix this, we can wait for all previous writes to become visible before reading the OAT by adding afterClusterTime(Timestamp(0, 1)) to the OAT query, in the same way as in SERVER-42910.
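The oplog-hole concern can be illustrated with a toy node in which a write reserves its timestamp before it becomes visible. This is a sketch under simplifying assumptions: `ToyNode`, `wait_for_no_holes` and `start_fetch_ts` are hypothetical names, and the wait that afterClusterTime would perform is modeled by simply letting in-flight writes finish. Without the wait, the prepare at timestamp 5 is invisible, the OAT comes back empty, and fetching starts at T1 = 6, past the real OAT.

```python
from typing import Dict, Optional, Set


class ToyNode:
    """Toy model: a write reserves its timestamp before it becomes visible,
    so an earlier timestamp can still be an oplog hole while a later one is visible."""

    def __init__(self) -> None:
        self.visible: Dict[int, str] = {}  # ts -> kind, already visible
        self.pending: Dict[int, str] = {}  # ts -> kind, reserved but not yet visible
        self.active_txns: Set[int] = set()  # start ts of visible active transactions

    def reserve(self, ts: int, kind: str) -> None:
        self.pending[ts] = kind

    def finish(self, ts: int) -> None:
        kind = self.pending.pop(ts)
        self.visible[ts] = kind
        if kind == "prepare":
            self.active_txns.add(ts)

    def top_of_oplog(self) -> int:
        return max(self.visible, default=0)

    def wait_for_no_holes(self, ts: int) -> None:
        # Hypothetical stand-in for the afterClusterTime wait: in this toy we
        # simply let every in-flight write at or before `ts` finish.
        for pending_ts in sorted(t for t in self.pending if t <= ts):
            self.finish(pending_ts)

    def oldest_active_txn(self) -> Optional[int]:
        return min(self.active_txns, default=None)


def start_fetch_ts(wait: bool) -> int:
    node = ToyNode()
    node.reserve(5, "prepare")  # the real OAT: still an oplog hole
    node.reserve(6, "insert")
    node.finish(6)              # a later write is already visible
    t1 = node.top_of_oplog()    # 6
    if wait:
        node.wait_for_no_holes(t1)
    oat = node.oldest_active_txn()
    return t1 if oat is None else min(t1, oat)
```

In this model `start_fetch_ts(wait=False)` returns 6 and misses the prepare, while `start_fetch_ts(wait=True)` returns 5.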

Comment by Judah Schvimer [ 16/Oct/19 ]

We should also amend the architecture guide post SERVER-43386 with whatever we change here.

Comment by Judah Schvimer [ 15/Oct/19 ]

Instead of, or in addition to, a targeted test for this, it may make sense to just fold SERVER-44014 into this work.

Comment by Judah Schvimer [ 07/Oct/19 ]

Readers should note this results in a node crash during initial sync, not data corruption or even a crash outside of initial sync, so the concern here is actually relatively low.

I think we can fix this by fetching the top of the oplog (T1), then fetching the oldest active transaction timestamp (OAT), and then fetching the top of the oplog again (T2). We use T2 as the point at which to begin applying, and start fetching from min(T1, OAT). This would ensure that we start fetching before any transaction could come in, even if there is no OAT.
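The T1/OAT/T2 ordering can be sketched as a small function. This is an illustrative sketch, not the committed fix: `choose_sync_points` is a hypothetical name, and the two readers are passed in as plain callables that return integer timestamps (with `None` meaning no active transaction). A missing OAT is treated as T1 when taking the minimum.

```python
from typing import Callable, Optional, Tuple


def choose_sync_points(
    read_top: Callable[[], int],
    read_oat: Callable[[], Optional[int]],
) -> Tuple[int, int]:
    """Return (begin_fetching_ts, begin_applying_ts) using the T1/OAT/T2 ordering."""
    t1 = read_top()  # top of the oplog, read first
    oat = read_oat()  # oldest active transaction timestamp, may be None
    t2 = read_top()  # top of the oplog, read again
    # min(T1, OAT), treating a missing OAT as T1.
    begin_fetching = t1 if oat is None else min(t1, oat)
    return begin_fetching, t2
```

For example, if a transaction prepares and commits between the two top-of-oplog reads (T1 = 10, T2 = 12, no OAT), fetching starts at 10 and so includes the prepare, while applying starts at 12.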

Since this problem only exists when there are no active transactions when the OAT is fetched, we could fix it more surgically by starting fetching from (OAT == null ? T1 : OAT).
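The surgical variant differs from min(T1, OAT) only in which start point it picks when an OAT exists. A minimal sketch with hypothetical function names, using integer timestamps and `None` for a missing OAT, compares the two rules on a few inputs:

```python
from typing import Optional


def begin_fetching_min(t1: int, oat: Optional[int]) -> int:
    """Broader rule: min(T1, OAT), treating a missing OAT as T1."""
    return t1 if oat is None else min(t1, oat)


def begin_fetching_surgical(t1: int, oat: Optional[int]) -> int:
    """Surgical rule: (OAT == null ? T1 : OAT)."""
    return t1 if oat is None else oat


# The rules agree when there is no OAT or when the OAT is older than T1; they
# differ only when a transaction started between reading T1 and reading the OAT.
for t1, oat in [(10, None), (10, 7), (10, 11)]:
    print(t1, oat, begin_fetching_min(t1, oat), begin_fetching_surgical(t1, oat))
```

The loop prints `10 None 10 10`, `10 7 7 7` and `10 11 10 11`: only the last case, where the OAT is newer than T1, gives different start points.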

Comment by Judah Schvimer [ 07/Oct/19 ]

max.hirschhorn and vlad.rachev, samy.lanka pointed out that it's strange the initial sync fuzzer hasn't caught this. Any ideas why and what we can do to surface it?

Generated at Thu Feb 08 05:04:21 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.