[SERVER-43875] Initial sync may crash due to missing oplog entries of running transactions Created: 07/Oct/19 Updated: 29/Oct/23 Resolved: 05/Nov/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 4.2.0 |
| Fix Version/s: | 4.3.1, 4.2.3 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Siyuan Zhou | Assignee: | Samyukta Lanka |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | KP42 |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v4.2 |
| Sprint: | Repl 2019-11-04, Repl 2019-11-18 |
| Participants: | |
| Linked BF Score: | 9 |
| Description |
|
Initial sync fetches the oldest active transaction timestamp before it fetches the top of the oplog. In the following case, it would miss oplog entries needed by oplog application.
|
| Comments |
| Comment by Githook User [ 13/Dec/19 ] |
|
Author: Samyukta Lanka (lankas, samy.lanka@mongodb.com) Message: (cherry picked from commit 853bdc4b34d9c3505e2af1f443ad7a99a619adea) |
| Comment by Githook User [ 06/Nov/19 ] |
|
Author: Benety Goh (benety, benety@mongodb.com) Message: |
| Comment by Githook User [ 05/Nov/19 ] |
|
Author: Samyukta Lanka (lankas, samy.lanka@mongodb.com) Message: |
| Comment by Siyuan Zhou [ 17/Oct/19 ] |
|
I like judah.schvimer's proposal. The only concern is that if the transaction command corresponding to the real OAT is an oplog hole when getting T1 and OAT, T1 may be greater than the real OAT even if an empty OAT is returned. To fix it, we can wait for all previous writes to be visible before reading OAT by adding afterClusterTime(Timestamp(0, 1)) to the OAT query in the same way as in |
| Comment by Judah Schvimer [ 16/Oct/19 ] |
|
We should also amend the architecture guide post |
| Comment by Judah Schvimer [ 15/Oct/19 ] |
|
Instead of, or in addition to, a targeted test for this, it may make sense to just fold |
| Comment by Judah Schvimer [ 07/Oct/19 ] |
|
Readers should note this results in a node crash during initial sync, not data corruption or even a crash outside of initial sync, so the concern here is actually relatively low. I think we can fix this by fetching the top of the oplog (T1), then fetching the oldest active transaction timestamp (OAT), and then fetching the top of the oplog again (T2). We use T2 as the point to begin applying, and start fetching from min(T1, OAT). This would ensure that we start fetching before any transactions could come in, even if there is no OAT. Since this problem only exists when there are no active transactions when OAT is fetched, we could more surgically fix this by starting fetching from (OAT == null ? T1 : OAT). |
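The ordering proposed in this comment can be sketched roughly as follows. This is an illustration only, not the server's implementation: the `Timestamp` type and the function names `choose_begin_fetching`, `choose_begin_fetching_surgical`, and `plan_initial_sync` are hypothetical, and the two fetch callables stand in for whatever queries the sync source actually serves.

```python
from typing import Callable, NamedTuple, Optional, Tuple


class Timestamp(NamedTuple):
    """Hypothetical stand-in for an oplog timestamp; compares as (secs, inc)."""
    secs: int
    inc: int


def choose_begin_fetching(t1: Timestamp, oat: Optional[Timestamp]) -> Timestamp:
    """min(T1, OAT), treating a missing OAT as 'no constraint'."""
    return t1 if oat is None else min(t1, oat)


def choose_begin_fetching_surgical(t1: Timestamp, oat: Optional[Timestamp]) -> Timestamp:
    """The narrower variant: fall back to T1 only when there is no OAT."""
    return t1 if oat is None else oat


def plan_initial_sync(
    fetch_top_of_oplog: Callable[[], Timestamp],
    fetch_oat: Callable[[], Optional[Timestamp]],
) -> Tuple[Timestamp, Timestamp]:
    """Return (begin_fetching, begin_applying) using the T1 -> OAT -> T2 order."""
    t1 = fetch_top_of_oplog()   # top of oplog, read before OAT
    oat = fetch_oat()           # oldest active transaction timestamp, may be None
    t2 = fetch_top_of_oplog()   # top of oplog again, read after OAT
    return choose_begin_fetching(t1, oat), t2
```

Because T1 is read before OAT, any transaction that starts after the OAT read has all of its oplog entries after T1, so fetching from T1 still covers it when OAT comes back empty.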
| Comment by Judah Schvimer [ 07/Oct/19 ] |
|
max.hirschhorn and vlad.rachev, samy.lanka pointed out that it's strange the initial sync fuzzer hasn't caught this. Any ideas why and what we can do to surface it? |