[SERVER-34768] Rollback can fail if run against a lagged node that catches up Created: 01/May/18  Updated: 29/Oct/23  Resolved: 15/Jan/20

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.2.4, 4.3.3, 4.0.17

Type: Bug Priority: Major - P3
Reporter: Vesselina Ratcheva (Inactive) Assignee: Siyuan Zhou
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
related to SERVER-33812 First initial sync oplog read batch f... Closed
is related to SERVER-46050 Use getLastAppliedOpTime rather than ... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.2, v4.0
Sprint: Repl 2018-06-18, Repl 2018-07-02, Repl 2018-07-16, Repl 2018-07-30, Repl 2019-11-18, Repl 2019-12-02, Repl 2020-01-27
Participants:
Linked BF Score: 52

 Description   

It is possible for a node to decide to roll back against a sync source that is behind it (because that source returned an empty batch), and then to resolve the common point after the same source has caught up and moved ahead. The rollback node then crashes during oplog truncation, because there are no local oplog entries after the common point to truncate.
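A minimal, hypothetical sketch of the failure mode (simplified OpTime and truncation logic with illustrative names, not the server's actual rollback code): once the common point resolves to the rollback node's own last applied entry, truncation finds nothing to remove and the invariant fails.

{code:cpp}
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified OpTime for illustration: (term, timestamp), ordered lexicographically.
struct OpTime {
    int64_t term;
    int64_t ts;
    bool operator<(const OpTime& o) const {
        return term < o.term || (term == o.term && ts < o.ts);
    }
};

// Remove every local oplog entry strictly after the common point. Rollback only
// makes sense if local history diverged, i.e. there is something to remove.
void truncateAfterCommonPoint(std::vector<OpTime>& oplog, OpTime commonPoint) {
    size_t removed = 0;
    while (!oplog.empty() && commonPoint < oplog.back()) {
        oplog.pop_back();
        ++removed;
    }
    assert(removed > 0 && "no oplog entries after the common point to truncate");
}

int main() {
    // Rollback node's oplog; its last applied entry is (1, 5).
    std::vector<OpTime> oplog = {{1, 1}, {1, 2}, {1, 3}, {1, 4}, {1, 5}};

    // The lagged sync source returned an empty batch, so we entered rollback.
    // By the time the common point is resolved the source has caught up, and
    // the common point is our own last entry rather than an earlier divergence
    // point, so the assertion above fires.
    truncateAfterCommonPoint(oplog, {1, 5});
    return 0;
}
{code}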



 Comments   
Comment by Githook User [ 10/Feb/20 ]

Author:

{'username': 'visualzhou', 'name': 'Siyuan Zhou', 'email': 'siyuan.zhou@mongodb.com'}

Message: SERVER-34768 Sync source's optime cannot be behind the syncing node even if chaining is disabled.

(cherry picked from commit 319757ebb72611fb91044a2a81d1b77a6f3729c1)

SERVER-46050 Use getLastAppliedOpTime rather than getHeartbeatAppliedOpTime for checking primary's position.
Branch: v4.0
https://github.com/mongodb/mongo/commit/1191a063fd235df4ab23bb75e59eb1e530e2e93c
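The commit message above captures the intent of the fix: a node should never select a sync source whose last applied optime is behind its own, even when chaining is disabled and it would otherwise sync only from the primary. A rough, hypothetical sketch of such a filter (illustrative names, not the server's actual sync source selection code; the SERVER-46050 follow-up additionally switches the primary-position check from the heartbeat optime to the last applied optime, which this sketch glosses over):

{code:cpp}
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

// Simplified OpTime for illustration: (term, timestamp), ordered lexicographically.
struct OpTime {
    int64_t term;
    int64_t ts;
    bool operator<(const OpTime& o) const {
        return term < o.term || (term == o.term && ts < o.ts);
    }
};

struct Candidate {
    std::string host;
    bool isPrimary;
    OpTime lastApplied;  // last applied optime reported by this member
};

// Hypothetical filter: even with chaining disabled (primary-only syncing),
// never pick a sync source whose last applied optime is behind our own.
std::vector<Candidate> filterSyncSources(const std::vector<Candidate>& members,
                                         OpTime myLastApplied,
                                         bool chainingAllowed) {
    std::vector<Candidate> eligible;
    for (const auto& m : members) {
        if (!chainingAllowed && !m.isPrimary)
            continue;  // chaining disabled: only the primary is a candidate
        if (m.lastApplied < myLastApplied)
            continue;  // never sync from (or roll back against) a node behind us
        eligible.push_back(m);
    }
    return eligible;
}

int main() {
    OpTime myLastApplied{1, 10};
    std::vector<Candidate> members = {
        {"primary:27017", true, {1, 8}},     // primary is temporarily behind us
        {"secondary:27017", false, {1, 12}},
    };
    // With chaining disabled and the primary behind us, we wait rather than
    // choose a source we could spuriously roll back against.
    auto eligible = filterSyncSources(members, myLastApplied, /*chainingAllowed=*/false);
    std::cout << "eligible sync sources: " << eligible.size() << "\n";  // prints 0
    return 0;
}
{code}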

Comment by Githook User [ 07/Feb/20 ]

Author:

{'username': 'visualzhou', 'name': 'Siyuan Zhou', 'email': 'siyuan.zhou@mongodb.com'}

Message: SERVER-34768 Sync source's optime cannot be behind the syncing node even if chaining is disabled.

(cherry picked from commit 319757ebb72611fb91044a2a81d1b77a6f3729c1)
Branch: v4.2
https://github.com/mongodb/mongo/commit/d94888c0d0a8065ca57d354ece33b3c2a1a5a6d6

Comment by Githook User [ 14/Jan/20 ]

Author:

{'name': 'Siyuan Zhou', 'email': 'siyuan.zhou@mongodb.com', 'username': 'visualzhou'}

Message: SERVER-34768 Sync source's optime cannot be behind the syncing node even if chaining is disabled.
Branch: master
https://github.com/mongodb/mongo/commit/319757ebb72611fb91044a2a81d1b77a6f3729c1

Comment by Tess Avitabile (Inactive) [ 09/Jan/20 ]

Feel free to work on this on BF Friday.

Comment by Siyuan Zhou [ 15/Oct/19 ]

Re-opening this ticket since SERVER-33812 has been reverted. We need to address the root cause of the race.

To answer tess.avitabile's question above, Will pointed out in BF-14623:

It also could have been possible even before secondary reads were allowed during batch application. Even if the PBWM lock blocks all readers during batch application, it should be possible for an oplog reader on thread T1 to acquire the PBWM lock, read an empty batch, and release the lock; then a new batch is applied on a separate thread and advances the lastApplied optime before we append the oplog query metadata for the reader on T1. This could produce the same issue, i.e., we return an empty batch but with a lastApplied optime that is newer than our own lastApplied optime.
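A sequential, hypothetical sketch of that interleaving (illustrative names, not the server's actual oplog fetcher or response-metadata code), showing how an empty batch can end up paired with a newer lastApplied in the response metadata:

{code:cpp}
#include <atomic>
#include <cstdint>
#include <iostream>
#include <vector>

// Node-wide last applied optime (timestamp only, for illustration).
std::atomic<int64_t> lastApplied{10};

struct OplogQueryResponse {
    std::vector<int64_t> batch;   // oplog entries returned to the syncing node
    int64_t metadataLastApplied;  // lastApplied stamped into the response metadata
};

int main() {
    OplogQueryResponse response;

    // T1 (oplog reader): acquires the PBWM lock, finds no entries after the
    // requested optime, prepares an empty batch, and releases the lock.
    response.batch = {};  // read while lastApplied == 10

    // T2 (batch applier): applies a new batch and advances lastApplied before
    // T1 has appended the query metadata.
    lastApplied.store(12);

    // T1: appends metadata using the *current* lastApplied, now newer than
    // anything reflected by the (empty) batch it just read.
    response.metadataLastApplied = lastApplied.load();

    // The syncing node sees an empty batch with a lastApplied ahead of its own
    // and may conclude, incorrectly, that it needs to roll back.
    std::cout << "batch size: " << response.batch.size()
              << ", metadata lastApplied: " << response.metadataLastApplied << "\n";
    return 0;
}
{code}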

Comment by William Schultz (Inactive) [ 23/Jul/18 ]

Fixed by SERVER-33812.

Comment by Tess Avitabile (Inactive) [ 23/Jul/18 ]

That makes sense to me. Thank you for investigating. I think it's fine to close this ticket.

Comment by Judah Schvimer [ 01/May/18 ]

I think it is definitely a bug if a node chooses a sync source that is behind it. It doesn't look like the SyncSourceResolver checks that the sync source candidate is ahead of the node, which means that if chaining is disallowed, nothing prevents a node from choosing a sync source behind it. If chaining is allowed, the fact that we compare our lastAppliedOpTime to a potential candidate's (and not just the timestamp) likely prevents us from syncing from a node behind us.
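To illustrate the parenthetical above, a tiny hypothetical sketch of why comparing full optimes (term and timestamp) rather than timestamps alone matters when judging whether a candidate is ahead of us:

{code:cpp}
#include <cstdint>
#include <iostream>

// Simplified OpTime for illustration: (term, timestamp).
struct OpTime {
    int64_t term;
    int64_t ts;
};

// Full-optime comparison: term first, then timestamp.
bool isAheadOf(const OpTime& a, const OpTime& b) {
    return a.term > b.term || (a.term == b.term && a.ts > b.ts);
}

int main() {
    OpTime mine{2, 100};       // our last applied entry, written in term 2
    OpTime candidate{1, 150};  // candidate's last write is on a stale term-1 branch

    // A timestamp-only comparison would treat this candidate as "ahead"...
    std::cout << "ts-only says ahead: " << (candidate.ts > mine.ts) << "\n";        // 1
    // ...while the full optime comparison does not.
    std::cout << "full optime says ahead: " << isAheadOf(candidate, mine) << "\n";  // 0
    return 0;
}
{code}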

Comment by Spencer Brody (Inactive) [ 01/May/18 ]

We probably need to do more to make sure that we don't go into rollback against a node that's just behind us but not on a divergent branch of history. It's possible this can only happen if chaining is disallowed. We should also take a look at catchup_takeover_two_nodes_ahead.js and make sure there's no issue with the test.
