[SERVER-35571] Wait until all nodes become stable before checkOplogs
Created: 13/Jun/18  Updated: 29/Oct/23  Resolved: 19/Jun/18

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.0.1, 4.1.1

Type: Task
Priority: Major - P3
Reporter: Siyuan Zhou
Assignee: Tess Avitabile (Inactive)
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Backwards Compatibility: Fully Compatible
Backport Requested: v4.0, v3.6
Sprint: Repl 2018-07-02
Participants:
Linked BF Score: 15

 Description   

checkOplogs calls awaitReplication() on the live nodes before running the oplog check, but _callIsMaster() is then called again and resets _liveNodes before the check runs. The set of live nodes may therefore differ from the set that awaitReplication() waited on. One possible solution is to wait, at the very beginning of checkOplogs, until every node is in the Down, Primary, Secondary, or Arbiter state.
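
The proposed wait can be illustrated with a minimal, self-contained sketch. It is not the ReplSetTest implementation: waitForAllNodesStable, getStates, and makeFakeStatusSource are hypothetical names introduced here, and the node statuses are simulated rather than read from a real replica set. The numeric codes follow the documented replica set member states.

```javascript
// Replica set member state codes (numeric values as documented for replSetGetStatus).
const State = Object.freeze({
    PRIMARY: 1,
    SECONDARY: 2,
    RECOVERING: 3,
    STARTUP2: 5,
    ARBITER: 7,
    DOWN: 8,
});

// The four states the description treats as stable.
const STABLE_STATES = new Set([State.DOWN, State.PRIMARY, State.SECONDARY, State.ARBITER]);

// Hypothetical helper: polls getStates() until every node reports a stable state.
// getStates is an assumed callback that returns an array of numeric state codes.
// A real test would sleep between polls; this sketch polls back to back.
function waitForAllNodesStable(getStates, maxPolls = 100) {
    for (let poll = 1; poll <= maxPolls; poll++) {
        const states = getStates();
        if (states.every((s) => STABLE_STATES.has(s))) {
            return {states, polls: poll};
        }
    }
    throw new Error(`nodes did not reach a stable state within ${maxPolls} polls`);
}

// Simulated status source: the third node stays in STARTUP2 for the first two polls.
function makeFakeStatusSource() {
    let calls = 0;
    return () => {
        calls++;
        return [State.PRIMARY, State.SECONDARY, calls < 3 ? State.STARTUP2 : State.SECONDARY];
    };
}

const result = waitForAllNodesStable(makeFakeStatusSource());
console.log(result);
```

Only after such a wait returns would the set of live nodes be computed, so a node that is still starting up or recovering cannot change the live set between awaitReplication() and the oplog check.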



 Comments   
Comment by Githook User [ 03/Jul/18 ]

Author: Tess Avitabile (tessavitabile) <tess.avitabile@mongodb.com>

Message: SERVER-35571 checkReplicaSet should propagate liveSlaves to checkOplogs

(cherry picked from commit e7f212b876f8dc3e0b9aa740d55d97b781deb263)
Branch: v4.0
https://github.com/mongodb/mongo/commit/b8f2338752f5ed8ee8da184919b5f43ac0bed3eb

Comment by Githook User [ 19/Jun/18 ]

Author: Tess Avitabile (tessavitabile) <tess.avitabile@mongodb.com>

Message: SERVER-35571 checkReplicaSet should propagate liveSlaves to checkOplogs
Branch: master
https://github.com/mongodb/mongo/commit/e7f212b876f8dc3e0b9aa740d55d97b781deb263

Comment by Max Hirschhorn [ 14/Jun/18 ]

spencer, in order to run the dbhash check as part of ReplSetTest#stopSet() (see also SERVER-25640), we need to be able to call ReplSetTest#checkReplicatedDataHashes() when a node may no longer be running.

Comment by Spencer Brody (Inactive) [ 13/Jun/18 ]

max.hirschhorn, do we run the repl set checkers when we expect nodes of the set to be down? If not, we could just call awaitReplication() with the full set of nodes in the replset, rather than just the live nodes. If we expect to have to run this when nodes are down, then we need to confirm more rigorously which nodes are currently alive.

Comment by Max Hirschhorn [ 13/Jun/18 ]

If we only wait in the checkOplogs() function, isn't it possible that we'd run into the same problem if we ever decide to run the dbhash check before the oplog check?
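
One way to address this ordering concern, in the spirit of the commit message above ("checkReplicaSet should propagate liveSlaves to checkOplogs"), is to compute the live set once and hand the same list to every checker. The sketch below is only an illustration under that assumption: FakeReplSet, discoverLiveNodes, and runChecks are hypothetical names, not the real ReplSetTest API.

```javascript
// Hypothetical stand-in for a test fixture; not the real ReplSetTest.
class FakeReplSet {
    constructor(nodes) {
        this.nodes = nodes;  // array of {name, up}
    }

    // Recomputes liveness on every call, so two calls can disagree.
    discoverLiveNodes() {
        return this.nodes.filter((n) => n.up).map((n) => n.name);
    }
}

// Takes one snapshot of the live nodes and passes it to every checker,
// so the order in which the checkers run does not affect which nodes they see.
function runChecks(rst, checkers) {
    const liveNodes = rst.discoverLiveNodes();
    return checkers.map((check) => check(liveNodes));
}

const rst = new FakeReplSet([
    {name: "n0", up: true},
    {name: "n1", up: true},
    {name: "n2", up: false},
]);

const seen = [];

// Simulated dbhash check; node n2 comes back up while it is running.
const dbHashCheck = (live) => {
    seen.push(["dbhash", [...live]]);
    rst.nodes[2].up = true;
};

// Simulated oplog check; it receives the same snapshot as the dbhash check.
const oplogCheck = (live) => {
    seen.push(["oplog", [...live]]);
};

runChecks(rst, [dbHashCheck, oplogCheck]);
console.log(seen);
```

Both checkers see n0 and n1 even though n2 becomes live partway through, whereas a checker that called discoverLiveNodes() itself would see a different set.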
