[SERVER-9474] Members not caught up to minvalid can later inadvertently delay state change to SECONDARY Created: 25/Apr/13  Updated: 11/Jul/16  Resolved: 08/May/13

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.4.3
Fix Version/s: 2.5.0

Type: Bug Priority: Major - P3
Reporter: J Rassi Assignee: Eric Milkie
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
Operating System: ALL
Participants:

 Description   

If a member

  • has not caught up to minvalid, and
  • has any non-trivial amount of additional replication lag

then it can inadvertently delay its transition to SECONDARY until it has fully eliminated its replication lag. The expected behavior is for the member to transition to SECONDARY as soon as it catches up to minvalid.

The bug is caused by the interaction of two pieces of functionality in SyncTail::oplogApplication(): the batch size limit, and a timer that controls when the member attempts to transition to SECONDARY. The timer is reset whenever a new batch is started, and it then triggers a call to tryToGoLiveAsASecondary() once per second. However, the first call happens only after the timer reaches t=1 second. As a result, if every batch reaches the size limit in less than one second, the timer is reset before it ever fires, and tryToGoLiveAsASecondary() is not called at all.
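A minimal sketch of the timer interaction described above. It assumes a simulated millisecond clock and a batch that always takes a fixed `batchMs` to reach the size limit. The function name `goLiveAttemptsBuggy` and the simulation itself are hypothetical. This is not the code of SyncTail::oplogApplication(), only a model of the behavior the description attributes to it.

```cpp
#include <cassert>

// Hypothetical model of the pre-fix behavior: returns how many times
// tryToGoLiveAsASecondary() would be called while applying `batches`
// batches that each take `batchMs` simulated milliseconds to fill.
int goLiveAttemptsBuggy(int batches, long long batchMs) {
    assert(batchMs > 0);
    int attempts = 0;
    long long nowMs = 0;
    for (int b = 0; b < batches; ++b) {
        long long timerStartMs = nowMs;      // timer is reset when a new batch starts
        for (long long t = 0; t < batchMs; ++t) {
            ++nowMs;                         // one simulated millisecond of batch building
            if (nowMs - timerStartMs >= 1000) {
                ++attempts;                  // the "once per second" check fires
                timerStartMs = nowMs;        // re-arm for the next second
            }
        }
    }
    return attempts;
}
```

In this model, `goLiveAttemptsBuggy(10000, 200)` (roughly 33 simulated minutes of 200 ms batches) returns 0, while `goLiveAttemptsBuggy(3, 2500)` returns 6, two checks per slow batch. Only batches slower than one second ever reach the check.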

This can be triggered in the following example scenario:

  • A secondary is shut down while in the middle of processing an oplog batch (any time between the minvalid write and the oplog write)
  • The member is brought back into the replica set hours later

The member will call tryToGoLiveAsASecondary() when it first starts up, but the SECONDARY transition will not occur because the minvalid condition fails (as expected). It then starts processing oplog entries. Because it is lagging, it fetches oplog entries as fast as possible and hits the batch size limit in less than one second each time. It therefore never reaches the next tryToGoLiveAsASecondary() call until it has fully caught up.

Note that if, in the above scenario, the secondary had instead been shut down between batches, it would transition to SECONDARY as soon as it was brought back up, since the minvalid condition would succeed on the first tryToGoLiveAsASecondary() call.
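The minvalid condition can be modeled as a comparison between two persisted optimes. This is a hypothetical simplification: optimes are plain integers, and `MemberState`, `applyBatch`, and `minValidSatisfied` are illustrative names, not MongoDB's. It assumes only the ordering stated in the description, namely that minvalid is written before the batch's oplog write.

```cpp
#include <cassert>

// Hypothetical, simplified model of the minvalid bookkeeping.
struct MemberState {
    long long minValid = 0;         // persisted before applying a batch
    long long lastOplogOptime = 0;  // persisted by the batch's oplog write
};

// The condition tryToGoLiveAsASecondary() needs in this model: the member
// has logged everything up to minvalid.
bool minValidSatisfied(const MemberState& s) {
    return s.lastOplogOptime >= s.minValid;
}

// Apply one batch whose last op has optime `batchEnd`. If `crashMidBatch`
// is true, stop after the minvalid write but before the oplog write.
void applyBatch(MemberState& s, long long batchEnd, bool crashMidBatch) {
    assert(batchEnd > s.lastOplogOptime);
    s.minValid = batchEnd;          // 1. minvalid write
    if (crashMidBatch) {
        return;                     // shutdown here leaves minvalid ahead of the oplog
    }
    s.lastOplogOptime = batchEnd;   // 2. oplog write
}
```

Applying one batch normally leaves `minValidSatisfied` true. A second batch interrupted after the minvalid write leaves it false until a later, completed batch reaches the new minvalid.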



 Comments   
Comment by auto [ 09/May/13 ]

Author: Eric Milkie (milkie@10gen.com), 2013-05-09T15:03:52Z

Message: SERVER-9474 SERVER-9421 fix replset tests with background index builds

Lock the DBWrite lock before stopping index builds.
Avoid touching non-PODs while looking at non-threadsafe CurOp structures.
Fix the way we exit the batch-builder loop for replsettests
(this was broken by SERVER-9474's commit)
Branch: master
https://github.com/mongodb/mongo/commit/7008c37d48aa8893a4b79abd2ecca9699d8ee7d0

Comment by auto [ 08/May/13 ]

Author: Eric Milkie (milkie@10gen.com), 2013-05-08T16:59:51Z

Message: SERVER-9474 always run through the entire batch-building loop at least once per batch

Because we now run through the loop once with an empty op queue, we will always
do the stuff in "occasionally check some things" at least once per batch. This
fixes SERVER-9474, where if you had a steady stream of writes, your node in
RECOVERING would never transition to SECONDARY until you caught up completely.
Branch: master
https://github.com/mongodb/mongo/commit/25f2dfea762ebbaca65bd30d86429699b8a5da85
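A sketch of the idea in the commit message above, assuming a simulated millisecond clock and one op added to the batch per millisecond. `goLiveAttemptsFixed` is a hypothetical name and this is not the committed diff. In this model the check block runs whenever the op queue is still empty (the first pass of each batch) or once a second has elapsed, so it fires at least once per batch regardless of how quickly batches fill.

```cpp
#include <cassert>

// Hypothetical model of the post-fix behavior: returns how many times the
// "occasionally check some things" block (including the attempt to go live
// as a secondary) would run over `batches` batches of `batchMs` ms each.
int goLiveAttemptsFixed(int batches, long long batchMs) {
    assert(batchMs > 0);
    int attempts = 0;
    long long nowMs = 0;
    for (int b = 0; b < batches; ++b) {
        long long lastCheckMs = nowMs;
        int opsInBatch = 0;                  // op queue starts empty for each batch
        for (long long t = 0; t < batchMs; ++t) {
            // Checks run on the first pass (empty queue) or once a second.
            if (opsInBatch == 0 || nowMs - lastCheckMs >= 1000) {
                ++attempts;
                lastCheckMs = nowMs;
            }
            ++nowMs;                         // one simulated millisecond...
            ++opsInBatch;                    // ...adds one op to the batch
        }
    }
    return attempts;
}
```

For 10,000 batches of 200 ms each this returns 10000, one check per batch, so a member that reaches minvalid is noticed at the start of the next batch. For 3 batches of 2500 ms each it returns 9.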

Generated at Thu Feb 08 03:20:31 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.