[SERVER-9474] Members not caught up to minvalid can later inadvertently delay state change to SECONDARY Created: 25/Apr/13 Updated: 11/Jul/16 Resolved: 08/May/13 |
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 2.4.3 |
| Fix Version/s: | 2.5.0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | J Rassi | Assignee: | Eric Milkie |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Operating System: | ALL |
| Description |
If a member has not yet caught up to minvalid and also has replication lag, then it can inadvertently delay its transition to SECONDARY until it has fully eliminated that lag. The expected behavior is for the member to transition to SECONDARY as soon as it catches up to minvalid.

The bug is caused by the interaction of two pieces of functionality in SyncTail::oplogApplication(): the batch size limit, and a timer that controls when the member attempts to transition to SECONDARY. The timer is reset whenever a new batch is started, and it then triggers a call to tryToGoLiveAsASecondary() once per second; however, the first call happens only after the timer reaches t=1 second. If every batch hits the size limit in less than one second, the timer is reset before it ever reaches one second, so tryToGoLiveAsASecondary() is not called again until batches stop filling that quickly, i.e. until the lag is gone. This can be triggered in the following example scenario:
The member calls tryToGoLiveAsASecondary() when it first starts up, but the transition to SECONDARY does not occur because the minvalid check fails (as expected). It then starts applying oplog entries. Because it has replication lag, it fetches oplog entries as fast as possible and reaches the batch limit in under one second each time, so it never gets to the next tryToGoLiveAsASecondary() call until it has fully caught up, even though it reaches minvalid earlier. Note that if, in the above scenario, the secondary had instead been shut down between batches, it would transition to SECONDARY as soon as it is restarted, since the minvalid check would succeed on the first tryToGoLiveAsASecondary() call. |
| Comments |
| Comment by auto [ 09/May/13 ] |
Author: Eric Milkie (milkie@10gen.com), 2013-05-09T15:03:52Z
Message: Lock the DBWrite lock before stopping index builds. |
| Comment by auto [ 08/May/13 ] |
Author: Eric Milkie (milkie@10gen.com), 2013-05-08T16:59:51Z
Message: Because we now run through the loop once with an empty op queue, we will always |