[SERVER-8070] Flush buffer before changing sync targets to prevent unnecessary rollbacks
Created: 03/Jan/13  Updated: 22/Sep/17  Resolved: 18/Jan/13

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.2.2
Fix Version/s: 2.4.0-rc0, 2.5.0

Type: Bug
Priority: Major - P3
Reporter: Kristina Chodorow (Inactive)
Assignee: Eric Milkie
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Operating System: ALL
Participants:
Case:
Linked BF Score: 0

 Description   

The problem is with bgsync.cpp here:

if (theirTS < _lastOpTimeFetched) {
    log() << "replSet we are ahead of the primary, will try to roll back" << rsLog;
    theReplSet->syncRollback(r);
    return true;
}

If we are syncing from the primary, this check is valid. If we are not, this member may simply have applied ops from its buffer and gotten ahead of the sync source, which is fine, and we don't want to roll back in that case. The right fix is probably to make sure the buffer is empty before choosing a new sync target.
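The scenario can be illustrated with a small toy model. This is only a sketch, not the server code: `Member`, `eligible`, and `spuriousRollback` are hypothetical names, timestamps are plain integers, and it assumes a sync candidate is considered usable when it is ahead of what this member has applied so far.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical stand-in for an oplog timestamp (not the server's OpTime type).
using Timestamp = std::uint64_t;

// Toy model of a secondary: ops are fetched into a buffer, then applied.
struct Member {
    std::deque<Timestamp> buffer;  // fetched but not yet applied
    Timestamp lastFetched = 0;     // newest op pulled from the sync source
    Timestamp lastApplied = 0;     // newest op applied locally

    void fetch(Timestamp ts) {
        assert(ts > lastFetched);  // ops arrive in increasing order
        buffer.push_back(ts);
        lastFetched = ts;
    }

    // Apply the oldest buffered op, if there is one.
    void applyOne() {
        if (buffer.empty()) return;
        lastApplied = buffer.front();
        buffer.pop_front();
    }

    // Apply everything still sitting in the buffer.
    void drainBuffer() {
        while (!buffer.empty()) applyOne();
    }
};

// Assumption: a candidate looks usable if it is ahead of what we have applied.
bool eligible(const Member& m, Timestamp candidateNewest) {
    return candidateNewest > m.lastApplied;
}

// The quoted check, reduced to its comparison: the candidate's newest op is
// older than the newest op we have already fetched.
bool spuriousRollback(const Member& m, Timestamp candidateNewest) {
    return eligible(m, candidateNewest) && candidateNewest < m.lastFetched;
}
```

In this model, a member that has fetched ops 1 through 10 but applied only 1 through 5 would accept a candidate whose newest op is 8 and then trip the rollback check. After `drainBuffer()`, `lastApplied` equals `lastFetched`, so the same candidate is no longer eligible and the check is never reached.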

This is a fairly annoying bug to run into. If the member is in the middle of applying a batch of ops, the rollback will fail (because the member is not at minValid) and put the member into the FATAL state. (I think restarting should fix it, though.)
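The failure mode can be reduced to a single comparison. The sketch below is illustrative only: `MemberState` and `stateAfterRollbackAttempt` are hypothetical names, and it assumes minValid marks the end of the batch currently being applied, so a member whose last applied op is behind it is mid-batch.

```cpp
#include <cstdint>

// Hypothetical stand-in for an oplog timestamp (not the server's OpTime type).
using Timestamp = std::uint64_t;

enum class MemberState { Rollback, Fatal };

// Assumption: a rollback can only start from a consistent point, i.e. once
// the member has applied everything up to minValid.
MemberState stateAfterRollbackAttempt(Timestamp lastApplied, Timestamp minValid) {
    if (lastApplied < minValid) {
        return MemberState::Fatal;  // mid-batch: rollback cannot proceed
    }
    return MemberState::Rollback;   // consistent: rollback may proceed
}
```

Under these assumptions, a member that has applied up to op 7 of a batch ending at op 10 would end up in `Fatal`, while one that has reached op 10 would enter rollback normally.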



 Comments   
Comment by auto [ 23/May/13 ]

Author: Eric Milkie (milkie) <milkie@10gen.com>

Message: SERVER-8070 fix comparision to be correct
Branch: v2.4
https://github.com/mongodb/mongo/commit/338bce1a36af8cb220afb4c20363505ac7fa058d

Comment by auto [ 22/May/13 ]

Author: Eric Milkie (milkie) <milkie@10gen.com>

Message: SERVER-8070 more reliable test (failpoint was not in the right place)
Branch: v2.4
https://github.com/mongodb/mongo/commit/c88286c3695eeadb237a7d82c86d738fe2261843

Comment by auto [ 10/May/13 ]

Author: Eric Milkie <milkie@10gen.com>
Date: 2013-05-10T20:09:18Z

Message: SERVER-8070 fix comparision to be correct
Branch: master
https://github.com/mongodb/mongo/commit/c7469639d98b26ac8376bac7662d717b5b0129b0

Comment by auto [ 10/May/13 ]

Author: Eric Milkie <milkie@10gen.com>
Date: 2013-05-10T15:57:27Z

Message: SERVER-8070 more reliable test (failpoint was not in the right place)
Branch: master
https://github.com/mongodb/mongo/commit/c4f5c81aa05abe1cddc0ed5b7a3912a7ebbc048f

Comment by auto [ 28/Jan/13 ]

Author: Kristina <kristina@10gen.com>
Date: 2013-01-28T14:20:52Z

Message: SERVER-8070 Make timeout longer for slow buildbot
Branch: master
https://github.com/mongodb/mongo/commit/b7763e7355337502e6ed2ddd9e536646b0323aa3

Comment by auto [ 15/Jan/13 ]

Author: Kristina <kristina@10gen.com>
Date: 2013-01-15T20:59:43Z

Message: SERVER-8070 Make sure buffer is drained before choosing sync target
Branch: master
https://github.com/mongodb/mongo/commit/f380d40a3da93c22e9eb95d5308855f377a4123f

Comment by auto [ 15/Jan/13 ]

Author: Kristina <kristina@10gen.com>
Date: 2013-01-15T19:21:52Z

Message: SERVER-8070 Notify primary after each batch, not after each op
Branch: master
https://github.com/mongodb/mongo/commit/db62658b202c3570a5f7e091cb5bb1ee0437adc8
