[SERVER-25618] Slave in master/slave deployment can acknowledge operations prior to applying them Created: 15/Aug/16  Updated: 15/Nov/16  Resolved: 26/Aug/16

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 3.3.12

Type: Bug Priority: Major - P3
Reporter: Max Hirschhorn Assignee: Scott Hernandez (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
is related to SERVER-27054 Slave can acknowledge write before in... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

python buildscripts/resmoke.py --executor=replication ms_initialsync.js

ms_initialsync.js

var rt = new ReplTest("ms_initsync");
 
// XXX: The test only seems to fail when 'dbToCloneName' sorts alphabetically before the database
// name we're doing the w=2 write on. This probably has something to do with ReplSource using a
// std::set<std::string> to manage what databases it needs to clone and it deciding to clone the
// 'dbToCloneName' database first instead.
var dbToCloneName = "a_toclone";
 
// Start the master and insert some data to ensure that the slave has to clone a database.
var master = rt.start(true);
master.setLogLevel(5, "query");
assert.writeOK(master.getDB(dbToCloneName).mycoll.insert({a: 1}));
assert.eq(1, master.getDB(dbToCloneName).mycoll.find({}).itcount());
 
// Start the slave.
var slave = rt.start(false);
slave.setLogLevel(6, "replication");
 
// Perform a w=2 write to ensure that master/slave replication is up and running.
assert.writeOK(master.getDB("test").mycoll.insert({}, {writeConcern: {w: 2}}));
assert.eq(1, slave.getDB("test").mycoll.find({}).itcount());

Output

[js_test:ms_initsync] 2016-08-15T15:47:00.975-0400 d20011| 2016-08-15T15:47:00.974-0400 I REPL     [replslave] resync: done with initial clone for db: a_toclone
[js_test:ms_initsync] 2016-08-15T15:47:00.976-0400 d20011| 2016-08-15T15:47:00.974-0400 D REPL     [replslave] Saving repl source: { host: "127.0.0.1:20010", source: "main", syncedTo: Timestamp 1471290419000|5, dbsNextPass: { test: true } }
[js_test:ms_initsync] 2016-08-15T15:47:00.976-0400 d20011| 2016-08-15T15:47:00.975-0400 D REPL     [replslave] first op time received: 57b21c33:5
[js_test:ms_initsync] 2016-08-15T15:47:00.976-0400 d20011| 2016-08-15T15:47:00.975-0400 D REPL     [replslave] initial run
...
[js_test:ms_initsync] 2016-08-15T15:47:00.977-0400 d20010| 2016-08-15T15:47:00.975-0400 D QUERY    [conn2] Running getMore, cursorid: 19764869580
[js_test:ms_initsync] 2016-08-15T15:47:00.977-0400 d20010| 2016-08-15T15:47:00.975-0400 I COMMAND  [conn1] command test.mycoll appName:MongoDB Shell command: insert { insert: "mycoll", documents: [ { _id: ObjectId('57b21c33ed2362d51d9bdd07') } ], ordered: true, writeConcern: { w: 2.0 } } ninserted:1 keysInserted:1 numYields:0 reslen:45 locks:{ Global: { acquireCount: { r: 5, w: 5 } }, Database: { acquireCount: { w: 4, W: 1 } }, Collection: { acquireCount: { w: 4 } }, Metadata: { acquireCount: { w: 2 } } } protocol:op_command 1040ms
...
[js_test:ms_initsync] 2016-08-15T15:47:00.979-0400 2016-08-15T15:47:00.978-0400 E QUERY    [thread1] Error: [1] != [0] are not equal : undefined :
[js_test:ms_initsync] 2016-08-15T15:47:00.979-0400 doassert@src/mongo/shell/assert.js:15:14
[js_test:ms_initsync] 2016-08-15T15:47:00.979-0400 assert.eq@src/mongo/shell/assert.js:51:5
[js_test:ms_initsync] 2016-08-15T15:47:00.979-0400 @ms_initsync.js:21:1
[js_test:ms_initsync] 2016-08-15T15:47:00.979-0400 failed to load: ms_initsync.js

Sprint: Repl 2016-08-29
Participants:
Linked BF Score: 0

 Description   

The master determines the opTime that a slave has applied until based on the position of the slave's cursor. However, the slave will issue a getMore when no operations have been applied, which causes the master to believe the slave has advanced past the opTime in its initial query.

// apply operations
{
    int n = 0;
    time_t saveLast = time(0);
    while (1) {
        // we need "&& n" to assure we actually process at least one op to get a sync
        // point recorded in the first place.
        const bool moreInitialSyncsPending = !addDbNextPass.empty() && n;
        if (moreInitialSyncsPending || !oplogReader.more()) {
            ...
        }
    ...
}

I can reproduce this issue on MongoDB 2.6+ and didn't attempt to run the "Steps to reproduce" against any versions earlier than that.



 Comments   
Comment by Githook User [ 26/Aug/16 ]

Author:

{u'username': u'scotthernandez', u'name': u'Scott Hernandez', u'email': u'scotthernandez@gmail.com'}

Message: SERVER-25618: do not report progress during m/s initial sync
Branch: master
https://github.com/mongodb/mongo/commit/65925c449db6150307b1b0caf183b2988a692d77

Comment by Max Hirschhorn [ 15/Aug/16 ]

This issue came up during the work on SERVER-25155 because the admin.system.version collection will always be non-empty when starting up the master with a clean dbpath. This causes tests like jstests/repl/block1.js to fail.

Generated at Thu Feb 08 04:09:42 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.