Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-25618

Slave in master/slave deployment can acknowledge operations prior to applying them

    • Type: Icon: Bug Bug
    • Resolution: Done
    • Priority: Icon: Major - P3 Major - P3
    • 3.3.12
    • Affects Version/s: None
    • Component/s: Replication
    • None
    • Fully Compatible
    • ALL
    • Hide
      python buildscripts/resmoke.py --executor=replication ms_initialsync.js
      
      ms_initialsync.js
      var rt = new ReplTest("ms_initsync");
      
      // XXX: The test only seems to fail when 'dbToCloneName' sorts alphabetically before the database
      // name we're doing the w=2 write on. This probably has something to do with ReplSource using a
      // std::set<std::string> to manage what databases it needs to clone and it deciding to clone the
      // 'dbToCloneName' database first instead.
      var dbToCloneName = "a_toclone";
      
      // Start the master and insert some data to ensure that the slave has to clone a database.
      var master = rt.start(true);
      master.setLogLevel(5, "query");
      assert.writeOK(master.getDB(dbToCloneName).mycoll.insert({a: 1}));
      assert.eq(1, master.getDB(dbToCloneName).mycoll.find({}).itcount());
      
      // Start the slave.
      var slave = rt.start(false);
      slave.setLogLevel(6, "replication");
      
      // Perform a w=2 write to ensure that master/slave replication is up and running.
      assert.writeOK(master.getDB("test").mycoll.insert({}, {writeConcern: {w: 2}}));
      assert.eq(1, slave.getDB("test").mycoll.find({}).itcount());
      
      Output
      [js_test:ms_initsync] 2016-08-15T15:47:00.975-0400 d20011| 2016-08-15T15:47:00.974-0400 I REPL     [replslave] resync: done with initial clone for db: a_toclone
      [js_test:ms_initsync] 2016-08-15T15:47:00.976-0400 d20011| 2016-08-15T15:47:00.974-0400 D REPL     [replslave] Saving repl source: { host: "127.0.0.1:20010", source: "main", syncedTo: Timestamp 1471290419000|5, dbsNextPass: { test: true } }
      [js_test:ms_initsync] 2016-08-15T15:47:00.976-0400 d20011| 2016-08-15T15:47:00.975-0400 D REPL     [replslave] first op time received: 57b21c33:5
      [js_test:ms_initsync] 2016-08-15T15:47:00.976-0400 d20011| 2016-08-15T15:47:00.975-0400 D REPL     [replslave] initial run
      ...
      [js_test:ms_initsync] 2016-08-15T15:47:00.977-0400 d20010| 2016-08-15T15:47:00.975-0400 D QUERY    [conn2] Running getMore, cursorid: 19764869580
      [js_test:ms_initsync] 2016-08-15T15:47:00.977-0400 d20010| 2016-08-15T15:47:00.975-0400 I COMMAND  [conn1] command test.mycoll appName:MongoDB Shell command: insert { insert: "mycoll", documents: [ { _id: ObjectId('57b21c33ed2362d51d9bdd07') } ], ordered: true, writeConcern: { w: 2.0 } } ninserted:1 keysInserted:1 numYields:0 reslen:45 locks:{ Global: { acquireCount: { r: 5, w: 5 } }, Database: { acquireCount: { w: 4, W: 1 } }, Collection: { acquireCount: { w: 4 } }, Metadata: { acquireCount: { w: 2 } } } protocol:op_command 1040ms
      ...
      [js_test:ms_initsync] 2016-08-15T15:47:00.979-0400 2016-08-15T15:47:00.978-0400 E QUERY    [thread1] Error: [1] != [0] are not equal : undefined :
      [js_test:ms_initsync] 2016-08-15T15:47:00.979-0400 doassert@src/mongo/shell/assert.js:15:14
      [js_test:ms_initsync] 2016-08-15T15:47:00.979-0400 assert.eq@src/mongo/shell/assert.js:51:5
      [js_test:ms_initsync] 2016-08-15T15:47:00.979-0400 @ms_initsync.js:21:1
      [js_test:ms_initsync] 2016-08-15T15:47:00.979-0400 failed to load: ms_initsync.js
      
      Show
      python buildscripts/resmoke.py --executor=replication ms_initialsync.js ms_initialsync.js var rt = new ReplTest( "ms_initsync" ); // XXX: The test only seems to fail when 'dbToCloneName' sorts alphabetically before the database // name we're doing the w=2 write on. This probably has something to do with ReplSource using a // std::set<std::string> to manage what databases it needs to clone and it deciding to clone the // 'dbToCloneName' database first instead. var dbToCloneName = "a_toclone" ; // Start the master and insert some data to ensure that the slave has to clone a database. var master = rt.start( true ); master.setLogLevel(5, "query" ); assert.writeOK(master.getDB(dbToCloneName).mycoll.insert({a: 1})); assert.eq(1, master.getDB(dbToCloneName).mycoll.find({}).itcount()); // Start the slave. var slave = rt.start( false ); slave.setLogLevel(6, "replication" ); // Perform a w=2 write to ensure that master/slave replication is up and running. assert.writeOK(master.getDB( "test" ).mycoll.insert({}, {writeConcern: {w: 2}})); assert.eq(1, slave.getDB( "test" ).mycoll.find({}).itcount()); Output [js_test:ms_initsync] 2016-08-15T15:47:00.975-0400 d20011| 2016-08-15T15:47:00.974-0400 I REPL [replslave] resync: done with initial clone for db: a_toclone [js_test:ms_initsync] 2016-08-15T15:47:00.976-0400 d20011| 2016-08-15T15:47:00.974-0400 D REPL [replslave] Saving repl source: { host: "127.0.0.1:20010", source: "main", syncedTo: Timestamp 1471290419000|5, dbsNextPass: { test: true } } [js_test:ms_initsync] 2016-08-15T15:47:00.976-0400 d20011| 2016-08-15T15:47:00.975-0400 D REPL [replslave] first op time received: 57b21c33:5 [js_test:ms_initsync] 2016-08-15T15:47:00.976-0400 d20011| 2016-08-15T15:47:00.975-0400 D REPL [replslave] initial run ... [js_test:ms_initsync] 2016-08-15T15:47:00.977-0400 d20010| 2016-08-15T15:47:00.975-0400 D QUERY [conn2] Running getMore, cursorid: 19764869580 [js_test:ms_initsync] 2016-08-15T15:47:00.977-0400 d20010| 2016-08-15T15:47:00.975-0400 I COMMAND [conn1] command test.mycoll appName:MongoDB Shell command: insert { insert: "mycoll", documents: [ { _id: ObjectId('57b21c33ed2362d51d9bdd07') } ], ordered: true, writeConcern: { w: 2.0 } } ninserted:1 keysInserted:1 numYields:0 reslen:45 locks:{ Global: { acquireCount: { r: 5, w: 5 } }, Database: { acquireCount: { w: 4, W: 1 } }, Collection: { acquireCount: { w: 4 } }, Metadata: { acquireCount: { w: 2 } } } protocol:op_command 1040ms ... [js_test:ms_initsync] 2016-08-15T15:47:00.979-0400 2016-08-15T15:47:00.978-0400 E QUERY [thread1] Error: [1] != [0] are not equal : undefined : [js_test:ms_initsync] 2016-08-15T15:47:00.979-0400 doassert@src/mongo/shell/assert.js:15:14 [js_test:ms_initsync] 2016-08-15T15:47:00.979-0400 assert.eq@src/mongo/shell/assert.js:51:5 [js_test:ms_initsync] 2016-08-15T15:47:00.979-0400 @ms_initsync.js:21:1 [js_test:ms_initsync] 2016-08-15T15:47:00.979-0400 failed to load: ms_initsync.js
    • Repl 2016-08-29
    • 0

      The master determines the opTime that a slave has applied until based on the position of the slave's cursor. However, the slave will issue a getMore when no operations have been applied, which causes the master to believe the slave has advanced past the opTime in its initial query.

      // apply operations
      {
          int n = 0;
          time_t saveLast = time(0);
          while (1) {
              // we need "&& n" to assure we actually process at least one op to get a sync
              // point recorded in the first place.
              const bool moreInitialSyncsPending = !addDbNextPass.empty() && n;
              if (moreInitialSyncsPending || !oplogReader.more()) {
                  ...
              }
          ...
      }
      

      I can reproduce this issue on MongoDB 2.6+ and didn't attempt to run the "Steps to reproduce" against any versions earlier than that.

            Assignee:
            scotthernandez Scott Hernandez (Inactive)
            Reporter:
            max.hirschhorn@mongodb.com Max Hirschhorn
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

              Created:
              Updated:
              Resolved: