  Core Server / SERVER-7952

killOp of a replication get more can cause replication operations to be skipped

    • Type: Bug
    • Resolution: Done
    • Priority: Critical - P2
    • Fix Version/s: 2.4.0-rc1
    • Affects Version/s: 2.0.2, 2.3.2
    • Component/s: Replication
    • Labels: None
    • Operating System: ALL

      Test:

      replTest = new ReplSetTest( { name:'test', nodes:3 } );
      nodes = replTest.startSet();
      replTest.initiate();
      master = replTest.getMaster();
      secondary = replTest.getSecondary();
      db = master.getDB( 'test' );
      db.test.save( { a:0 } );
      replTest.awaitReplication();
      
      // Start a parallel shell to insert new documents on the primary.
      inserter = startParallelShell( "for( i = 1; i < 1e5; ++i ) { db.test.save( { a:i } ); sleep( 1 ); }" );
      
      // Periodically kill replication get mores.
      for( i = 0; i < 1e3; ++i ) {
          ops = db.currentOp();
          printjson( ops );
          for( j in ops.inprog ) {
              op = ops.inprog[ j ];
              if ( op.ns == 'local.oplog.rs' && op.op == 'getmore' ) {
                  db.killOp( op.opid );
              }
          }
          sleep( 100 );
      }
      
      // Wait for the inserter to finish.
      inserter();
      
      function allReplicated() {
          count = secondary.getDB( 'test' ).test.count();
          if ( count == 1e5 ) {
              // Return true if the count is as expected.
              return true;
          }
      
          // Identify and print the missing a-values.
          found = {};
          c = secondary.getDB( 'test' ).test.find().sort( { $natural:1 } );
          while( c.hasNext() ) {
              found[ '' + c.next().a ] = true;
          }
          missing = [];
          for( i = 0; i < 1e5; ++i ) {
              if ( !( ( '' + i ) in found ) ) {
                  missing.push( i );
              }
          }
          print( 'count: ' + count + ' missing: ' + missing );
          return false;
      }
      
      // Wait for all documents to be replicated.
      assert.soon( allReplicated );
      
      

      Updated diagnosis (aaron):

      Several factors combine to allow a client-issued killOp of a replication get more operation to cause replication to skip operations that should be applied on a secondary.

      1) get more
      Get more is used to retrieve batches of results from a client cursor. If an exception (such as a killop exception) is thrown during a get more request, an OP_REPLY <http://docs.mongodb.org/meta-driver/latest/legacy/mongodb-wire-protocol/#op-reply> is sent containing the following:

      • responseFlags == 0 (neither QueryFailure nor CursorNotFound is set)
      • cursorID == 0
      • numberReturned == 0

      The client cursor this get more was issued against remains alive with its original cursor id and is ready to return additional documents. When an error aborts a get more, the cursor it iterates over is left at whatever position it had reached when the error occurred, and it continues from that position on subsequent requests. Since a get more that errors out returns no results, any documents iterated over inside the get more implementation before the error is thrown are never sent to the client.
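
      For illustration, here is a shell-style sketch of the reply a client sees after a killed get more. The field names follow the OP_REPLY wire format, but the object itself is purely illustrative:

      errorReply = {
          responseFlags: 0,           // neither QueryFailure (bit 1) nor CursorNotFound (bit 0)
          cursorID: NumberLong( 0 ),  // zero, although the server-side ClientCursor is still alive
          startingFrom: 0,
          numberReturned: 0,          // empty batch; documents already iterated past are lost
          documents: []
      };

      Because this reply is indistinguishable from an ordinary empty batch, a caller that keeps using the original cursor id simply resumes iteration on its next request.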

      2) DBClientCursor::more()
      This cursor method retrieves a new batch of results when the previous batch is exhausted. The implementation:

      • Sets its own internal flag based on responseFlags.QueryFailure, but does nothing with this internal flag.
      • Ignores the value of cursorID unless responseFlags.CursorNotFound is set.

      Thus, after one get more request produces an error, DBClientCursor can continue iterating over the same cursor on subsequent calls to more(). As described in part 1), this may result in skipping cursor results.
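
      A shell-style sketch of this lenient handling (the actual implementation is C++ inside DBClientCursor::more(); the member names used here are hypothetical):

      function more( cursor, reply ) {
          // The QueryFailure bit is recorded, but nothing ever consults it.
          cursor._queryFailed = ( reply.responseFlags & 2 ) != 0;
          // Only CursorNotFound clears the cached cursor id.
          if ( reply.responseFlags & 1 ) {
              cursor._cursorId = 0;
          }
          // A reply with cursorID == 0 and no CursorNotFound flag (the killed
          // get more case above) is otherwise ignored, so the next call reuses
          // the original cursor id and resumes past the lost batch.
          return reply.numberReturned > 0;
      }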

      3) replication
      The replication implementations (both replica sets and master/slave) routinely call more() twice in a row, in particular to handle the case where a tailable cursor is exhausted at the first more() call but has new results by the second. Calling more() again after a call that swallowed an error triggers exactly the document skipping described in part 2). The pattern looks roughly like the sketch below.
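
      Shell-style sketch of the sync loop (the real loop is C++; applyOperation() is a hypothetical stand-in for applying an oplog entry):

      while ( true ) {
          if ( !cursor.more() ) {        // may have silently swallowed a killOp error
              if ( !cursor.more() ) {    // immediate retry against the same server cursor
                  sleep( 10 );           // the tailable cursor really is exhausted; wait for new ops
                  continue;
              }
          }
          op = cursor.next();
          applyOperation( op );          // ops lost in the failed batch are never applied
      }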

      I recommend:

      • On error, have get more kill its ClientCursor and return a clear indication of the error to the client (using the QueryFailure flag if appropriate).
      • Have DBClientCursor::more() handle errors and validate all flags and attributes for unexpected values. Throwing in these cases seems consistent with existing behavior.
      • Ensure that the replication code (including master/slave) handles such exceptions correctly; one possible recovery pattern is sketched below.

      I don't know if we want to make all of these changes right now though.
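
      As a rough illustration of the last recommendation: once a failed get more is surfaced as an error and the cursor is killed, the tailing side can catch the exception and re-create its oplog cursor from the last applied optime instead of resuming a possibly inconsistent cursor. A minimal shell sketch, assuming a hypothetical applyOperation() helper:

      function tailOplog( lastApplied ) {
          oplog = db.getSiblingDB( 'local' ).oplog.rs;
          while ( true ) {
              try {
                  c = oplog.find( { ts: { $gte: lastApplied } } )
                           .addOption( DBQuery.Option.tailable )
                           .addOption( DBQuery.Option.awaitData );
                  while ( c.hasNext() ) {
                      op = c.next();
                      applyOperation( op );    // hypothetical apply step
                      lastApplied = op.ts;
                  }
              } catch ( e ) {
                  // With the proposed change, a killed get more surfaces here;
                  // restarting from lastApplied means no oplog entries between
                  // lastApplied and the error are skipped.
                  print( 'oplog cursor error, restarting: ' + e );
              }
          }
      }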

      Also, a somewhat related issue with replication and get more is described here <https://jira.mongodb.org/browse/SERVER-4853>.

      -----------------------------------------------------

      If a user runs db.killOp() to kill a replication operation on a primary, the secondary may ultimately miss that operation.

            Assignee:
            aaron Aaron Staple
            Reporter:
            ron.avnur Ron Avnur
            Votes:
            0
            Watchers:
            4

              Created:
              Updated:
              Resolved: