Loading...

XML

Word

Printable

JSON

Type: Bug
Resolution: Done
Priority: Critical - P2
Fix Version/s: 2.4.0-rc1
Affects Version/s: 2.0.2, 2.3.2
Component/s: Replication
Labels:
None

Operating System:
ALL

Steps To Reproduce:

Hide

Test:

replTest = new ReplSetTest( { name:'test', nodes:3 } );
nodes = replTest.startSet();
replTest.initiate();
master = replTest.getMaster();
secondary = replTest.getSecondary();
db = master.getDB( 'test' );
db.test.save( { a:0 } );
replTest.awaitReplication();

// Start a parallel shell to insert new documents on the primary.
inserter = startParallelShell( "for( i = 1; i < 1e5; ++i ) { db.test.save( { a:i } ); sleep( 1 ); }" );

// Periodically kill replication get mores.
for( i = 0; i < 1e3; ++i ) {
    ops = db.currentOp();
    printjson( ops );
    for( j in ops.inprog ) {
        op = ops.inprog[ j ];
        if ( op.ns == 'local.oplog.rs' && op.op == 'getmore' ) {
            db.killOp( op.opid );
        }
    }
    sleep( 100 );
}

// Wait for the inserter to finish.
inserter();

function allReplicated() {
    count = secondary.getDB( 'test' ).test.count();
    if ( count == 1e5 ) {
        // Return true if the count is as expected.
        return true;
    }

    // Identify and print the missing a-values.
    found = {};
    c = secondary.getDB( 'test' ).test.find().sort( { $natural:1 } );
    while( c.hasNext() ) {
        found[ '' + c.next().a ] = true;
    }
    missing = [];
    for( i = 0; i < 1e5; ++i ) {
        if ( !( ( '' + i ) in found ) ) {
            missing.push( i );
        }
    }
    print( 'count: ' + count + ' missing: ' + missing );
    return false;
}

// Wait for all documents to be replicated.
assert.soon( allReplicated );

Show

Test: replTest = new ReplSetTest( { name: 'test' , nodes:3 } ); nodes = replTest.startSet(); replTest.initiate(); master = replTest.getMaster(); secondary = replTest.getSecondary(); db = master.getDB( 'test' ); db.test.save( { a:0 } ); replTest.awaitReplication(); // Start a parallel shell to insert new documents on the primary. inserter = startParallelShell( " for ( i = 1; i < 1e5; ++i ) { db.test.save( { a:i } ); sleep( 1 ); }" ); // Periodically kill replication get mores. for ( i = 0; i < 1e3; ++i ) { ops = db.currentOp(); printjson( ops ); for ( j in ops.inprog ) { op = ops.inprog[ j ]; if ( op.ns == 'local.oplog.rs' && op.op == 'getmore' ) { db.killOp( op.opid ); } } sleep( 100 ); } // Wait for the inserter to finish. inserter(); function allReplicated() { count = secondary.getDB( 'test' ).test.count(); if ( count == 1e5 ) { // Return true if the count is as expected. return true ; } // Identify and print the missing a-values. found = {}; c = secondary.getDB( 'test' ).test.find().sort( { $natural:1 } ); while ( c.hasNext() ) { found[ '' + c.next().a ] = true ; } missing = []; for ( i = 0; i < 1e5; ++i ) { if ( !( ( '' + i ) in found ) ) { missing.push( i ); } } print( 'count: ' + count + ' missing: ' + missing ); return false ; } // Wait for all documents to be replicated. assert .soon( allReplicated );

Confidence Status:
None
Work Order:
0
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

Updated diagnosis (aaron):

Several factors combine to allow a client killop of a replication get more operation to cause replication to skip operations sent to a secondary.

1) get more
Get more is used to retrieve batches of results from a client cursor. If an exception (such as a killop exception) is thrown during a get more request, an OP_REPLY <http://docs.mongodb.org/meta-driver/latest/legacy/mongodb-wire-protocol/#op-reply> is sent containing the following:

responseFlags == 0 (neither QueryFailure or CursorNotFound are set)
cursorID == 0
numberReturned == 0

The client cursor this get more was issued against remains alive with its original cursor id, and is ready to return additional documents. When an error occurs and aborts a get more, the cursor the get more iterates over is left at its position as of the time the error occurred, and will continue iterating from that position on subsequent requests. Since a get more that errors out will return no results, any documents iterated over in a get more implementation before an error is thrown will never be sent to the client.

2) DBClientCursor::more()
This cursor method retrieves a new batch of results when the previous batch is exhausted. The implementation:

Sets its own internal flag based on responseFlags.QueryFailure but does nothing with this internal flag.
Ignores the value of cursorID unless responseFlags.CursorNotFound is set.
Thus, after one get more request produces an error, DBClientCursor can continue iterating over the cursor on subsequent calls to more(). As described in part 1) this may result in skipping cursor results.

3) replication
The replication implementations (both replica sets and master slave) routinely call more() twice in a row, in particular for handling cases where a tailable cursor is exhausted at the first more() call but populated at a second more() call. By calling more() twice in a row, the possibility of skipping documents as described in part 2) can be triggered.

I recommend:

On error, have get more kill its ClientCursor and return a clear indication of an error to the client (using QueryFailure flag if appropriate)
Have DBClientCursor::more() handle errors and validate all flags and attributes for unexpected values. It seems like throwing in these cases would be consistent with existing behavior.
Ensure that the replication code (including master/slave) handles such exceptions correctly.

I don't know if we want to make all of these changes right now though.

Also, a somewhat related issue with replication and get more is described here <https://jira.mongodb.org/browse/SERVER-4853>.

-----------------------------------------------------

If a user uses db.killOp() to kill a replication operation on a primary, the secondary might ultimately miss the operation.

Assignee:: Aaron Staple (Inactive)
Reporter:: Ron Avnur (Inactive)
Participants:: Aaron Staple, auto, Dwight Merriman, Eliot Horowitz, Ron Avnur
Votes:: 0 Vote for this issue
Watchers:: 4 Start watching this issue

Created:: Dec 16 2012 05:09:02 AM UTC
Updated:: Jul 11 2016 05:58:17 PM UTC
Resolved:: Feb 10 2013 10:15:18 PM UTC

Details

Description

Attachments

Forms

Activity

People

Dates