-
Type: Bug
-
Resolution: Done
-
Priority: Critical - P2
-
Affects Version/s: 2.0.2, 2.3.2
-
Component/s: Replication
-
None
-
ALL
-
Updated diagnosis (aaron):
Several factors combine to allow a client killop of a replication get more operation to cause replication to skip operations sent to a secondary.
1) get more
Get more is used to retrieve batches of results from a client cursor. If an exception (such as a killop exception) is thrown during a get more request, an OP_REPLY <http://docs.mongodb.org/meta-driver/latest/legacy/mongodb-wire-protocol/#op-reply> is sent containing the following:
- responseFlags == 0 (neither QueryFailure or CursorNotFound are set)
- cursorID == 0
- numberReturned == 0
The client cursor this get more was issued against remains alive with its original cursor id, and is ready to return additional documents. When an error occurs and aborts a get more, the cursor the get more iterates over is left at its position as of the time the error occurred, and will continue iterating from that position on subsequent requests. Since a get more that errors out will return no results, any documents iterated over in a get more implementation before an error is thrown will never be sent to the client.
2) DBClientCursor::more()
This cursor method retrieves a new batch of results when the previous batch is exhausted. The implementation:
- Sets its own internal flag based on responseFlags.QueryFailure but does nothing with this internal flag.
- Ignores the value of cursorID unless responseFlags.CursorNotFound is set.
Thus, after one get more request produces an error, DBClientCursor can continue iterating over the cursor on subsequent calls to more(). As described in part 1) this may result in skipping cursor results.
3) replication
The replication implementations (both replica sets and master slave) routinely call more() twice in a row, in particular for handling cases where a tailable cursor is exhausted at the first more() call but populated at a second more() call. By calling more() twice in a row, the possibility of skipping documents as described in part 2) can be triggered.
I recommend:
- On error, have get more kill its ClientCursor and return a clear indication of an error to the client (using QueryFailure flag if appropriate)
- Have DBClientCursor::more() handle errors and validate all flags and attributes for unexpected values. It seems like throwing in these cases would be consistent with existing behavior.
- Ensure that the replication code (including master/slave) handles such exceptions correctly.
I don't know if we want to make all of these changes right now though.
Also, a somewhat related issue with replication and get more is described here <https://jira.mongodb.org/browse/SERVER-4853>.
-----------------------------------------------------
If a user uses db.killOp() to kill a replication operation on a primary, the secondary might ultimately miss the operation.