[SERVER-21037] Initial sync can miss documents if concurrent update results in error (mmapv1 only) Created: 20/Oct/15  Updated: 14/Apr/16  Resolved: 26/Oct/15

Status: Closed
Project: Core Server
Component/s: Querying, Replication, Storage, Write Ops
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Critical - P2
Reporter: J Rassi
Assignee: Geert Bosch
Resolution: Duplicate
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-21058 need fail point to stress yielding be... Closed
is related to SERVER-21057 Collection scan during concurrent mov... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: QuInt B (11/02/15)
Participants:

 Description   

It is possible for an initial sync collection clone to skip over documents that are concurrently updated, under certain circumstances. When this happens, the initial sync will report success, but the newly-synced member will be silently missing these documents.

The following conditions are required to trigger this scenario:

  • The sync source must be running with the mmapv1 storage engine.
  • When the collection scan query issued by the initial sync is yielding locks, an update must be issued against the document pointed to by the query's record cursor. This update must meet both of the following criteria:
    • The update must increase the size of the document, such that a document move is required.
    • The update must fail to generate an oplog entry (e.g. if the update fails with a duplicate key error).

With mmapv1, an update of a document generates an invalidation for all active cursors pointing to that document (as a result, those cursors are advanced). Documents that are updated in this manner during an initial sync are copied to the sync target during the "oplog replay" initial sync phase. However, the copy is not performed if the update does not generate an oplog entry, which causes the synced collection to be missing the document.
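
The sequence described above can be pictured with a toy model. This is only a sketch and not MongoDB code: `simulateInitialSync` and its arrays are hypothetical names invented for illustration, and the record store, cursor, and oplog are reduced to plain JavaScript values. It assumes the scan has already returned {_id: 0} and is yielding with its cursor on {_id: 1} when the update arrives.

```javascript
// Toy model only: none of these names are MongoDB internals.
function simulateInitialSync(updateFails) {
    // Sync source collection, in record-store order.
    const source = [{_id: 0, a: 0}, {_id: 1, a: 1}, {_id: 2, a: 2}];
    const oplog = [];   // operations logged while the clone is running
    const cloned = [];  // documents copied to the sync target

    let cursor = 0;  // index of the next record the scan will return

    // The scan returns {_id: 0}, then yields with the cursor on {_id: 1}.
    cloned.push(source[cursor++]);

    // During the yield, an update targets the record under the cursor.
    // The pending document move invalidates the cursor, which advances it.
    const target = cursor;
    cursor++;
    if (!updateFails) {
        // A successful update changes the document and writes an oplog entry.
        source[target] = Object.assign({}, source[target], {x: "x".repeat(1023)});
        oplog.push({op: "u", _id: source[target]._id});
    }
    // A failed update (e.g. duplicate key) leaves the document unchanged and
    // writes nothing to the oplog, but the cursor has already moved on.

    // The scan resumes from the advanced cursor.
    while (cursor < source.length) {
        cloned.push(source[cursor++]);
    }

    // "Oplog replay" phase: copy every document named by an oplog entry.
    for (const entry of oplog) {
        cloned.push(source.find(doc => doc._id === entry._id));
    }
    return cloned.map(doc => doc._id).sort((x, y) => x - y);
}

console.log(simulateInitialSync(false));  // update succeeded
console.log(simulateInitialSync(true));   // update failed
```

In this model the first call returns all three `_id` values, because the oplog entry brings {_id: 1} back during replay. The second call returns only 0 and 2: the cursor was advanced, but no oplog entry exists to recover the skipped document.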

This is a regression introduced in the 3.0.x series of the server. In the 2.6.x series and prior, invalidations are not issued if the update would generate an error; this logic was removed with the introduction of the storage API in the 3.0.x series.

This issue can be reproduced with the following script:

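// Start a two-node replica set on mmapv1, with queries configured to yield frequently.
var rst = new ReplSetTest({nodes: 2,
                           nodeOptions: {storageEngine: "mmapv1",
                                         setParameter: "internalQueryExecYieldIterations=2"}});
rst.startSet();
rst.initiate();
var primary = rst.getPrimary();
var secondary = rst.getSecondary();
// Seed three documents and a unique index on "a", then take the secondary offline.
assert.writeOK(primary.getDB("test").foo.insert([{_id: 0, a: 0}, {_id: 1, a: 1}, {_id: 2, a: 2}]));
assert.commandWorked(primary.getDB("test").foo.ensureIndex({a: 1}, {unique: true}));
rst.awaitReplication();
rst.stop(secondary);
// In the background, repeatedly issue an update on the primary that would grow {_id: 1}
// (requiring a document move) but fails with a duplicate key error on "a".
startParallelShell(
    'while (true) { \
         db.foo.update({_id: 1}, {$set: {x: new Array(1024).join("x"), a: 2}}); \
         sleep(1000); \
     }', primary.port);
// Restart the secondary and wait for its initial sync to finish.
rst.start(secondary);
rst.waitForState(secondary, rst.SECONDARY, 60 * 1000);
reconnect(secondary.getDB("test"));
// The newly-synced member should hold all three documents.
assert.eq(3, secondary.getDB("test").foo.count());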

The assertion on the last line trips with the message "3 != 2", as the newly-synced member is missing the document {_id: 1, a: 1}.

The following patch to the server greatly increases reproducibility:

diff --git a/src/mongo/db/query/query_yield.cpp b/src/mongo/db/query/query_yield.cpp
index 4e0d463..7edde6e 100644
--- a/src/mongo/db/query/query_yield.cpp
+++ b/src/mongo/db/query/query_yield.cpp
@@ -62,6 +62,10 @@ void QueryYield::yieldAllLocks(OperationContext* txn, RecordFetcher* fetcher) {
     // locks). If we are yielding, we are at a safe place to do so.
     txn->recoveryUnit()->abandonSnapshot();
 
+    if (txn->getNS() == "test.foo") {
+        sleepmillis(2000);
+    }
+
     // Track the number of yields in CurOp.
     CurOp::get(txn)->yielded();

Reproduced with master (07168e08) and 3.0.7.



 Comments   
Comment by Geert Bosch [ 26/Oct/15 ]

This is fixed by SERVER-21057. An invalidation is now undone if the operation that caused it fails to commit, so an update or delete of a document can no longer cause that document to be skipped during initial sync unless the operation also appears in the oplog.
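
As a rough illustration of the idea of undoing an invalidation, the following toy sketch pairs each invalidation with an undo callback that runs when the operation does not commit. It is not the SERVER-21057 change itself, which may be structured quite differently; `ScanCursor` and `runUpdate` are hypothetical names invented for this example.

```javascript
// Toy model only: ScanCursor and runUpdate are illustrative, not MongoDB internals.
class ScanCursor {
    constructor(position) {
        this.position = position;  // index of the next record to return
    }

    // Advance past the record that is about to move, and return a callback
    // that restores the previous position.
    invalidate() {
        const saved = this.position;
        this.position++;
        return () => { this.position = saved; };
    }
}

function runUpdate(scanCursor, commitSucceeds) {
    const undoInvalidation = scanCursor.invalidate();
    if (!commitSucceeds) {
        // The write is rolled back, so the invalidation is rolled back too:
        // the scan will still return the untouched document.
        undoInvalidation();
        return false;
    }
    // On commit the cursor stays advanced; the oplog entry covers the document.
    return true;
}

const scan = new ScanCursor(1);
runUpdate(scan, false);
console.log(scan.position);  // still 1 after the failed update
runUpdate(scan, true);
console.log(scan.position);  // 2 after the committed update
```

The point of the sketch is only that the cursor position and the oplog stay consistent with each other: either both reflect the update or neither does.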
