Core Server / SERVER-21037

Initial sync can miss documents if concurrent update results in error (mmapv1 only)


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical - P2
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels: None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Sprint: QuInt B (11/02/15)

      Description

      It is possible for an initial sync collection clone to skip over documents that are concurrently updated, under certain circumstances. When this happens, the initial sync will report success, but the newly-synced member will be silently missing these documents.

      The following conditions are required to trigger this scenario:

      • The sync source must be running with the mmapv1 storage engine.
      • When the collection scan query issued by the initial sync is yielding locks, an update must be issued against the document pointed to by the query's record cursor. This update must meet both of the following criteria:
        • The update must increase the size of the document, such that a document move is required.
        • The update must fail to generate an oplog entry (e.g. if the update fails with a duplicate key error).

      With mmapv1, an update of a document generates an invalidation for all active cursors pointing to that document (as a result, those cursors are advanced). Documents that are updated in this manner during an initial sync are copied to the sync target during the "oplog replay" initial sync phase. However, the copy is not performed if the update does not generate an oplog entry, which causes the synced collection to be missing the document.
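      The interaction above can be modeled with a small sketch (a hypothetical toy model in plain JavaScript, not server code; `simulateInitialSync` and its record layout are invented for illustration): the scan copies documents until it yields, a concurrent move of the document under the cursor advances the cursor past it, and only an oplog entry during the replay phase can recover the skipped document.

```javascript
// Toy model (hypothetical, not MongoDB code) of the behavior described
// above: a collection scan that yields, a concurrent document move that
// invalidates the cursor, and the "oplog replay" phase that normally
// re-copies the moved document.
function simulateInitialSync(updateGeneratesOplogEntry) {
  const records = [{_id: 0}, {_id: 1}, {_id: 2}];
  const cloned = [];
  const oplog = [];

  // Scan the first record, then yield locks (the real scan yields
  // periodically, per internalQueryExecYieldIterations).
  cloned.push(records[0]._id);

  // During the yield, an update moves the document under the cursor.
  // mmapv1 invalidates the cursor, which advances it past {_id: 1},
  // so the collection scan never copies that document.
  if (updateGeneratesOplogEntry) {
    oplog.push({op: "u", _id: records[1]._id});
  }
  // If the update fails (e.g. with a duplicate key error), no oplog
  // entry is written -- but the cursor has already been advanced.

  // The scan resumes past the moved document.
  cloned.push(records[2]._id);

  // Oplog replay phase: replaying the update re-copies the document.
  for (const entry of oplog) {
    if (!cloned.includes(entry._id)) {
      cloned.push(entry._id);
    }
  }
  return cloned.sort();
}

console.log(simulateInitialSync(true));  // [0, 1, 2]: recovered via replay
console.log(simulateInitialSync(false)); // [0, 2]: document silently missing
```

      In the failing case the clone reports success with only two documents, which mirrors the "3 != 2" assertion failure in the reproduction script below.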

      This is a regression introduced in the 3.0.x series of the server. In the 2.6.x series and prior, invalidations are not issued if the update would generate an error; this logic was removed with the introduction of the storage API in the 3.0.x series.
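      The ordering difference can be sketched as follows (a hypothetical illustration with invented names, not actual server code): in the 2.6-style flow the update is validated before cursors are invalidated, while in the 3.0-style flow the storage layer invalidates cursors unconditionally, so a failing update still advances them.

```javascript
// Hypothetical sketch of the ordering change described above
// (applyUpdate26/applyUpdate30 are invented names, not server functions).
function applyUpdate26(cursors, updateWouldFail) {
  // 2.6.x and prior: bail out before issuing any invalidations.
  if (updateWouldFail) return "error";
  cursors.forEach(c => c.advance()); // invalidate only on success
  return "ok";
}

function applyUpdate30(cursors, updateWouldFail) {
  // 3.0.x: the storage layer invalidates cursors before the update's
  // error (e.g. a duplicate key violation) surfaces.
  cursors.forEach(c => c.advance());
  if (updateWouldFail) return "error"; // too late: cursors already moved
  return "ok";
}

const cursor26 = {advanced: false, advance() { this.advanced = true; }};
const cursor30 = {advanced: false, advance() { this.advanced = true; }};
applyUpdate26([cursor26], true);
applyUpdate30([cursor30], true);
console.log(cursor26.advanced); // false: cursor still points at the document
console.log(cursor30.advanced); // true: cursor skipped the document
```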

      This issue can be reproduced with the following script:

      var rst = new ReplSetTest({nodes: 2,
                                 nodeOptions: {storageEngine: "mmapv1",
                                               setParameter: "internalQueryExecYieldIterations=2"}});
      rst.startSet();
      rst.initiate();
      var primary = rst.getPrimary();
      var secondary = rst.getSecondary();
      assert.writeOK(primary.getDB("test").foo.insert([{_id: 0, a: 0}, {_id: 1, a: 1}, {_id: 2, a: 2}]));
      assert.commandWorked(primary.getDB("test").foo.ensureIndex({a: 1}, {unique: true}));
      rst.awaitReplication();
      rst.stop(secondary);
      startParallelShell(
          'while (true) { \
               db.foo.update({_id: 1}, {$set: {x: new Array(1024).join("x"), a: 2}}); \
               sleep(1000); \
           }', primary.port);
      rst.start(secondary);
      rst.waitForState(secondary, rst.SECONDARY, 60 * 1000);
      reconnect(secondary.getDB("test"));
      assert.eq(3, secondary.getDB("test").foo.count());
      

      The assertion on the last line trips with the message "3 != 2", as the newly-synced member is missing the document {_id: 1, a: 1}.

      The following patch to the server greatly increases reproducibility:

      diff --git a/src/mongo/db/query/query_yield.cpp b/src/mongo/db/query/query_yield.cpp
      index 4e0d463..7edde6e 100644
      --- a/src/mongo/db/query/query_yield.cpp
      +++ b/src/mongo/db/query/query_yield.cpp
      @@ -62,6 +62,10 @@ void QueryYield::yieldAllLocks(OperationContext* txn, RecordFetcher* fetcher) {
           // locks). If we are yielding, we are at a safe place to do so.
           txn->recoveryUnit()->abandonSnapshot();
       
      +    if (txn->getNS() == "test.foo") {
      +        sleepmillis(2000);
      +    }
      +
           // Track the number of yields in CurOp.
           CurOp::get(txn)->yielded();
      

      Reproduced with master (07168e08) and 3.0.7.


              People

              Assignee: Geert Bosch (geert.bosch)
              Reporter: J Rassi (rassi)
              Participants:
              Votes: 0
              Watchers: 13

                Dates

                Created:
                Updated:
                Resolved: