Core Server / SERVER-30049

applyOperation_inlock() allows exceptions from Collection::insertDocument() to percolate to caller

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 3.4.4
    • Fix Version/s: 3.2.17, 3.4.7, 3.5.11
    • Component/s: Replication
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Completed:
    • Steps To Reproduce:
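      // Build ten non-upserting inserts plus one nested (empty) applyOps command.
      var ops = [];
      for (var i = 0; i < 10; ++i) {
          ops.push({op: "i", o: {_id: i}, ts: Timestamp(), v: 2, ns: "test.test"});
      }
      ops.push({ns: "test.$cmd", op: "c", o: {applyOps: []}});
      // Start from a collection that already exists.
      db.test.drop();
      db.test.insert({});
      // Trace write conflicts and make the failpoint fire on roughly 5% of eligible calls.
      db.adminCommand({setParameter: 1, traceWriteConflictExceptions: true});
      db.adminCommand({configureFailPoint: 'WTWriteConflictException', mode: {activationProbability: 0.05}});
      // Per the description, an injected conflict here yields UnknownError instead of a retry.
      db.adminCommand({applyOps: ops});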
      

    • Sprint:
      Repl 2017-07-31
    • Case:

      Description

      This bug affects 3.4. Some of the code has changed in master, but I believe the bug still exists there in the same form. This ticket references the 3.4.4 code.

      The bug is in the contract between _applyOps and applyOperation_inlock for a non-upserting insert. If applyOperation_inlock is the side at fault, its return values for other operation types may also need to be audited.

      Consider a WriteConflictException (WCE) thrown in the collection->insert call here: https://github.com/mongodb/mongo/blob/r3.4.4/src/mongo/db/repl/oplog.cpp#L846-L861

      In this case applyOperation_inlock converts the WCE (which derives from DBException) into a Status carrying the corresponding error code, 112 (WriteConflict).

      However, the caller in _applyOps expects WCEs to be thrown, because it wraps the call in a WCE retry loop:
      https://github.com/mongodb/mongo/blob/r3.4.4/src/mongo/db/catalog/apply_ops.cpp#L167-L177

      The status with error code 112 increments the errors counter:
      https://github.com/mongodb/mongo/blob/r3.4.4/src/mongo/db/catalog/apply_ops.cpp#L194-L197

      As a result, the applyOps command returns error code 8 (UnknownError):
      https://github.com/mongodb/mongo/blob/r3.4.4/src/mongo/db/catalog/apply_ops.cpp#L252-L254

      As far as I know, engaging the retry loop in _applyOps would be the correct behavior.
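
      The mismatch described above can be illustrated with a small standalone simulation. This is only a sketch in plain JavaScript, not the server's C++ code: every name in it (makeFlakyInsert, applyOperationConverting, applyOperationPropagating, applyOpsSketch) is hypothetical, and the status objects are simplified stand-ins for Status. It contrasts a callee that converts the exception into a status with one that lets it propagate to the caller's retry loop.

```javascript
// Hypothetical stand-in for the server's WriteConflictException (error code 112).
class WriteConflictException extends Error {
    constructor() {
        super("WriteConflict");
        this.code = 112;
    }
}

// Returns an insert function that throws a WCE on its first `failures` calls.
function makeFlakyInsert(failures) {
    let remaining = failures;
    return function insertDocument(doc) {
        if (remaining > 0) {
            remaining--;
            throw new WriteConflictException();
        }
        return doc;
    };
}

// Behavior described in the ticket: the WCE is caught and turned into a status.
function applyOperationConverting(insert, op) {
    try {
        insert(op.o);
        return {ok: true, code: 0};
    } catch (e) {
        return {ok: false, code: e.code};
    }
}

// Alternative: the exception is allowed to reach the caller.
function applyOperationPropagating(insert, op) {
    insert(op.o);
    return {ok: true, code: 0};
}

// Simplified caller: retries on a thrown WCE, counts any non-OK status as an error.
function applyOpsSketch(applyOperation, insert, ops) {
    let errors = 0;
    let retries = 0;
    for (const op of ops) {
        for (;;) {
            try {
                const status = applyOperation(insert, op);
                if (!status.ok) {
                    errors++;
                }
                break;
            } catch (e) {
                if (!(e instanceof WriteConflictException)) {
                    throw e;
                }
                retries++;  // retry the same operation
            }
        }
    }
    // Mirrors the ticket: any counted error surfaces as code 8 (UnknownError).
    return errors === 0 ? {ok: 1, retries: retries} : {ok: 0, code: 8, retries: retries};
}

const ops = [
    {op: "i", ns: "test.test", o: {_id: 0}},
    {op: "i", ns: "test.test", o: {_id: 1}},
];
const converting = applyOpsSketch(applyOperationConverting, makeFlakyInsert(1), ops);
const propagating = applyOpsSketch(applyOperationPropagating, makeFlakyInsert(1), ops);
console.log(converting);
console.log(propagating);
```

      With one injected conflict, the converting variant never reaches the retry branch and reports code 8, while the propagating variant retries once and succeeds.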

      For interested parties, WCEs can (infrequently) be thrown on reads and writes for reasons other than concurrent access to the same document. These causes are usually related to memory pressure.

        People

        • Votes: 1
        • Watchers: 17