Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-18566

Primary member can trip fatal assertion if stepping down while running findAndModify op resulting in an upsert

    • Type: Icon: Bug Bug
    • Resolution: Done
    • Priority: Icon: Major - P3 Major - P3
    • 3.0.4
    • Affects Version/s: 3.0.0
    • Component/s: Replication, Write Ops
    • Labels:
      None
    • Fully Compatible
    • ALL
    • Quint Iteration 4

      A primary node in a replica set can trip the fatal assertion "logOp() but can't accept write to collection" (#17405) if it steps down while a findAndModify operation that results in an upsert is simultaneously running. This is a regression introduced in 3.0.0-rc9 by 30d9e174 (SERVER-17303).

      This issue has been fixed on the master branch by SERVER-16063, but releases on the v3.0 branch are still affected.

      Apply this patch to increase reproducibility:

      diff --git a/src/mongo/db/query/query_yield.cpp b/src/mongo/db/query/query_yield.cpp
      index 2e74682..7619871 100644
      --- a/src/mongo/db/query/query_yield.cpp
      +++ b/src/mongo/db/query/query_yield.cpp
      @@ -56,6 +56,8 @@ namespace mongo {
                   return;
               }
      
      +        sleepsecs(1);
      +
               // Top-level locks are freed, release any potential low-level (storage engine-specific
               // locks). If we are yielding, we are at a safe place to do so.
               txn->recoveryUnit()->commitAndRestart();
      

      Then, reproduce with following shell script:

      var rst = new ReplSetTest({nodes: 1,
                                 nodeOptions: {verbose: true,
                                               setParameter: 'internalQueryExecYieldIterations=2'}});
      rst.startSet();
      rst.initiate();
      var p = rst.getPrimary();
      var d = p.getDB("test");
      d.dropDatabase();
      d.createCollection("foo");
      s = startParallelShell(
          'db.foo.runCommand("findAndModify", {query: {a: 1}, update: {b: 1}, upsert: true});',
          rst.getPort(p)
      );
      sleep(1000);
      try {
          d.adminCommand({replSetStepDown: 2, force: true});
      }
      catch (e) { }
      s();
      

      The server generates the following output and shuts down:

       m31000| 2015-05-19T15:10:38.825-0400 I COMMAND  [conn1] Attempting to step down in response to replSetStepDown command
       m31000| 2015-05-19T15:10:38.825-0400 D QUERY    [conn1] received interrupt request for unknown op: 30
       m31000| 2015-05-19T15:10:38.825-0400 I REPL     [ReplicationExecutor] transition to SECONDARY
      2015-05-19T15:10:38.826-0400 I NETWORK  DBClientCursor::init call() failed
      sh6940| 2015-05-19T15:10:38.826-0400 DBClientCursor::init call() failed
       m31000| 2015-05-19T15:10:38.826-0400 I COMMAND  [conn1] command admin.$cmd command: replSetStepDown { replSetStepDown: 2.0, force: true } keyUpdates:0 writeConflicts:0 numYields:0 reslen:37 locks:{ Global: { acquireCount: { R: 1 } } } 0ms
       m31000| 2015-05-19T15:10:38.826-0400 D NETWORK  [conn1] Socket say send() errno:9 Bad file descriptor 127.0.0.1:53127
       m31000| 2015-05-19T15:10:38.826-0400 I NETWORK  [conn1] SocketException handling request, closing client connection: 9001 socket exception [SEND_ERROR] server [127.0.0.1:53127]
      sh6940| 2015-05-19T15:10:38.827-0400 Error: error doing query: failed at src/mongo/shell/query.js:81
       m31000| 2015-05-19T15:10:38.830-0400 I CONTROL  [signalProcessingThread] got signal 15 (Terminated), will terminate after current cmd ends
       m31000| 2015-05-19T15:10:38.831-0400 D REPL     [ReplExecNetThread-0] thread shutting down
       m31000| 2015-05-19T15:10:38.831-0400 I REPL     [signalProcessingThread] Stopping replication applier threads
       m31000| 2015-05-19T15:10:38.883-0400 F REPL     [conn3] replSet error : logOp() but can't accept write to collection test.foo
       m31000| 2015-05-19T15:10:38.883-0400 I -        [conn3] Fatal Assertion 17405
       m31000| 2015-05-19T15:10:38.908-0400 I CONTROL  [conn3]
       m31000|  0x19b23b6 0x1964065 0x194f54f 0x16747c3 0x167547a 0x13352ac 0x1333595 0x13b7f91 0x13b8f0e 0x13b97ed 0x15c0fac 0x15c301e 0x14c680c 0x14c79b1 0x11b6f94 0x197875e 0x7fc6a4443182 0x7fc6a354430d
       m31000| ----- BEGIN BACKTRACE -----
       m31000| {"backtrace":[{"b":"400000","o":"15B23B6"},{"b":"400000","o":"1564065"},{"b":"400000","o":"154F54F"},{"b":"400000","o":"12747C3"},{"b":"400000","o":"127547A"},{"b":"400000","o":"F352AC"},{"b":"400000","o":"F33595"},{"b":"400000","o":"FB7F91"},{"b":"400000","o":"FB8F0E"},{"b":"400000","o":"FB97ED"},{"b":"400000","o":"11C0FAC"},{"b":"400000","o":"11C301E"},{"b":"400000","o":"10C680C"},{"b":"400000","o":"10C79B1"},{"b":"400000","o":"DB6F94"},{"b":"400000","o":"157875E"},{"b":"7FC6A443B000","o":"8182"},{"b":"7FC6A3449000","o":"FB30D"}],"processInfo":{ "mongodbVersion" : "3.0.2", "gitVersion" : "6201872043ecbbc0a4cc169b5482dcf385fc464f", "uname" : { "sysname" : "Linux", "release" : "3.13.0-24-generic", "version" : "#46-Ubuntu SMP Thu Apr 10 19:11:08 UTC 2014", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "15D39FB29B642DB60E65B20320FF6B05AF5217A8" }, { "b" : "7FFFD8AB7000", "elfType" : 3, "buildId" : "6755FAD2CADACDF1667E5B57FF1EDFC28DD1C976" }, { "b" : "7FC6A443B000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "FE662C4D7B14EE804E0C1902FB55218A106BC5CB" }, { "b" : "7FC6A4233000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "92FCF41EFE012D6186E31A59AD05BDBB487769AB" }, { "b" : "7FC6A402F000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "C1AE4CB7195D337A77A3C689051DABAA3980CA0C" }, { "b" : "7FC6A3D2B000", "path" : "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "elfType" : 3, "buildId" : "19EFDDAB11B3BF5C71570078C59F91CF6592CE9E" }, { "b" : "7FC6A3A25000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "574C6350381DA194C00FF555E0C1784618C05569" }, { "b" : "7FC6A380F000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "CC0D578C2E0D86237CA7B0CE8913261C506A629A" }, { "b" : "7FC6A3449000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "B571F83A8A6F5BB22D3558CDDDA9F943A2A67FD1" }, { "b" : "7FC6A4659000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "9F00581AB3C73E3AEA35995A0C50D24D59A01D47" } ] }}
       m31000|  mongod(_ZN5mongo15printStackTraceERSo+0x27) [0x19b23b6]
       m31000|  mongod(_ZN5mongo10logContextEPKc+0x71) [0x1964065]
       m31000|  mongod(_ZN5mongo13fassertFailedEi+0x60) [0x194f54f]
       m31000|  mongod(+0x12747C3) [0x16747c3]
       m31000|  mongod(_ZN5mongo4repl5logOpEPNS_16OperationContextEPKcS4_RKNS_7BSONObjEPS5_Pbb+0x96) [0x167547a]
       m31000|  mongod(_ZN5mongo16CmdFindAndModify7runImplEPNS_16OperationContextERKSsS4_RKNS_7BSONObjES7_S7_S7_bbbRNS_14BSONObjBuilderERSs+0x129E) [0x13352ac]
       m31000|  mongod(_ZN5mongo16CmdFindAndModify3runEPNS_16OperationContextERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x38D) [0x1333595]
       m31000|  mongod(_ZN5mongo12_execCommandEPNS_16OperationContextEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x96) [0x13b7f91]
       m31000|  mongod(_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_iPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0xB9E) [0x13b8f0e]
       m31000|  mongod(_ZN5mongo12_runCommandsEPNS_16OperationContextEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x491) [0x13b97ed]
       m31000|  mongod(+0x11C0FAC) [0x15c0fac]
       m31000|  mongod(_ZN5mongo8runQueryEPNS_16OperationContextERNS_7MessageERNS_12QueryMessageERKNS_15NamespaceStringERNS_5CurOpES3_+0x37D) [0x15c301e]
       m31000|  mongod(+0x10C680C) [0x14c680c]
       m31000|  mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x4F4) [0x14c79b1]
       m31000|  mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0xFC) [0x11b6f94]
       m31000|  mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x417) [0x197875e]
       m31000|  libpthread.so.0(+0x8182) [0x7fc6a4443182]
       m31000|  libc.so.6(clone+0x6D) [0x7fc6a354430d]
       m31000| -----  END BACKTRACE  -----
       m31000| 2015-05-19T15:10:38.908-0400 I -        [conn3]
       m31000|
       m31000| ***aborting after fassert() failure
       m31000|
       m31000|
      

            Assignee:
            rassi J Rassi
            Reporter:
            rassi J Rassi
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

              Created:
              Updated:
              Resolved: