Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-28877

Cancel race in replication executor can cause elections to assert.

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 3.4.3
    • Fix Version/s: 3.2.15, 3.4.6, 3.5.8
    • Component/s: Replication
    • Labels:
      None
    • Environment:
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Requested:
      v3.4, v3.2
    • Sprint:
      Repl 2017-05-29
    • Linked BF Score:
      51

      Description

      Output from mongodb.log:

      2017-04-19T13:15:23.807Z I COMMAND  [conn50682] command local.oplog.rs command: getMore { getMore: 16914237176, collection: "oplog.rs", maxTimeMS: 5000, term: 161, lastKnownCommittedOpTime: { ts: Timestamp 1492607700000|2, t: 161 } } originatingCommand: { find: "oplog.rs", filter: { ts: { $gte: Timestamp 1492156061000|1 } }, tailable: true, oplogReplay: true, awaitData: true, maxTimeMS: 60000, term: 161 }
       planSummary: COLLSCAN cursorid:16914237176 keysExamined:0 docsExamined:0 numYields:1 nreturned:0 reslen:292 locks:{ Global: { acquireCount: { r: 6 } }, Database: { acquireCount: { r: 3 } }, oplog: { acquireCount: { r: 3 } } } protocol:op_command 6625ms
      2017-04-19T13:15:46.612Z I COMMAND  [conn51155] command local.oplog.rs command: getMore { getMore: 16914237176, collection: "oplog.rs", maxTimeMS: 5000, term: 161, lastKnownCommittedOpTime: { ts: Timestamp 1492607722000|1, t: 161 } } originatingCommand: { find: "oplog.rs", filter: { ts: { $gte: Timestamp 1492156061000|1 } }, tailable: true, oplogReplay: true, awaitData: true, maxTimeMS: 60000, term: 161 }
       planSummary: COLLSCAN cursorid:16914237176 keysExamined:0 docsExamined:0 numYields:0 nreturned:0 reslen:292 locks:{ Global: { acquireCount: { r: 4 } }, Database: { acquireCount: { r: 2 } }, oplog: { acquireCount: { r: 2 } } } protocol:op_command 17700ms
      2017-04-19T13:15:46.614Z I REPL     [ReplicationExecutor] Starting an election, since we've seen no PRIMARY in the past 10000ms
      2017-04-19T13:15:46.614Z I REPL     [ReplicationExecutor] conducting a dry run election to see if we could be elected
      2017-04-19T13:15:46.614Z I ASIO     [NetworkInterfaceASIO-Replication-0] Connecting to mdbfh:27017
      2017-04-19T13:15:46.614Z I ASIO     [NetworkInterfaceASIO-Replication-0] Connecting to mdb1:27017
      2017-04-19T13:15:46.614Z I ASIO     [NetworkInterfaceASIO-Replication-0] Connecting to mdbarb:27017
      2017-04-19T13:15:46.656Z I NETWORK  [conn51178] received client metadata from 37.120.167.74:33775 conn51178: { driver: { name: "NetworkInterfaceASIO-Replication", version: "3.4.2" }, os: { type: "Linux",
      name: "Ubuntu", architecture: "x86_64", version: "14.04" } }
      2017-04-19T13:15:46.670Z I NETWORK  [conn51179] received client metadata from 84.39.96.207:38332 conn51179: { driver: { name: "NetworkInterfaceASIO-Replication", version: "3.4.3" }, os: { type: "Linux", n
      ame: "PRETTY_NAME="Debian GNU/Linux 8 (jessie)"", architecture: "x86_64", version: "Kernel 2.6.32-042stab120.18" } }
      2017-04-19T13:15:46.740Z I REPL     [ReplicationExecutor] Error in heartbeat request to mdb1:27017; ExceededTimeLimit: Remote command timed out while waiting to get a connection from the pool, took 12407ms, timeout was set to 10000ms
      2017-04-19T13:15:46.742Z I REPL     [ReplicationExecutor] Error in heartbeat request to mdbfh:27017; ExceededTimeLimit: Remote command timed out while waiting to get a connection from the pool, took 12532ms, timeout was set to 10000ms
      2017-04-19T13:15:46.742Z I REPL     [ReplicationExecutor] Error in heartbeat request to mdbarb:27017; ExceededTimeLimit: Remote command timed out while waiting to get a connection from the pool, took 12533ms, timeout was set to 10000ms
      2017-04-19T13:15:46.938Z I ASIO     [NetworkInterfaceASIO-Replication-0] Successfully connected to mdb1:27017
      2017-04-19T13:15:46.942Z I ASIO     [NetworkInterfaceASIO-Replication-0] Successfully connected to mdbfh:27017
      2017-04-19T13:15:46.942Z I REPL     [ReplicationExecutor] VoteRequester(term 161 dry run) received a no vote from mdb1:27017 with reason "candidate's data is staler than mine"; response message: { term: 161, voteGranted: false, reason: "candidate's data is staler than mine", ok: 1.0 }
      2017-04-19T13:15:46.963Z I ASIO     [NetworkInterfaceASIO-Replication-0] Successfully connected to mdbarb:27017
      2017-04-19T13:15:46.975Z I REPL     [ReplicationExecutor] VoteRequester(term 161 dry run) received a no vote from mdbarb:27017 with reason "can see a healthy primary of equal or greater priority"; response message: { term: 161, voteGranted: false, reason: "can see a healthy primary of equal or greater priority", ok: 1.0 }
      2017-04-19T13:15:46.996Z I -        [ReplicationExecutor] Invariant failure !_callbacks.empty() src/mongo/db/repl/scatter_gather_runner.cpp 149
      2017-04-19T13:15:46.997Z I -        [ReplicationExecutor]
       
      ***aborting after invariant() failure
       
      2017-04-19T13:15:47.344Z F -        [ReplicationExecutor] Got signal: 6 (Aborted).
       
       0x7f12b5fb2981 0x7f12b5fb1a79 0x7f12b5fb1f5d 0x7f12b362f330 0x7f12b328dc37 0x7f12b3291028 0x7f12b5243b9c 0x7f12b5a6b654 0x7f12b5a31ba1 0x7f12b5a2e719 0x7f12b5a36784 0x7f12b6a24510 0x7f12b3627184 0x7f12b3354bed
      ----- BEGIN BACKTRACE -----
      {"backtrace":[{"b":"7F12B4A34000","o":"157E981","s":"_ZN5mongo15printStackTraceERSo"},{"b":"7F12B4A34000","o":"157DA79"},{"b":"7F12B4A34000","o":"157DF5D"},{"b":"7F12B361F000","o":"10330"},{"b":"7F12B3257000","o":"36C37","s":"gsignal"},{"b":"7F12B3257000","o":"3A028","s":"abort"},{"b":"7F12B4A34000","o":"80FB9C","s":"_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j"},{"b":"7F12B4A34000","o":"1037654","s":"_ZN5mongo4repl19ScatterGatherRunner10RunnerImpl15processResponseERKNS_8executor12TaskExecutor25RemoteCommandCallbackArgsE"},{"b":"7F12B4A34000","o":"FFDBA1"},{"b":"7F12B4A34000","o":"FFA719"},{"b":"7F12B4A34000","o":"1002784","s":"_ZN5mongo4repl19ReplicationExecutor3runEv"},{"b":"7F12B4A34000","o":"1FF0510"},{"b":"7F12B361F000","o":"8184"},{"b":"7F12B3257000","o":"FDBED","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.4.3", "gitVersion" : "f07437fb5a6cca07c10bafa78365456eb1d6d5e1", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.13.0-112-generic", "version" : "#159-Ubuntu SMP Fri Mar 3 15:26:07 UTC 2017", "machine" : "x86_64" }, "somap" : [ { "b" : "7F12B4A34000", "elfType" : 3, "buildId" : "CE26BD3A6C680DD87D137892B07C2EEFC1162144" }, { "b" : "7FFFC5A9B000", "elfType" : 3, "buildId" : "012E1338BA43AF7C0DC7D069F64F0A6490CC6D9C" }, { "b" : "7F12B45A7000", "path" : "/usr/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "58087C35580286708FF6469B51886E5FF5740101" }, { "b" : "7F12B4165000", "path" : "/usr/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "1C949DBD43FCE50C9B691B7EA8F31229891B2218" }, { "b" : "7F12B3F5D000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "1EEBA762A6A2C8884D56033EE8CCE79B95CD974D" }, { "b" : "7F12B3D59000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "D0F881E59FF88BE4F29A228C8657376B3C325C2C" }, { "b" : "7F12B3A53000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "1654CB13B1D24ED03F4BDCB51FC7524B9181A771" }, { "b" : "7F12B383D000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "36311B4457710AE5578C4BF00791DED7359DBB92" }, { "b" : "7F12B361F000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "22F9078CFA529CCE1A814A4A1A1C018F169D5652" }, { "b" : "7F12B3257000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "CA5C6CFE528AF541C3C2C15CEE4B3C74DA4E2FB4" }, { "b" : "7F12B480F000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "237E22E5AAC2DDFCD06518F63FD720FE758E6E5B" } ] }}
       mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x7f12b5fb2981]
       mongod(+0x157DA79) [0x7f12b5fb1a79]
       mongod(+0x157DF5D) [0x7f12b5fb1f5d]
       libpthread.so.0(+0x10330) [0x7f12b362f330]
       libc.so.6(gsignal+0x37) [0x7f12b328dc37]
       libc.so.6(abort+0x148) [0x7f12b3291028]
       mongod(_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j+0x0) [0x7f12b5243b9c]
       mongod(_ZN5mongo4repl19ScatterGatherRunner10RunnerImpl15processResponseERKNS_8executor12TaskExecutor25RemoteCommandCallbackArgsE+0x254) [0x7f12b5a6b654]
       mongod(+0xFFDBA1) [0x7f12b5a31ba1]
       mongod(+0xFFA719) [0x7f12b5a2e719]
       mongod(_ZN5mongo4repl19ReplicationExecutor3runEv+0x274) [0x7f12b5a36784]
       mongod(+0x1FF0510) [0x7f12b6a24510]
       libpthread.so.0(+0x8184) [0x7f12b3627184]
       libc.so.6(clone+0x6D) [0x7f12b3354bed]
      -----  END BACKTRACE  -----
      

      My Replica Setup:
      4 Member:
      1 Arbiter with default prio 1
      3 "Worker" with prio 1, 2, 2

        Attachments

          Activity

            People

            • Votes:
              0 Vote for this issue
              Watchers:
              11 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: