Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-15972

Replication state sometimes transitions from PRIMARY to SECONDARY while writes are in progress

    • Type: Icon: Bug Bug
    • Resolution: Done
    • Priority: Icon: Critical - P2 Critical - P2
    • 2.8.0-rc0
    • Affects Version/s: 2.7.8
    • Component/s: Replication
    • Labels:
      None
    • ALL
    • 0

      When the topology coordinator chooses to step down in response to a heartbeat, it changes its own state immediately. The replication coordinator then schedules a callback to change its own state under the exclusive lock. However, it is possible that before this callback executes, another callback runs that updates the replication executor's state, but without the exclusive lock. This is an error. A reasonable solution would be for the topology coordinator to enter a "going to step down" state, but defer actual stepdown until the replication coordinator can be updated synchronously.

      Original description of symptoms follows.

      this fassert appears to have happened before:

      TASK HISTORY (load older entries to see previous failures)

      LOGS

      m31000| 2014-10-22T00:49:27.419+0000 I COMMANDS [conn1] insert fgIndexSec.jstests_fgsec query: { _id: ObjectId('5446ff0445925d0f7bf0b56f'), i: 297862.0 } ninserted:1 keyUpdates:0 numYields:0  235ms
       m31000| 2014-10-22T00:49:27.422+0000 I REPLSETS Standing for election
       m31000| 2014-10-22T00:49:27.423+0000 I REPLSETS replSet info electSelf
       m31000| 2014-10-22T00:49:27.423+0000 I REPLSETS replSet election succeeded, assuming primary role
       m31002| 2014-10-22T00:49:27.423+0000 I REPLSETS [conn3] replSet received elect msg { replSetElect: 1, set: "fgIndex", who: "ip-10-123-133-157:31000", whoid: 0, cfgver: 1, round: ObjectId('5446ff175480529d829ad261') }
       m31002| 2014-10-22T00:49:27.423+0000 I REPLSETS replSetElect voting yea for ip-10-123-133-157:31000 (0)
       m31000| 2014-10-22T00:49:27.429+0000 F REPLSETS [conn1] replSet error : logOp() but can't accept write to collection fgIndexSec.jstests_fgsec
       m31000| 2014-10-22T00:49:27.431+0000 I -        [conn1] Fatal Assertion 17405
       m31000| 2014-10-22T00:49:27.447+0000 I -        [conn1] 
       m31000|  0x8e6ea23 0x8e06258 0x8de6d76 0x8a1b8c2 0x8a1ac05 0x8644238 0x86448db 0x8645e5c 0x86466ad 0x864a15c 0x866edba 0x8670f77 0x86724ea 0x89494ad 0x87c219d 0x87c4114 0x83ac3f3 0x8e202e7 0xe34939 0x5837ae
       m31000| ----- BEGIN BACKTRACE -----
       m31000| {"backtrace":[{"b":"8048000","o":"E26A23"},{"b":"8048000","o":"DBE258"},{"b":"8048000","o":"D9ED76"},{"b":"8048000","o":"9D38C2"},{"b":"8048000","o":"9D2C05"},{"b":"8048000","o":"5FC238"},{"b":"8048000","o":"5FC8DB"},{"b":"8048000","o":"5FDE5C"},{"b":"8048000","o":"5FE6AD"},{"b":"8048000","o":"60215C"},{"b":"8048000","o":"626DBA"},{"b":"8048000","o":"628F77"},{"b":"8048000","o":"62A4EA"},{"b":"8048000","o":"9014AD"},{"b":"8048000","o":"77A19D"},{"b":"8048000","o":"77C114"},{"b":"8048000","o":"3643F3"},{"b":"8048000","o":"DD82E7"},{"b":"E2F000","o":"5939"},{"b":"4AC000","o":"D77AE"}],"processInfo":{ "mongodbVersion" : "2.7.8-pre-", "gitVersion" : "9fbc9684ce381f5ef49709696ad0eb69f521d93c", "uname" : { "sysname" : "Linux", "release" : "2.6.18-194.el5xen", "version" : "#1 SMP Tue Mar 16 22:08:06 EDT 2010", "machine" : "i686" }, "somap" : [ { "elfType" : 2, "b" : "8048000" }, { "b" : "278000", "elfType" : 3 }, { "b" : "DC0000", "path" : "/lib/i686/nosegneg/librt.so.1", "elfType" : 3 }, { "b" : "76F000", "path" : "/lib/libdl.so.2", "elfType" : 3 }, { "b" : "286000", "path" : "/usr/lib/libstdc++.so.6", "elfType" : 3 }, { "b" : "120000", "path" : "/lib/i686/nosegneg/libm.so.6", "elfType" : 3 }, { "b" : "181000", "path" : "/lib/libgcc_s.so.1", "elfType" : 3 }, { "b" : "E2F000", "path" : "/lib/i686/nosegneg/libpthread.so.0", "elfType" : 3 }, { "b" : "4AC000", "path" : "/lib/i686/nosegneg/libc.so.6", "elfType" : 3 }, { "b" : "48F000", "path" : "/lib/ld-linux.so.2", "elfType" : 3 } ] }}
       m31000|  mongod(_ZN5mongo15printStackTraceERSo+0x33) [0x8e6ea23]
       m31000|  mongod(_ZN5mongo10logContextEPKc+0xC8) [0x8e06258]
       m31000|  mongod(_ZN5mongo13fassertFailedEi+0xA6) [0x8de6d76]
       m31000|  mongod(+0x9D38C2) [0x8a1b8c2]
       m31000|  mongod(_ZN5mongo4repl5logOpEPNS_16OperationContextEPKcS4_RKNS_7BSONObjEPS5_Pbb+0x135) [0x8a1ac05]
       m31000|  mongod(_ZN5mongo18WriteBatchExecutor13execOneInsertEPNS0_16ExecInsertsStateEPPNS_16WriteErrorDetailE+0x2A8) [0x8644238]
       m31000|  mongod(_ZN5mongo18WriteBatchExecutor11execInsertsERKNS_21BatchedCommandRequestEPSt6vectorIPNS_16WriteErrorDetailESaIS6_EE+0x2CB) [0x86448db]
       m31000|  mongod(_ZN5mongo18WriteBatchExecutor11bulkExecuteERKNS_21BatchedCommandRequestEPSt6vectorIPNS_19BatchedUpsertDetailESaIS6_EEPS4_IPNS_16WriteErrorDetailESaISB_EE+0x3C) [0x8645e5c]
       m31000|  mongod(_ZN5mongo18WriteBatchExecutor12executeBatchERKNS_21BatchedCommandRequestEPNS_22BatchedCommandResponseE+0x4ED) [0x86466ad]
       m31000|  mongod(_ZN5mongo8WriteCmd3runEPNS_16OperationContextERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x1EC) [0x864a15c]
       m31000|  mongod(_ZN5mongo12_execCommandEPNS_16OperationContextEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x5A) [0x866edba]
       m31000|  mongod(_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_iPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0xEE7) [0x8670f77]
       m31000|  mongod(_ZN5mongo12_runCommandsEPNS_16OperationContextEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x8CA) [0x86724ea]
       m31000|  mongod(_ZN5mongo11newRunQueryEPNS_16OperationContextERNS_7MessageERNS_12QueryMessageERNS_5CurOpES3_b+0x25AD) [0x89494ad]
       m31000|  mongod(+0x77A19D) [0x87c219d]
       m31000|  mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortEb+0xB74) [0x87c4114]
       m31000|  mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0xB3) [0x83ac3f3]
       m31000|  mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x4B7) [0x8e202e7]
       m31000|  libpthread.so.0(+0x5939) [0xe34939]
       m31000|  libc.so.6(clone+0x5E) [0x5837ae]
       m31000| -----  END BACKTRACE  -----
      

            Assignee:
            spencer@mongodb.com Spencer Brody (Inactive)
            Reporter:
            ian@mongodb.com Ian Whalen (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            8 Start watching this issue

              Created:
              Updated:
              Resolved: