Core Server / SERVER-3087

"Assertion: 13312:replSet error : logOp() but not primary?" and "com.mongodb.MongoException: not talking to master and retries used up"

    • Type: Bug
    • Resolution: Incomplete
    • Priority: Major - P3
    • Fix Version/s: None
    • Affects Version/s: 1.8.2
    • Component/s: None
    • Labels: None
    • Environment: Ubuntu on EC2
    • Operating System: Linux

      Something (I don't know what yet) caused one of my shards to fail over. When I look at the logs on the old master, I see this:

      Thu May 12 15:47:22 [conn21060] remove example.userEventsJournal query: { _id: ObjectId('4d56ea01fa00a03b38c17fde') } 232ms
      Thu May 12 15:47:23 [conn21060] remove example.userEventsJournal query: { _id: ObjectId('4d56ea0a04e235872bd9e29e') } 291ms
      Thu May 12 15:47:25 [conn20802] query example.$cmd ntoreturn:1 command: { distinct: "cookbook", key: "cookbookId", query: {} } reslen:1445 1401ms
      Thu May 12 15:47:25 [conn20841] query example.$cmd ntoreturn:1 command: { distinct: "cookbook", key: "cookbookId", query: {} } reslen:1445 664ms
      Thu May 12 15:47:28 [dur] lsn set 120976684
      Thu May 12 15:47:41 [conn21062] getmore local.oplog.rs cid:3926093491549750074 getMore: { ts: { $gte: new Date(5605849438531616885) } } bytes:789 nreturned:3 12119ms
      Thu May 12 15:47:41 [ReplSetHealthPollTask] replSet member mongodb05.example.com:27018 PRIMARY
      Thu May 12 15:47:41 [conn21035] Assertion: 13312:replSet error : logOp() but not primary?
      0x55f1b9 0x712a31 0x7108f2 0x668fa1 0x66ba60 0x75814f 0x75aae1 0x8a75ee 0x8ba510 0x7fb642df69ca 0x7fb6423a570d
      /usr/bin/mongod(_ZN5mongo11msgassertedEiPKc+0x129) [0x55f1b9]
      /usr/bin/mongod() [0x712a31]
      /usr/bin/mongod(_ZN5mongo5logOpEPKcS1_RKNS_7BSONObjEPS2_Pb+0x42) [0x7108f2]
      /usr/bin/mongod(_ZN5mongo14_updateObjectsEbPKcRKNS_7BSONObjES2_bbbRNS_7OpDebugEPNS_11RemoveSaverE+0x1131) [0x668fa1]
      /usr/bin/mongod(_ZN5mongo13updateObjectsEPKcRKNS_7BSONObjES2_bbbRNS_7OpDebugE+0x130) [0x66ba60]
      /usr/bin/mongod(_ZN5mongo14receivedUpdateERNS_7MessageERNS_5CurOpE+0x47f) [0x75814f]
      /usr/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_8SockAddrE+0x1941) [0x75aae1]
      /usr/bin/mongod(_ZN5mongo10connThreadEPNS_13MessagingPortE+0x21e) [0x8a75ee]
      /usr/bin/mongod(thread_proxy+0x80) [0x8ba510]
      /lib/libpthread.so.0(+0x69ca) [0x7fb642df69ca]
      /lib/libc.so.6(clone+0x6d) [0x7fb6423a570d]
      Thu May 12 15:47:41 [conn21035] update example.stats query: { entityId: "hal_livingSocial_DFA03", bucketId: "groups", variationId: "_default" } exception 13312 replSet error : logOp() but not primary? 1ms
      Thu May 12 15:47:41 [conn20999] Assertion: 13312:replSet error : logOp() but not primary?
      0x55f1b9 0x712a31 0x7108f2 0x668fa1 0x66ba60 0x75814f 0x75aae1 0x8a75ee 0x8ba510 0x7fb642df69ca 0x7fb6423a570d
      /usr/bin/mongod(_ZN5mongo11msgassertedEiPKc+0x129) [0x55f1b9]
      /usr/bin/mongod() [0x712a31]
      /usr/bin/mongod(_ZN5mongo5logOpEPKcS1_RKNS_7BSONObjEPS2_Pb+0x42) [0x7108f2]
      /usr/bin/mongod(_ZN5mongo14_updateObjectsEbPKcRKNS_7BSONObjES2_bbbRNS_7OpDebugEPNS_11RemoveSaverE+0x1131) [0x668fa1]
      /usr/bin/mongod(_ZN5mongo13updateObjectsEPKcRKNS_7BSONObjES2_bbbRNS_7OpDebugE+0x130) [0x66ba60]
      /usr/bin/mongod(_ZN5mongo14receivedUpdateERNS_7MessageERNS_5CurOpE+0x47f) [0x75814f]
      /usr/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_8SockAddrE+0x1941) [0x75aae1]
      /usr/bin/mongod(_ZN5mongo10connThreadEPNS_13MessagingPortE+0x21e) [0x8a75ee]
      /usr/bin/mongod(thread_proxy+0x80) [0x8ba510]
      /lib/libpthread.so.0(+0x69ca) [0x7fb642df69ca]
      /lib/libc.so.6(clone+0x6d) [0x7fb6423a570d]

      There are many more of those "logOp() but not primary?" messages after this, followed by several messages like this:

      Thu May 12 15:47:41 [conn21003] update example.stats query: { entityId: "hal_livingSocial_DFA03", bucketId: "groups", variationId: "_default" } exception 13312 replSet error : logOp() but not primary? 1ms
      Thu May 12 15:47:41 [conn21033] assertion 13435 not master and slaveok=false ns:example.ads query: { adId: "hal_livingSocial_cookbook_dfa_winning02" }
      Thu May 12 15:47:41 [conn21033] ntoskip:0 ntoreturn:-1
      Thu May 12 15:47:41 [conn21024] assertion 13435 not master and slaveok=false ns:example.cookbook query: { cookbookId: "livingSocialBaselineVariation07" }
      Thu May 12 15:47:41 [conn21024] ntoskip:0 ntoreturn:-1
      Thu May 12 15:47:41 [conn21012] assertion 13435 not master and slaveok=false ns:example.cookbook query: { cookbookId: "livingSocialBaselineVariation08" }
      Thu May 12 15:47:41 [conn21012] ntoskip:0 ntoreturn:-1
      Thu May 12 15:47:41 [conn21018] assertion 13435 not master and slaveok=false ns:example.cookbook query: { cookbookId: "livingSocialBaselineVariation07" }
      Thu May 12 15:47:41 [conn21018] ntoskip:0 ntoreturn:-1
      Thu May 12 15:47:41 [conn20992] assertion 13435 not master and slaveok=false ns:example.cookbook query: { cookbookId: "livingSocialBaselineVariation08" }
      Thu May 12 15:47:41 [conn20992] ntoskip:0 ntoreturn:-1
      Thu May 12 15:47:41 [conn20994] assertion 13435 not master and slaveok=false ns:example.cookbook query: { cookbookId: "livingSocialBaselineVariation08" }
      Thu May 12 15:47:41 [conn20994] ntoskip:0 ntoreturn:-1

      Then this:

      Thu May 12 15:47:43 [replica set sync] replSet our last op time written: May 12 15:47:29:10
      Thu May 12 15:47:43 [replica set sync] replset source's GTE: May 12 15:47:41:1
      Thu May 12 15:47:43 [replica set sync] replSet rollback 0
      Thu May 12 15:47:43 [replica set sync] replSet rollback 1
      Thu May 12 15:47:43 [replica set sync] replSet rollback 2 FindCommonPoint
      Thu May 12 15:47:43 [replica set sync] replSet info rollback our last optime: May 12 15:47:29:10
      Thu May 12 15:47:43 [replica set sync] replSet info rollback their last optime: May 12 15:47:42:a
      Thu May 12 15:47:43 [replica set sync] replSet info rollback diff in end of log times: -13 seconds
      Thu May 12 15:47:43 [replica set sync] replSet rollback found matching events at May 12 15:47:29:d
      Thu May 12 15:47:43 [replica set sync] replSet rollback findcommonpoint scanned : 38
      Thu May 12 15:47:43 [replica set sync] replSet replSet rollback 3 fixup
      Thu May 12 15:47:43 [replica set sync] replSet rollback 3.5
      Thu May 12 15:47:43 [replica set sync] replSet rollback 4 n:3
      Thu May 12 15:47:43 [replica set sync] replSet minvalid=May 12 15:47:42 4dcc011e:a
      Thu May 12 15:47:43 [replica set sync] replSet rollback 4.6
      Thu May 12 15:47:43 [replica set sync] replSet rollback 4.7
      Thu May 12 15:47:43 [replica set sync] replSet rollback 5 d:0 u:3
      Thu May 12 15:47:43 [replica set sync] replSet rollback 6
      Thu May 12 15:47:43 [replica set sync] replSet rollback 7
      Thu May 12 15:47:43 [replica set sync] replSet rollback done
      Thu May 12 15:47:43 [dur] lsn set 120982480

      And finally:

      Thu May 12 15:48:24 [dur] lsn set 121000882
      Thu May 12 15:48:25 [ReplSetHealthPollTask] DBClientCursor::init call() failed
      Thu May 12 15:48:25 [ReplSetHealthPollTask] replSet info mongodb05.example.com:27018 is down (or slow to respond): DBClientBase::findOne: transport error: mongodb05.example.com:27018 query: { replSetHeartbeat: "2", v: 7, pv: 1, checkEmpty: false, from: "mongodb04.example.com:27018" }
      Thu May 12 15:48:25 [rs Manager] replSet info electSelf 0
      Thu May 12 15:48:25 [rs Manager] replSet couldn't elect self, only received -9999 votes
      Thu May 12 15:48:29 [ReplSetHealthPollTask] replSet info mongodb05.example.com:27018 is up
      Thu May 12 15:49:10 [conn21155] query local.$cmd ntoreturn:1 command: { collstats: "oplog.rs" } reslen:284 3638ms
      Thu May 12 15:49:33 [dur] lsn set 121030916
      Thu May 12 15:50:20 [dur] lsn set 121060941

      On the new master, I see this:

      Thu May 12 15:47:40 [ReplSetHealthPollTask] DBClientCursor::init call() failed
      Thu May 12 15:47:40 [ReplSetHealthPollTask] replSet info mongodb04.socialmedia.com:27018 is down (or slow to respond): DBClientBase::findOne: transport error: mongodb04.socialmedia.com:27018 query: { replSetHeartbeat: "2", v: 7, pv: 1, checkEmpty: false, from: "mongodb05.socialmedia.com:27018" }
      Thu May 12 15:47:40 [rs Manager] replSet info electSelf 1
      Thu May 12 15:47:40 [rs Manager] replSet PRIMARY
      Thu May 12 15:47:41 [replica set sync] replSet stopping syncTail we are now primary
      Thu May 12 15:47:42 [ReplSetHealthPollTask] replSet info mongodb04.socialmedia.com:27018 is up
      Thu May 12 15:47:42 [ReplSetHealthPollTask] replSet member mongodb04.socialmedia.com:27018 RECOVERING
      Thu May 12 15:47:46 [conn52890] remove socialmedia.userEventsJournal query: { _id: ObjectId('4d56eb456655aa982210b2b9') } 115ms
      Thu May 12 15:47:46 [ReplSetHealthPollTask] replSet member mongodb04.socialmedia.com:27018 SECONDARY

      So failover appears to have completed successfully, though now I'm seeing exceptions like this in my app:

      ERR [20110512-15:47:43.401] blueeyes: com.mongodb.MongoException: not talking to master and retries used up
      ERR [20110512-15:47:43.401] blueeyes: at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:227)
      ERR [20110512-15:47:43.401] blueeyes: at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:229)
      ERR [20110512-15:47:43.401] blueeyes: at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:229)
      ERR [20110512-15:47:43.401] blueeyes: at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:295)
      ERR [20110512-15:47:43.401] blueeyes: at com.mongodb.DBCursor._check(DBCursor.java:354)
      ERR [20110512-15:47:43.401] blueeyes: at com.mongodb.DBCursor._hasNext(DBCursor.java:484)
      ERR [20110512-15:47:43.401] blueeyes: at com.mongodb.DBCursor.hasNext(DBCursor.java:509)
      ERR [20110512-15:47:43.401] blueeyes: at blueeyes.persistence.mongo.RealDatabaseCollection$$anon$1.hasNext(RealMongoImplementation.scala:82)
      ERR [20110512-15:47:43.401] blueeyes: at scala.collection.IterableLike$class.isEmpty(IterableLike.scala:92)
      ERR [20110512-15:47:43.401] blueeyes: at blueeyes.persistence.mongo.IterableViewImpl.isEmpty(RealMongoImplementation.scala:122)
      ERR [20110512-15:47:43.401] blueeyes: at scala.collection.TraversableLike$class.headOption(TraversableLike.scala:483)
      ERR [20110512-15:47:43.401] blueeyes: at blueeyes.persistence.mongo.IterableViewImpl.headOption(RealMongoImplementation.scala:122)
      ERR [20110512-15:47:43.401] blueeyes: at blueeyes.persistence.mongo.QueryBehaviours$SelectOneQueryBehaviour$class.query(QueryBehaviours.scala:128)
      ERR [20110512-15:47:43.401] blueeyes: at blueeyes.persistence.mongo.MongoSelectOneQuery.query(MongoQuery.scala:71)
      ERR [20110512-15:47:43.401] blueeyes: at blueeyes.persistence.mongo.MongoSelectOneQuery.query(MongoQuery.scala:71)
      ERR [20110512-15:47:43.401] blueeyes: at blueeyes.persistence.mongo.QueryBehaviours$MongoQueryBehaviour$class.apply(QueryBehaviours.scala:16)
      ERR [20110512-15:47:43.401] blueeyes: at blueeyes.persistence.mongo.MongoSelectOneQuery.apply(MongoQuery.scala:71)
      ERR [20110512-15:47:43.401] blueeyes: at blueeyes.persistence.mongo.MongoSelectOneQuery.apply(MongoQuery.scala:71)
      ERR [20110512-15:47:43.401] blueeyes: at blueeyes.persistence.mongo.MongoDatabase$$anonfun$blueeyes$persistence$mongo$MongoDatabase$$mongoActor$1.apply(Mongo.scala:46)
      ERR [20110512-15:47:43.401] blueeyes: at blueeyes.persistence.mongo.MongoDatabase$$anonfun$blueeyes$persistence$mongo$MongoDatabase$$mongoActor$1.apply(Mongo.scala:43)
      ERR [20110512-15:47:43.401] blueeyes: at blueeyes.concurrent.ActorExecutionStrategySequential$$anon$14$$anonfun$submit$1.apply(Actor.scala:17)
      ERR [20110512-15:47:43.401] blueeyes: at blueeyes.concurrent.Future$$anonfun$deliver$1.apply(Future.scala:50)
      ERR [20110512-15:47:43.401] blueeyes: at blueeyes.concurrent.Future$$anonfun$deliver$1.apply(Future.scala:46)
      ERR [20110512-15:47:43.401] blueeyes: at blueeyes.concurrent.ReadWriteLock$class.writeLock(ReadWriteLock.scala:10)
      ERR [20110512-15:47:43.401] blueeyes: at blueeyes.concurrent.Future$$anon$1.writeLock(Future.scala:22)
      ERR [20110512-15:47:43.401] blueeyes: at blueeyes.concurrent.Future.deliver(Future.scala:45)
      ERR [20110512-15:47:43.401] blueeyes: at blueeyes.concurrent.ActorExecutionStrategySequential$$anon$14.submit(Actor.scala:17)
      ERR [20110512-15:47:43.401] blueeyes: at blueeyes.concurrent.ActorImplementationMultiThreaded$StrategyWorker$$anonfun$run$1.apply$mcV$sp(Actor.scala:195)
      ERR [20110512-15:47:43.401] blueeyes: at blueeyes.concurrent.ActorImplementationMultiThreaded$ActorContext$.withActorFn(Actor.scala:230)
      ERR [20110512-15:47:43.401] blueeyes: at blueeyes.concurrent.ActorImplementationMultiThreaded$StrategyWorker.run(Actor.scala:189)
      ERR [20110512-15:47:43.401] blueeyes: (...more...)
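
      For reference, an application-side mitigation while the driver re-discovers the new primary would be to catch this exception and retry after a short pause. A rough sketch of that pattern against the 2.x Java driver follows; the hostnames, database, collection, and query below are placeholders for illustration, not our actual code.

      import java.util.Arrays;

      import com.mongodb.BasicDBObject;
      import com.mongodb.DBCollection;
      import com.mongodb.DBObject;
      import com.mongodb.Mongo;
      import com.mongodb.MongoException;
      import com.mongodb.ServerAddress;

      public class FailoverRetryExample {
          public static void main(String[] args) throws Exception {
              // Seed list for the replica set behind one shard (placeholder hosts/port).
              Mongo mongo = new Mongo(Arrays.asList(
                      new ServerAddress("mongodb04.example.com", 27018),
                      new ServerAddress("mongodb05.example.com", 27018)));
              DBCollection ads = mongo.getDB("example").getCollection("ads");

              DBObject result = findOneWithRetry(ads,
                      new BasicDBObject("adId", "someAdId"), 5, 2000);
              System.out.println(result);
              mongo.close();
          }

          // Retry a read a few times while the driver sorts out which member is
          // primary after a failover ("not talking to master and retries used up").
          static DBObject findOneWithRetry(DBCollection coll, DBObject query,
                                           int attempts, long sleepMillis) throws InterruptedException {
              MongoException last = null;
              for (int i = 0; i < attempts; i++) {
                  try {
                      return coll.findOne(query);
                  } catch (MongoException e) {
                      last = e;                  // e.g. "not talking to master and retries used up"
                      Thread.sleep(sleepMillis); // give the driver time to find the new primary
                  }
              }
              throw last; // still failing after all attempts
          }
      }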

      I do not see any errors in the mongos log around this time:

      Thu May 12 15:47:16 [Balancer] skipping balancing round because balancing is disabled
      Thu May 12 15:47:26 [ReplicaSetMonitorWatcher] ReplicaSetMonitor::_checkConnection: mongodb10.example.com:27018 { setName: "4", ismaster: true, secondary: false, hosts: [ "mongodb10.example.com:27018", "mongodb11.example.com:27018" ], arbiters: [ "mongodb12.example.com:27018" ], maxBsonObjectSize: 16777216, ok: 1.0 }
      checking replica set: 1
      Thu May 12 15:47:26 [ReplicaSetMonitorWatcher] ReplicaSetMonitor::_checkConnection: mongodb01.example.com:27018 { setName: "1", ismaster: true, secondary: false, hosts: [ "mongodb01.example.com:27018", "mongodb02.example.com:27018" ], arbiters: [ "mongodb03.example.com:27018" ], maxBsonObjectSize: 16777216, ok: 1.0 }
      checking replica set: 2
      Thu May 12 15:47:26 [ReplicaSetMonitorWatcher] ReplicaSetMonitor::_checkConnection: mongodb04.example.com:27018 { setName: "2", ismaster: true, secondary: false, hosts: [ "mongodb04.example.com:27018", "mongodb05.example.com:27018" ], arbiters: [ "mongodb06.example.com:27018" ], maxBsonObjectSize: 16777216, ok: 1.0 }
      checking replica set: 3
      Thu May 12 15:47:26 [ReplicaSetMonitorWatcher] ReplicaSetMonitor::_checkConnection: mongodb07.example.com:27018 { setName: "3", ismaster: true, secondary: false, hosts: [ "mongodb07.example.com:27018", "mongodb08.example.com:27018" ], arbiters: [ "mongodb09.example.com:27018" ], maxBsonObjectSize: 16777216, ok: 1.0 }
      checking replica set: 4
      Thu May 12 15:47:33 [LockPinger] dist_lock pinged successfully for: appserver01:1305178845:1804289383
      Thu May 12 15:47:36 [WriteBackListener] writebacklisten result: { noop: true, ok: 1.0 }
      Thu May 12 15:47:36 [WriteBackListener] writebacklisten result: { noop: true, ok: 1.0 }
      Thu May 12 15:47:36 [WriteBackListener] writebacklisten result: { noop: true, ok: 1.0 }
      Thu May 12 15:47:36 [WriteBackListener] writebacklisten result: { noop: true, ok: 1.0 }
      Thu May 12 15:47:36 [WriteBackListener] writebacklisten result: { noop: true, ok: 1.0 }
      Thu May 12 15:47:36 [WriteBackListener] writebacklisten result: { noop: true, ok: 1.0 }
      Thu May 12 15:47:36 [WriteBackListener] writebacklisten result: { noop: true, ok: 1.0 }

      Thu May 12 15:47:46 [ReplicaSetMonitorWatcher] ReplicaSetMonitor::_checkConnection: mongodb10.example.com:27018 { setName: "4", ismaster: true, secondary: false, hosts: [ "mongodb10.example.com:27018", "mongodb11.example.com:27018" ], arbiters: [ "mongodb12.example.com:27018" ], maxBsonObjectSize: 16777216, ok: 1.0 }
      checking replica set: 1
      Thu May 12 15:47:46 [ReplicaSetMonitorWatcher] ReplicaSetMonitor::_checkConnection: mongodb01.example.com:27018 { setName: "1", ismaster: true, secondary: false, hosts: [ "mongodb01.example.com:27018", "mongodb02.example.com:27018" ], arbiters: [ "mongodb03.example.com:27018" ], maxBsonObjectSize: 16777216, ok: 1.0 }
      checking replica set: 2
      Thu May 12 15:47:46 [ReplicaSetMonitorWatcher] ReplicaSetMonitor::_checkConnection: mongodb04.example.com:27018 { setName: "2", ismaster: false, secondary: true, hosts: [ "mongodb04.example.com:27018", "mongodb05.example.com:27018" ], arbiters: [ "mongodb06.example.com:27018" ], primary: "mongodb05.example.com:27018", maxBsonObjectSize: 16777216, ok: 1.0 }
      _check : 2/mongodb05.example.com:27018,mongodb04.example.com:27018,mongodb04.example.com:27018
      Thu May 12 15:47:46 [ReplicaSetMonitorWatcher] ReplicaSetMonitor::_checkConnection: mongodb05.example.com:27018 { setName: "2", ismaster: true, secondary: false, hosts: [ "mongodb05.example.com:27018", "mongodb04.example.com:27018" ], arbiters: [ "mongodb06.example.com:27018" ], maxBsonObjectSize: 16777216, ok: 1.0 }
      checking replica set: 3
      Thu May 12 15:47:46 [ReplicaSetMonitorWatcher] ReplicaSetMonitor::_checkConnection: mongodb07.example.com:27018 { setName: "3", ismaster: true, secondary: false, hosts: [ "mongodb07.example.com:27018", "mongodb08.example.com:27018" ], arbiters: [ "mongodb09.example.com:27018" ], maxBsonObjectSize: 16777216, ok: 1.0 }
      checking replica set: 4
      Thu May 12 15:47:46 BackgroundJob starting: ConnectBG
      Thu May 12 15:47:46 [Balancer] skipping balancing round because balancing is disabled
      Thu May 12 15:47:47 BackgroundJob starting: ConnectBG
      Thu May 12 15:47:50 [WriteBackListener] writebacklisten result: { noop: true, ok: 1.0 }

      Thu May 12 15:48:03 [LockPinger] dist_lock pinged successfully for: appserver01:1305178845:1804289383
      Thu May 12 15:48:06 [ReplicaSetMonitorWatcher] ReplicaSetMonitor::_checkConnection: mongodb10.example.com:27018 { setName: "4", ismaster: true, secondary: false, hosts: [ "mongodb10.example.com:27018", "mongodb11.example.com:27018" ], arbiters: [ "mongodb12.example.com:27018" ], maxBsonObjectSize: 16777216, ok: 1.0 }
      checking replica set: 1
      Thu May 12 15:48:06 [ReplicaSetMonitorWatcher] ReplicaSetMonitor::_checkConnection: mongodb01.example.com:27018 { setName: "1", ismaster: true, secondary: false, hosts: [ "mongodb01.example.com:27018", "mongodb02.example.com:27018" ], arbiters: [ "mongodb03.example.com:27018" ], maxBsonObjectSize: 16777216, ok: 1.0 }
      checking replica set: 2
      Thu May 12 15:48:06 [ReplicaSetMonitorWatcher] ReplicaSetMonitor::_checkConnection: mongodb05.example.com:27018 { setName: "2", ismaster: true, secondary: false, hosts: [ "mongodb05.example.com:27018", "mongodb04.example.com:27018" ], arbiters: [ "mongodb06.example.com:27018" ], maxBsonObjectSize: 16777216, ok: 1.0 }
      checking replica set: 3
      Thu May 12 15:48:06 [ReplicaSetMonitorWatcher] ReplicaSetMonitor::_checkConnection: mongodb07.example.com:27018 { setName: "3", ismaster: true, secondary: false, hosts: [ "mongodb07.example.com:27018", "mongodb08.example.com:27018" ], arbiters: [ "mongodb09.example.com:27018" ], maxBsonObjectSize: 16777216, ok: 1.0 }
      checking replica set: 4
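
      In case it helps with triage, a quick way to double-check what the mongos ReplicaSetMonitor reports is to run the isMaster command against a member of the affected set directly. A minimal sketch with the same Java driver follows; the host and port are placeholders.

      import com.mongodb.BasicDBObject;
      import com.mongodb.CommandResult;
      import com.mongodb.Mongo;

      public class IsMasterCheck {
          public static void main(String[] args) throws Exception {
              // Ask one member of the affected set which node it thinks is primary
              // (placeholder host/port; the logs above show mongodb04 stepping down).
              Mongo mongo = new Mongo("mongodb04.example.com", 27018);
              CommandResult res = mongo.getDB("admin").command(new BasicDBObject("isMaster", 1));
              System.out.println("ismaster: " + res.get("ismaster"));
              System.out.println("primary:  " + res.get("primary")); // reported by secondaries, as in the mongos log above
              mongo.close();
          }
      }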

            Assignee: Greg Studer (greg_10gen)
            Reporter: Michael Conigliaro (mconigliaro)
            Votes: 0
            Watchers: 2
