Core Server / SERVER-4857

Mongos crashes with a signal 11


Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • Fix Version/s: None
    • Affects Version/s: 2.0.2
    • Component/s: Sharding
    • Environment: CentOS 64 bit (Linux <hostname> 2.6.32-71.29.1.el6.x86_64 #1 SMP Mon Jun 27 19:49:27 BST 2011 x86_64 x86_64 x86_64 GNU/Linux)
    • Operating System: ALL

    Description

      Mongos crashes with a signal 11 (according to the log).

      Context:
      One of the shard primaries became unstable: networking problems caused high latencies, and the server appeared to flicker between UP and DOWN. This is shard "s6" in the log below, with one primary, one secondary, and one arbiter, each on a different server, all binding port 5006. There are three config mongod instances on three other servers, all binding port 5000. The mongos is running on a host called "MongosHost", binding port 7000.
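
      For reference, a topology like the one described above could be brought up along these lines. This is a minimal sketch only: the dbpaths are placeholders, and the x.x.x addresses are used exactly as anonymized in this report.

      # shard s6: a data-bearing mongod on x.x.x.128 and on x.x.x.130,
      # plus an arbiter on x.x.x.129, all binding port 5006
      mongod --replSet s6 --port 5006 --dbpath /data/s6
      # (the arbiter is added from the shell with rs.addArb("x.x.x.129:5006"))

      # three config servers, one per host, all binding port 5000
      mongod --configsvr --port 5000 --dbpath /data/configdb   # on x.x.x.223, .224, .225

      # mongos on MongosHost, binding port 7000, pointed at all three config servers
      mongos --port 7000 --configdb x.x.x.223:5000,x.x.x.224:5000,x.x.x.225:5000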

      Mongos log (the log does indeed end abruptly; there is nothing after the "Backtrace" line):
      Thu Feb 2 21:13:21 [Balancer] distributed lock 'balancer/MongosHost:7000:1328043036:1804289383' acquired, ts : 4f2afc7114d54bcb687ad1e5
      Thu Feb 2 21:13:21 [Balancer] distributed lock 'balancer/MongosHost:7000:1328043036:1804289383' unlocked.
      Thu Feb 2 21:13:32 [Balancer] distributed lock 'balancer/MongosHost:7000:1328043036:1804289383' acquired, ts : 4f2afc7b14d54bcb687ad1e6
      Thu Feb 2 21:13:32 [Balancer] distributed lock 'balancer/MongosHost:7000:1328043036:1804289383' unlocked.
      Thu Feb 2 21:13:42 [Balancer] distributed lock 'balancer/MongosHost:7000:1328043036:1804289383' acquired, ts : 4f2afc8614d54bcb687ad1e7
      Thu Feb 2 21:13:42 [Balancer] distributed lock 'balancer/MongosHost:7000:1328043036:1804289383' unlocked.
      Thu Feb 2 21:13:48 [LockPinger] cluster x.x.x.223:5000,x.x.x.224:5000,x.x.x.225:5000 pinged successfully at Thu Feb 2 21:13:48 2012 by distributed lock pinger 'x.x.x.223:5000,x.x.x.224:5000,x.x.x.225:5000/MongosHost:7000:1328043036:1804289383', sleeping for 30000ms
      Thu Feb 2 21:13:52 [Balancer] distributed lock 'balancer/MongosHost:7000:1328043036:1804289383' acquired, ts : 4f2afc9014d54bcb687ad1e8
      Thu Feb 2 21:13:52 [Balancer] distributed lock 'balancer/MongosHost:7000:1328043036:1804289383' unlocked.
      Thu Feb 2 21:14:03 [Balancer] distributed lock 'balancer/MongosHost:7000:1328043036:1804289383' acquired, ts : 4f2afc9b14d54bcb687ad1e9
      Thu Feb 2 21:14:03 [Balancer] distributed lock 'balancer/MongosHost:7000:1328043036:1804289383' unlocked.
      Thu Feb 2 21:14:13 [Balancer] distributed lock 'balancer/MongosHost:7000:1328043036:1804289383' acquired, ts : 4f2afca514d54bcb687ad1ea
      Thu Feb 2 21:14:13 [Balancer] distributed lock 'balancer/MongosHost:7000:1328043036:1804289383' unlocked.
      Thu Feb 2 21:14:23 [Balancer] distributed lock 'balancer/MongosHost:7000:1328043036:1804289383' acquired, ts : 4f2afcaf14d54bcb687ad1eb
      Thu Feb 2 21:14:23 [Balancer] distributed lock 'balancer/MongosHost:7000:1328043036:1804289383' unlocked.
      Thu Feb 2 21:14:34 [Balancer] distributed lock 'balancer/MongosHost:7000:1328043036:1804289383' acquired, ts : 4f2afcb914d54bcb687ad1ec
      Thu Feb 2 21:14:34 [Balancer] distributed lock 'balancer/MongosHost:7000:1328043036:1804289383' unlocked.
      Thu Feb 2 21:14:44 [Balancer] distributed lock 'balancer/MongosHost:7000:1328043036:1804289383' acquired, ts : 4f2afcc414d54bcb687ad1ed
      Thu Feb 2 21:14:44 [Balancer] distributed lock 'balancer/MongosHost:7000:1328043036:1804289383' unlocked.
      Thu Feb 2 21:14:54 [Balancer] distributed lock 'balancer/MongosHost:7000:1328043036:1804289383' acquired, ts : 4f2afcce14d54bcb687ad1ee
      Thu Feb 2 21:14:54 [Balancer] distributed lock 'balancer/MongosHost:7000:1328043036:1804289383' unlocked.
      Thu Feb 2 21:15:05 [Balancer] distributed lock 'balancer/MongosHost:7000:1328043036:1804289383' acquired, ts : 4f2afcd914d54bcb687ad1ef
      Thu Feb 2 21:15:05 [Balancer] distributed lock 'balancer/MongosHost:7000:1328043036:1804289383' unlocked.
      Thu Feb 2 21:15:15 [Balancer] distributed lock 'balancer/MongosHost:7000:1328043036:1804289383' acquired, ts : 4f2afce314d54bcb687ad1f0
      Thu Feb 2 21:15:15 [Balancer] distributed lock 'balancer/MongosHost:7000:1328043036:1804289383' unlocked.
      Thu Feb 2 21:15:26 [Balancer] distributed lock 'balancer/MongosHost:7000:1328043036:1804289383' acquired, ts : 4f2afced14d54bcb687ad1f1
      Thu Feb 2 21:15:26 [Balancer] distributed lock 'balancer/MongosHost:7000:1328043036:1804289383' unlocked.
      Thu Feb 2 21:15:51 [conn128] DBException in process: could not initialize cursor across all shards because : not master and slaveok=false @ s6/x.x.x.130:5006,x.x.x.128:5006
      Thu Feb 2 21:15:52 [conn128] DBException in process: could not initialize cursor across all shards because : not master and slaveok=false @ s6/x.x.x.130:5006,x.x.x.128:5006
      Thu Feb 2 21:15:52 [conn128] DBException in process: could not initialize cursor across all shards because : not master and slaveok=false @ s6/x.x.x.130:5006,x.x.x.128:5006
      Thu Feb 2 21:15:54 [conn128] DBException in process: could not initialize cursor across all shards because : not master and slaveok=false @ s6/x.x.x.130:5006,x.x.x.128:5006
      Thu Feb 2 21:15:54 [WriteBackListener-x.x.x.128:5006] DBClientCursor::init call() failed
      Thu Feb 2 21:15:54 [WriteBackListener-x.x.x.128:5006] WriteBackListener exception : DBClientBase::findN: transport error: x.x.x.128:5006 query: { writebacklisten: ObjectId('4f28541c14d54bcb687a9398') }
      Thu Feb 2 21:15:55 [conn128] DBException in process: could not initialize cursor across all shards because : not master and slaveok=false @ s6/x.x.x.130:5006,x.x.x.128:5006
      Thu Feb 2 21:15:56 [conn128] created new distributed lock for database.collection on x.x.x.223:5000,x.x.x.224:5000,x.x.x.225:5000 ( lock timeout : 900000, ping interval : 30000, process : 0 )
      Thu Feb 2 21:15:56 [conn128] ChunkManager: time to load chunks for database.collection: 4ms sequenceNumber: 67 version: 109|11
      Thu Feb 2 21:15:56 [conn128] autosplitted database.collection shard: ns:database.collection at: s3:s3/x.x.x.225:5003,x.x.x.223:5003 lastmod: 84|0 min: { key: "xxx382" } max: { key: "xxx808" } on: { key: "xxx231" } (splitThreshold 67108864)
      Thu Feb 2 21:15:56 [conn128] update will be retried b/c sharding config info is stale, left:4 ns: database.collection query: { key: "xxx920", ts: 1324097710000 }
      Thu Feb 2 21:15:56 [ReplicaSetMonitorWatcher] Socket recv() errno:104 Connection reset by peer x.x.x.128:5006
      Thu Feb 2 21:15:56 [ReplicaSetMonitorWatcher] SocketException: remote: x.x.x.128:5006 error: 9001 socket exception [1] server [x.x.x.128:5006]
      Thu Feb 2 21:15:56 [ReplicaSetMonitorWatcher] DBClientCursor::init call() failed
      Thu Feb 2 21:15:56 [ReplicaSetMonitorWatcher] trying reconnect to x.x.x.128:5006
      Thu Feb 2 21:15:56 [conn128] update will be retried b/c sharding config info is stale, left:4 ns: database.collection query: { key: "xxx051", ts: 1324078158000 }
      Thu Feb 2 21:15:56 [ReplicaSetMonitorWatcher] reconnect x.x.x.128:5006 ok
      Thu Feb 2 21:15:56 [conn128] update will be retried b/c sharding config info is stale, left:4 ns: database.collection query: { key: "xxx562", ts: 1324054409000 }
      Thu Feb 2 21:15:57 [ReplicaSetMonitorWatcher] ReplicaSetMonitor::_checkConnection: x.x.x.130:5006 { setName: "s6", ismaster: false, secondary: true, hosts: [ "x.x.x.130:5006", "x.x.x.128:5006" ], arbiters: [ "x.x.x.129:5006" ], primary: "x.x.x.128:5006", me: "x.x.x.130:5006", maxBsonObjectSize: 16777216, ok: 1.0 }
      Thu Feb 2 21:15:57 [ReplicaSetMonitorWatcher] ReplicaSetMonitor::_checkConnection: x.x.x.128:5006 { setName: "s6", ismaster: false, secondary: true, hosts: [ "x.x.x.128:5006", "x.x.x.130:5006" ], arbiters: [ "x.x.x.129:5006" ], me: "x.x.x.128:5006", maxBsonObjectSize: 16777216, ok: 1.0 }
      Thu Feb 2 21:15:57 [conn128] ReplicaSetMonitor::_checkConnection: x.x.x.130:5006 { setName: "s6", ismaster: false, secondary: true, hosts: [ "x.x.x.130:5006", "x.x.x.128:5006" ], arbiters: [ "x.x.x.129:5006" ], primary: "x.x.x.128:5006", me: "x.x.x.130:5006", maxBsonObjectSize: 16777216, ok: 1.0 }
      Thu Feb 2 21:15:57 [conn128] ReplicaSetMonitor::_checkConnection: x.x.x.128:5006 { setName: "s6", ismaster: false, secondary: true, hosts: [ "x.x.x.128:5006", "x.x.x.130:5006" ], arbiters: [ "x.x.x.129:5006" ], me: "x.x.x.128:5006", maxBsonObjectSize: 16777216, ok: 1.0 }
      Thu Feb 2 21:15:59 [conn128] got not master for: x.x.x.130:5006
      Received signal 11
      Backtrace:
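
      Note the two ReplicaSetMonitor::_checkConnection documents at 21:15:57: both s6 members report ismaster: false, secondary: true, so for that window the set has no primary, which is what produces the "not master and slaveok=false" cursor errors above. A rough sketch of how that window might be recreated through the same mongos, using the placeholder host names and the "database.collection" namespace from this report:

      # force the s6 primary to step down, leaving the set without a primary for a while
      mongo x.x.x.128:5006 --eval 'rs.stepDown(60)'

      # reads routed through mongos during that window should fail the same way,
      # since slaveOk is not set on the cursor
      mongo MongosHost:7000/database --eval 'db.collection.find().itcount()'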

People

    • Assignee: Unassigned
    • Reporter: Tal Kedar (tal.kedar@searshc.com)
    • Votes: 0
    • Watchers: 2
