• Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • Affects Version/s: 2.6.4
    • Component/s: Sharding
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL

      We are running 7 shards, each consisting of a 3-member replica set, and we pre-split our chunks. One of the shards is no longer accepting new chunks, even when the chunk being moved is empty.
      The logs say the migration is waiting for replication, but all members are perfectly in sync. We have read that this can be caused by the local.slaves collection in versions 2.2 and 2.4, but we are already running v2.6.4. We nevertheless dropped the local.slaves collection, which did not help. We also stepped down the primary, with no success. Finally, we stopped one replica set member, removed its data, brought it back up, waited for it to sync, and had it elected primary, but the chunk move still never succeeded.
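
      For reference, the first two remediation attempts above correspond roughly to the following mongo shell commands (a sketch only; to be run while connected to the primary of the affected shard):

      // Drop the legacy local.slaves collection (unused in 2.6, so this is harmless).
      db.getSiblingDB("local").slaves.drop()

      // Ask the current primary to step down so that another member is elected;
      // the shell connection is expected to drop when this runs.
      rs.stepDown(60)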

      What can we do to get this shard to accept new chunks again?
      Here are the logs of the primary of the destination shard, grepped for "migrateThread":

      2014-12-09T16:53:02.345+0100 [migrateThread] warning: migrate commit waiting for 2 slaves for 'offerStore.offer' { _id: 3739440290 } -> { _id: 3739940290 } waiting for: 54870f52:a9
      2014-12-09T16:53:03.345+0100 [migrateThread] Waiting for replication to catch up before entering critical section
      2014-12-09T16:53:04.345+0100 [migrateThread] Waiting for replication to catch up before entering critical section
      2014-12-09T16:53:05.345+0100 [migrateThread] Waiting for replication to catch up before entering critical section
      2014-12-09T16:53:06.345+0100 [migrateThread] Waiting for replication to catch up before entering critical section
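
      These messages mean the migration destination is blocked waiting for the migrated documents to replicate to 2 members before it will enter the critical section. One way to probe the same condition by hand is to issue a throwaway write with an equivalent write concern and see whether it gets acknowledged (a sketch; the test.migrateProbe namespace is arbitrary):

      // On the destination shard's primary: a write that must be acknowledged
      // by 2 members within 5 seconds.
      db.getSiblingDB("test").migrateProbe.insert(
          { probe: new Date() },
          { writeConcern: { w: 2, wtimeout: 5000 } }
      )
      // A wtimeout error here would indicate that w:2 writes are not being
      // acknowledged, i.e. the same condition the migrateThread is stuck on.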
      

      This is the replication status of the replSet:

      offerStoreDE2:SECONDARY> rs.status()
      {
      	"set" : "offerStoreDE2",
      	"date" : ISODate("2014-12-09T15:58:26Z"),
      	"myState" : 2,
      	"syncingTo" : "s131:27017",
      	"members" : [
      		{
      			"_id" : 3,
      			"name" : "s136:27017",
      			"health" : 1,
      			"state" : 2,
      			"stateStr" : "SECONDARY",
      			"uptime" : 6458100,
      			"optime" : Timestamp(1418140706, 503),
      			"optimeDate" : ISODate("2014-12-09T15:58:26Z"),
      			"self" : true
      		},
      		{
      			"_id" : 4,
      			"name" : "s131:27017",
      			"health" : 1,
      			"state" : 2,
      			"stateStr" : "SECONDARY",
      			"uptime" : 1919333,
      			"optime" : Timestamp(1418140706, 437),
      			"optimeDate" : ISODate("2014-12-09T15:58:26Z"),
      			"lastHeartbeat" : ISODate("2014-12-09T15:58:26Z"),
      			"lastHeartbeatRecv" : ISODate("2014-12-09T15:58:25Z"),
      			"pingMs" : 0,
      			"syncingTo" : "s568:27017"
      		},
      		{
      			"_id" : 6,
      			"name" : "s568:27017",
      			"health" : 1,
      			"state" : 1,
      			"stateStr" : "PRIMARY",
      			"uptime" : 8893,
      			"optime" : Timestamp(1418140706, 51),
      			"optimeDate" : ISODate("2014-12-09T15:58:26Z"),
      			"lastHeartbeat" : ISODate("2014-12-09T15:58:26Z"),
      			"lastHeartbeatRecv" : ISODate("2014-12-09T15:58:26Z"),
      			"pingMs" : 0,
      			"electionTime" : Timestamp(1418137258, 1),
      			"electionDate" : ISODate("2014-12-09T15:00:58Z")
      		}
      	],
      	"ok" : 1
      }
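
      To back up the claim that the members are in sync, the optimes above can be compared directly; a quick shell snippet along these lines (a sketch, assuming the Timestamp field access of the 2.6 shell) prints the skew per member:

      // Rough per-member optime skew derived from rs.status() (whole seconds only).
      var s = rs.status();
      var newest = Math.max.apply(null, s.members.map(function (m) { return m.optime.t; }));
      s.members.forEach(function (m) {
          print(m.name + " is behind the newest optime by ~" + (newest - m.optime.t) + "s");
      });
      // In the output above, all three members share the same optime second
      // (1418140706), so any skew is sub-second.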
      

            Assignee: Randolph Tan (randolph@mongodb.com)
            Reporter: Kay Agahd (kay.agahd@idealo.de)
            Votes: 0
            Watchers: 3
