• Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • Affects Version/s: 2.6.4
    • Component/s: Sharding
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL

      We are running 7 shards, each consisting of a 3-member replica set, and we pre-split our chunks. One of the shards is no longer accepting new chunks, even when the chunk being moved is empty.
      The logs say the migration is waiting for replication, but all members are perfectly in sync. We have read that this can be caused by the local.slaves collection in versions 2.2 and 2.4, but we are already running v2.6.4. We nevertheless dropped the local.slaves collection, which did not help. We also stepped down the primary, with no success. Finally, we stopped one replica set member, removed its data, brought it back up, waited for it to sync, and had it elected primary, but the chunk move still never succeeded.
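
      For reference, the first two remediation attempts above correspond roughly to the following mongo shell commands (a sketch only; to be run while connected to the primary of the affected shard):

      // Drop the legacy local.slaves collection (unused in 2.6, so this is harmless).
      db.getSiblingDB("local").slaves.drop()

      // Ask the current primary to step down so that another member is elected;
      // the shell connection is expected to drop when this runs.
      rs.stepDown(60)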

      What can we do to get this shard to accept new chunks again?
      Here are the logs of the primary of the destination shard, grepped for "migrateThread":

      2014-12-09T16:53:02.345+0100 [migrateThread] warning: migrate commit waiting for 2 slaves for 'offerStore.offer' { _id: 3739440290 } -> { _id: 3739940290 } waiting for: 54870f52:a9
      2014-12-09T16:53:03.345+0100 [migrateThread] Waiting for replication to catch up before entering critical section
      2014-12-09T16:53:04.345+0100 [migrateThread] Waiting for replication to catch up before entering critical section
      2014-12-09T16:53:05.345+0100 [migrateThread] Waiting for replication to catch up before entering critical section
      2014-12-09T16:53:06.345+0100 [migrateThread] Waiting for replication to catch up before entering critical section
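
      These messages mean the migration destination is blocked waiting for the migrated documents to replicate to 2 members before it will enter the critical section. One way to probe the same condition by hand is to issue a throwaway write with an equivalent write concern and see whether it gets acknowledged (a sketch; the test.migrateProbe namespace is arbitrary):

      // On the destination shard's primary: a write that must be acknowledged
      // by 2 members within 5 seconds.
      db.getSiblingDB("test").migrateProbe.insert(
          { probe: new Date() },
          { writeConcern: { w: 2, wtimeout: 5000 } }
      )
      // A wtimeout error here would indicate that w:2 writes are not being
      // acknowledged, i.e. the same condition the migrateThread is stuck on.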
      

      This is the replication status of the replSet:

      offerStoreDE2:SECONDARY> rs.status()
      {
      	"set" : "offerStoreDE2",
      	"date" : ISODate("2014-12-09T15:58:26Z"),
      	"myState" : 2,
      	"syncingTo" : "s131:27017",
      	"members" : [
      		{
      			"_id" : 3,
      			"name" : "s136:27017",
      			"health" : 1,
      			"state" : 2,
      			"stateStr" : "SECONDARY",
      			"uptime" : 6458100,
      			"optime" : Timestamp(1418140706, 503),
      			"optimeDate" : ISODate("2014-12-09T15:58:26Z"),
      			"self" : true
      		},
      		{
      			"_id" : 4,
      			"name" : "s131:27017",
      			"health" : 1,
      			"state" : 2,
      			"stateStr" : "SECONDARY",
      			"uptime" : 1919333,
      			"optime" : Timestamp(1418140706, 437),
      			"optimeDate" : ISODate("2014-12-09T15:58:26Z"),
      			"lastHeartbeat" : ISODate("2014-12-09T15:58:26Z"),
      			"lastHeartbeatRecv" : ISODate("2014-12-09T15:58:25Z"),
      			"pingMs" : 0,
      			"syncingTo" : "s568:27017"
      		},
      		{
      			"_id" : 6,
      			"name" : "s568:27017",
      			"health" : 1,
      			"state" : 1,
      			"stateStr" : "PRIMARY",
      			"uptime" : 8893,
      			"optime" : Timestamp(1418140706, 51),
      			"optimeDate" : ISODate("2014-12-09T15:58:26Z"),
      			"lastHeartbeat" : ISODate("2014-12-09T15:58:26Z"),
      			"lastHeartbeatRecv" : ISODate("2014-12-09T15:58:26Z"),
      			"pingMs" : 0,
      			"electionTime" : Timestamp(1418137258, 1),
      			"electionDate" : ISODate("2014-12-09T15:00:58Z")
      		}
      	],
      	"ok" : 1
      }
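
      To back up the claim that the members are in sync, the optimes above can be compared directly; a quick shell snippet along these lines (a sketch, assuming the Timestamp field access of the 2.6 shell) prints the skew per member:

      // Rough per-member optime skew derived from rs.status() (whole seconds only).
      var s = rs.status();
      var newest = Math.max.apply(null, s.members.map(function (m) { return m.optime.t; }));
      s.members.forEach(function (m) {
          print(m.name + " is behind the newest optime by ~" + (newest - m.optime.t) + "s");
      });
      // In the output above, all three members share the same optime second
      // (1418140706), so any skew is sub-second.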
      

            Assignee: Randolph Tan (randolph@mongodb.com)
            Reporter: Kay Agahd (kay.agahd@idealo.de)
            Votes: 0
            Watchers: 3
