We are running MongoDB 2.6.1. A moveChunk operation appears to have failed because the secondary hit a Fatal Assertion error (the secondary's data files may be corrupted, so we are looking at a resync). The balancer got stuck as a result. We restarted the process and saw that the primary logged the messages below. Yet when I look at the changelog now, there is only one record for this chunk, which is moveChunk.start.
After the primary was restarted, it seems to have skipped logging the metadata event. Is that possible? How do we recover from this?
Logs from the primary:
2015-10-28T00:03:26.616-0400 [migrateThread] about to log metadata event: { _id: "shard10-2015-10-28T04:03:26-5630490e8d918836ed653d66", server: "shard10", clientAddr: ":27017", time: new Date(1446005006616), what: "moveChunk.to", ns: "prodAB.Instr_2015_10_26_IntervalRecord", details: { min: { appName: "AlertsAccumulator", ts: new Date(1445817600002) }, max: { appName: "CES_GI", ts: new Date(1445830500005) }, step 1 of 5: 1, step 2 of 5: 0, note: "aborted" } }
...
2015-10-28T00:03:26.616-0400 [migrateThread] SyncClusterConnection connecting to [spider:43045]
2015-10-28T00:03:26.617-0400 [migrateThread] warning: Failed to connect to 138.12.88.115:43045, reason: errno:111 Connection refused
2015-10-28T00:03:26.617-0400 [migrateThread] SyncClusterConnection connect fail to: spider:43045 errmsg: couldn't connect to server spider:43045 (xxx), connection attempt failed
...
2015-10-28T00:03:26.635-0400 [migrateThread] not logging config change: shard10-2015-10-28T04:03:26-5630490e8d918836ed653d66 can't authenticate to server spider:43045,spider2:43045,spider3:43045
2015-10-28T00:03:26.635-0400 [migrateThread] ERROR: migrate failed: waitForReplication called but not master anymore
2015-10-28T00:03:26.635-0400 [migrateThread] warning: no need to forget pending chunk [{ appName: "AlertsAccumulator", ts: new Date(1445817600002) },{ appName: "CES_GI", ts: new Date(1445830500005) }) because the local metadata for prodAB.Instr_2015_10_26_IntervalRecord has changed
In the changelog collection:
mongos> db.changelog.find({ns: "prodAB.Instr_2015_10_26_IntervalRecord", what: /^moveChunk\./, "details.min.appName": "AlertsAccumulator"})
{ "_id" : "shard09-2015-10-28T02:14:41-56302f917c97e85c20d69b55", "server" : "shard09", "clientAddr" : "xxx", "time" : ISODate("2015-10-28T02:14:41.265Z"), "what" : "moveChunk.start", "ns" : "prodAB.Instr_2015_10_26_IntervalRecord", "details" : { "min" : { "appName" : "AlertsAccumulator", "ts" : ISODate("2015-10-26T00:00:00.002Z") }, "max" : { "appName" : "CES_GI", "ts" : ISODate("2015-10-26T03:35:00.005Z") }, "from" : "rs10", "to" : "rs1" } }
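To make the gap easier to spot across the whole changelog, here is a minimal sketch of a check for chunks that have a moveChunk.start entry but no matching moveChunk.commit. `findUncommittedMoves` is a hypothetical helper, not a MongoDB API. It assumes the documents have already been fetched from the changelog into an array, and that the same chunk always serializes its `min`/`max` bounds with the same field order. It is written in ES5 style so that it should also load in an older mongo shell.

```javascript
// Hypothetical helper (not part of MongoDB): groups changelog documents by
// namespace and chunk bounds, then returns the chunks that have a
// moveChunk.start event but no moveChunk.commit event.
function findUncommittedMoves(changelogDocs) {
  var byChunk = {};
  changelogDocs.forEach(function (doc) {
    // Only moveChunk.* events that carry chunk bounds are relevant.
    if (typeof doc.what !== "string" || doc.what.indexOf("moveChunk.") !== 0 || !doc.details) {
      return;
    }
    // Assumption: ns + min + max identifies a chunk within this changelog.
    var key = JSON.stringify([doc.ns, doc.details.min, doc.details.max]);
    if (!byChunk[key]) {
      byChunk[key] = { ns: doc.ns, min: doc.details.min, max: doc.details.max, events: [] };
    }
    byChunk[key].events.push(doc.what);
  });
  return Object.keys(byChunk)
    .map(function (key) { return byChunk[key]; })
    .filter(function (chunk) {
      return chunk.events.indexOf("moveChunk.start") !== -1 &&
             chunk.events.indexOf("moveChunk.commit") === -1;
    });
}
```

In a mongos shell it could be called as `findUncommittedMoves(db.getSiblingDB("config").changelog.find({ns: "prodAB.Instr_2015_10_26_IntervalRecord"}).toArray())`. It only reads the changelog and says nothing about the state of the balancer lock.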
How do we recover from this? Should we manually insert the missing metadata? Will the balancer pick up the inserted metadata and release the lock?