Type: Question
Resolution: Done
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 2.6.1
Component/s: Sharding
Labels: None
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

We are running MongoDB 2.6.1. A moveChunk operation appears to have failed because the secondary hit a fatal assertion error (our secondary's data files may be corrupted, so we are planning a resync). As a result, the balancer was stuck. We restarted the process, and the primary logged the messages below. However, the changelog now contains only one record for this chunk, and that record is moveChunk.start.

After the primary was restarted, it seems it never logged the metadata event. Is that possible? How do we recover from this?

Logs from the primary:

2015-10-28T00:03:26.616-0400 [migrateThread] about to log metadata event: { _id: "shard10-2015-10-28T04:03:26-5630490e8d918836ed653d66", server: "shard10", clientAddr: ":27017", time: new Date(1446005006616), what: "moveChunk.to", ns: "prodAB.Instr_2015_10_26_IntervalRecord", details: { min: { appName: "AlertsAccumulator", ts: new Date(1445817600002) }, max: { appName: "CES_GI", ts: new Date(1445830500005) }, step 1 of 5: 1, step 2 of 5: 0, note: "aborted" } }
...
2015-10-28T00:03:26.616-0400 [migrateThread] SyncClusterConnection connecting to [spider:43045]
2015-10-28T00:03:26.617-0400 [migrateThread] warning: Failed to connect to 138.12.88.115:43045, reason: errno:111 Connection refused
2015-10-28T00:03:26.617-0400 [migrateThread] SyncClusterConnection connect fail to: spider:43045 errmsg: couldn't connect to server spider:43045 (xxx), connection attempt failed
...
2015-10-28T00:03:26.635-0400 [migrateThread] not logging config change: shard10-2015-10-28T04:03:26-5630490e8d918836ed653d66 can't authenticate to server spider:43045,spider2:43045,spider3:43045
2015-10-28T00:03:26.635-0400 [migrateThread] ERROR: migrate failed: waitForReplication called but not master anymore
2015-10-28T00:03:26.635-0400 [migrateThread] warning: no need to forget pending chunk [{ appName: "AlertsAccumulator", ts: new Date(1445817600002) },{ appName: "CES_GI", ts: new Date(1445830500005) }) because the local metadata for prodAB.Instr_2015_10_26_IntervalRecord has changed
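To check a longer log for the same pattern, a small filter can pair each "about to log metadata event" line with a later "not logging config change" line for the same _id. This is a hypothetical sketch (findUnloggedEvents is our own name, not a MongoDB function), assuming the message formats match the 2.6.1 excerpt above; the sample input is abbreviated from those two lines.

```javascript
// Hypothetical helper (not part of MongoDB): return the _id of every metadata
// event that was announced with "about to log metadata event" and later
// skipped with "not logging config change". Message formats are assumed to
// match the 2.6.1 log excerpt; other versions may word them differently.
function findUnloggedEvents(lines) {
  var announced = Object.create(null); // _id -> true for announced events
  var skipped = [];                    // _ids announced but never written
  var aboutRe = /about to log metadata event: \{ _id: "([^"]+)"/;
  var skipRe = /not logging config change: (\S+)/;
  for (var i = 0; i < lines.length; i++) {
    var about = aboutRe.exec(lines[i]);
    if (about) {
      announced[about[1]] = true;
    }
    var skip = skipRe.exec(lines[i]);
    if (skip && announced[skip[1]]) {
      skipped.push(skip[1]);
    }
  }
  return skipped;
}

// Usage with two lines abbreviated from the primary's log above.
var sampleLog = [
  '... [migrateThread] about to log metadata event: { _id: "shard10-2015-10-28T04:03:26-5630490e8d918836ed653d66", server: "shard10", what: "moveChunk.to" }',
  "... [migrateThread] not logging config change: shard10-2015-10-28T04:03:26-5630490e8d918836ed653d66 can't authenticate to server spider:43045,spider2:43045,spider3:43045"
];
console.log(findUnloggedEvents(sampleLog));
```

Plain ES5 is used so the same function should also paste into an older mongo shell, though we have not verified that.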

In the changelog collection:

mongos> db.changelog.find({ns: "prodAB.Instr_2015_10_26_IntervalRecord", what: /^moveChunk./, "details.min.appName": "AlertsAccumulator"})

{ "_id" : "shard09-2015-10-28T02:14:41-56302f917c97e85c20d69b55", "server" : "shard09", "clientAddr" : "xxx", "time" : ISODate("2015-10-28T02:14:41.265Z"), "what" : "moveChunk.start", "ns" : "prodAB.Instr_2015_10_26_IntervalRecord", "details" : { "min" : { "appName" : "AlertsAccumulator", "ts" : ISODate("2015-10-26T00:00:00.002Z") }, "max" : { "appName" : "CES_GI", "ts" : ISODate("2015-10-26T03:35:00.005Z") }, "from" : "rs10", "to" : "rs1" } }
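Whether a migration left a complete trail can be checked mechanically from the query results. The sketch below is hypothetical (summarizeMoveChunk is our own name) and only assumes that each changelog document carries the what and time fields shown above. It flags a chunk whose only moveChunk.* entry is moveChunk.start, meaning no later moveChunk.* event from either shard was recorded.

```javascript
// Hypothetical helper: list the moveChunk.* events recorded for one chunk in
// time order and flag the case where moveChunk.start is the only one, i.e. no
// later moveChunk.* event from either shard reached config.changelog.
// Input: documents from the changelog query, e.g. db.changelog.find(...).toArray().
function summarizeMoveChunk(docs) {
  var events = docs
    .filter(function (d) {
      return typeof d.what === "string" && d.what.indexOf("moveChunk.") === 0;
    })
    .sort(function (a, b) {
      return a.time - b.time; // Date objects subtract as millisecond values
    })
    .map(function (d) {
      return d.what;
    });
  var onlyStart = events.length > 0 && events.every(function (w) {
    return w === "moveChunk.start";
  });
  return { events: events, onlyStart: onlyStart };
}

// Usage with the single document returned above (fields abbreviated).
var changelogDocs = [
  {
    _id: "shard09-2015-10-28T02:14:41-56302f917c97e85c20d69b55",
    what: "moveChunk.start",
    time: new Date("2015-10-28T02:14:41.265Z"),
    ns: "prodAB.Instr_2015_10_26_IntervalRecord",
    details: { from: "rs10", to: "rs1" }
  }
];
console.log(summarizeMoveChunk(changelogDocs));
```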

How do we recover from this? Should we insert the missing metadata manually? Will the balancer pick up the inserted metadata and release the lock?
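Before changing any config metadata by hand, it may help to see whether a lock is actually still held. As far as we understand the 2.6 design, the balancer lock and the per-collection lock taken for a migration live in config.locks (readable with something like db.getSiblingDB("config").locks.find().toArray()), not in config.changelog. The hypothetical helper below assumes that schema: _id is the lock name ("balancer" or the namespace), state 0 means unlocked and any other value means held or being acquired, and who/why describe the owner. The example documents are made up for illustration.

```javascript
// Hypothetical helper: given documents from config.locks, return the balancer
// lock and the lock for one namespace if they are not in the unlocked state.
// Assumption: 2.6-era schema where state 0 = unlocked, non-zero = held or
// being acquired, and `who` / `why` describe the current owner.
function heldLocks(lockDocs, ns) {
  var held = [];
  for (var i = 0; i < lockDocs.length; i++) {
    var d = lockDocs[i];
    var relevant = d._id === "balancer" || d._id === ns;
    if (relevant && d.state !== 0) {
      held.push({ _id: d._id, state: d.state, who: d.who, why: d.why });
    }
  }
  return held;
}

// Usage with made-up lock documents (illustrative only, not from this cluster).
var exampleLocks = [
  { _id: "balancer", state: 2, who: "example-mongos:Balancer", why: "illustrative" },
  { _id: "prodAB.Instr_2015_10_26_IntervalRecord", state: 0, who: "example-shard", why: "illustrative" }
];
console.log(heldLocks(exampleLocks, "prodAB.Instr_2015_10_26_IntervalRecord"));
```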

Assignee: Unassigned
Reporter: Carmen Spohn
Participants: Carmen Spohn, Ramon Fernandez
Votes: 0
Watchers: 4

Created: Oct 28 2015 04:18:10 PM UTC
Updated: Oct 28 2015 04:57:19 PM UTC
Resolved: Oct 28 2015 04:57:19 PM UTC
