[SERVER-21181] metadata event for moveChunk - not logging and seems forgotten
Created: 28/Oct/15  Updated: 28/Oct/15  Resolved: 28/Oct/15

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.6.1
Fix Version/s: None

Type: Question
Priority: Major - P3
Reporter: Carmen Spohn
Assignee: Unassigned
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Participants:

 Description   

We are running MongoDB 2.6.1. A moveChunk operation appears to have failed because the secondary hit a Fatal Assertion error (our secondary's files may be corrupted, so we are looking at a resync). The balancer got stuck as a result. We restarted the process, and we see that the primary logged the messages below. Yet when I look at the changelog now, there is only one record for this chunk, which is moveChunk.start.

After restarting the primary, it seems to have forgotten about logging the metadata event. Is that possible? How do we recover from this?

Logs from the primary:

2015-10-28T00:03:26.616-0400 [migrateThread] about to log metadata event: { _id: "shard10-2015-10-28T04:03:26-5630490e8d918836ed653d66", server: "shard10", clientAddr: ":27017", time: new Date(1446005006616), what: "moveChunk.to", ns: "prodAB.Instr_2015_10_26_IntervalRecord", details: { min: { appName: "AlertsAccumulator", ts: new Date(1445817600002) }, max: { appName: "CES_GI", ts: new Date(1445830500005) }, step 1 of 5: 1, step 2 of 5: 0, note: "aborted" } }
...
2015-10-28T00:03:26.616-0400 [migrateThread] SyncClusterConnection connecting to [spider:43045]
2015-10-28T00:03:26.617-0400 [migrateThread] warning: Failed to connect to 138.12.88.115:43045, reason: errno:111 Connection refused
2015-10-28T00:03:26.617-0400 [migrateThread] SyncClusterConnection connect fail to: spider:43045 errmsg: couldn't connect to server spider:43045 (xxx), connection attempt failed
...
2015-10-28T00:03:26.635-0400 [migrateThread] not logging config change: shard10-2015-10-28T04:03:26-5630490e8d918836ed653d66 can't authenticate to server spider:43045,spider2:43045,spider3:43045
2015-10-28T00:03:26.635-0400 [migrateThread] ERROR: migrate failed: waitForReplication called but not master anymore
2015-10-28T00:03:26.635-0400 [migrateThread] warning: no need to forget pending chunk [{ appName: "AlertsAccumulator", ts: new Date(1445817600002) },{ appName: "CES_GI", ts: new Date(1445830500005) }) because the local metadata for prodAB.Instr_2015_10_26_IntervalRecord has changed

In the changelog collection:

mongos> db.changelog.find({ns: "prodAB.Instr_2015_10_26_IntervalRecord", what: /^moveChunk./, "details.min.appName": "AlertsAccumulator"})
 
{ "_id" : "shard09-2015-10-28T02:14:41-56302f917c97e85c20d69b55", "server" : "shard09", "clientAddr" : "xxx", "time" : ISODate("2015-10-28T02:14:41.265Z"), "what" : "moveChunk.start", "ns" : "prodAB.Instr_2015_10_26_IntervalRecord", "details" : { "min" : { "appName" : "AlertsAccumulator", "ts" : ISODate("2015-10-26T00:00:00.002Z") }, "max" : { "appName" : "CES_GI", "ts" : ISODate("2015-10-26T03:35:00.005Z") }, "from" : "rs10", "to" : "rs1" } }
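A minimal sketch of how the same check could be run across the whole changelog rather than one chunk at a time. It assumes the changelog documents have the shape shown above (what, ns, time, details.min, details.max, details.from, details.to) and that a successful migration logs a "moveChunk.commit" event after its "moveChunk.start". The function name findUnfinishedMigrations is hypothetical, not part of MongoDB; it only reports candidates and does not modify anything.

```javascript
// Hypothetical helper (not a MongoDB API): given an array of config.changelog
// documents, return the chunks whose most recent "moveChunk.start" is not
// followed by a "moveChunk.commit". Written in ES5 so it can be pasted into
// an older shell as well as run under Node.js.
function findUnfinishedMigrations(changelogDocs) {
  var byChunk = {};

  changelogDocs.forEach(function (doc) {
    // Only moveChunk.* events are relevant here.
    if (!/^moveChunk\./.test(doc.what)) {
      return;
    }
    // A chunk is identified by its namespace and its [min, max) range.
    var key = JSON.stringify([doc.ns, doc.details.min, doc.details.max]);
    if (!byChunk[key]) {
      byChunk[key] = [];
    }
    byChunk[key].push(doc);
  });

  var unfinished = [];
  Object.keys(byChunk).forEach(function (key) {
    var events = byChunk[key].slice().sort(function (a, b) {
      return new Date(a.time).getTime() - new Date(b.time).getTime();
    });

    // Walk the events in time order, remembering a start until a commit closes it.
    var openStart = null;
    events.forEach(function (ev) {
      if (ev.what === "moveChunk.start") {
        openStart = ev;
      } else if (ev.what === "moveChunk.commit") {
        openStart = null;
      }
    });

    if (openStart) {
      unfinished.push({
        ns: openStart.ns,
        min: openStart.details.min,
        max: openStart.details.max,
        from: openStart.details.from,
        to: openStart.details.to,
        startedAt: openStart.time
      });
    }
  });

  return unfinished;
}
```

From a mongos shell this could be fed with, for example, findUnfinishedMigrations(db.getSiblingDB("config").changelog.find({ns: "prodAB.Instr_2015_10_26_IntervalRecord"}).toArray()). An entry in the result only means no commit event was recorded; as the log above shows, an event can also go missing because the config servers were unreachable when it was written.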

How do we recover from this? Do we manually insert the missing metadata? Will the balancer pick up on the metadata insert and release the lock?



 Comments   
Comment by Ramon Fernandez Marina [ 28/Oct/15 ]

Thanks for your report cgspohn. Please note that the SERVER project is for reporting bugs or feature suggestions for the MongoDB server. For MongoDB-related support discussion please post on the mongodb-user group or Stack Overflow with the mongodb tag, where your question will reach a larger audience. A question like this involving more discussion would be best posted on the mongodb-user group. See also our Technical Support page for additional support resources.

Regards,
Ramón.
