  1. Core Server
  2. SERVER-21181

metadata event for moveChunk - not logging and seems forgotten


Details

    • Type: Question
    • Resolution: Done
    • Priority: Major - P3
    • None
    • 2.6.1
    • Sharding
    • None

    Description

      We are running MongoDB 2.6.1. A moveChunk operation appears to have failed because the secondary hit a Fatal Assertion error (the secondary's files may be corrupted, so we are looking at a resync). The balancer was stuck as a result. We restarted the process and saw that the primary had logged the messages below, yet when I look at the changelog now, there is only one record for this chunk, which is moveChunk.start.

      After the primary was restarted, it seems to have forgotten about logging the metadata event. Is that possible? How do we recover from this?

      Logs from the primary:

      2015-10-28T00:03:26.616-0400 [migrateThread] about to log metadata event: { _id: "shard10-2015-10-28T04:03:26-5630490e8d918836ed653d66", server: "shard10", clientAddr: ":27017", time: new Date(1446005006616), what: "moveChunk.to", ns: "prodAB.Instr_2015_10_26_IntervalRecord", details: { min: { appName: "AlertsAccumulator", ts: new Date(1445817600002) }, max: { appName: "CES_GI", ts: new Date(1445830500005) }, step 1 of 5: 1, step 2 of 5: 0, note: "aborted" } }
      ...
      2015-10-28T00:03:26.616-0400 [migrateThread] SyncClusterConnection connecting to [spider:43045]
      2015-10-28T00:03:26.617-0400 [migrateThread] warning: Failed to connect to 138.12.88.115:43045, reason: errno:111 Connection refused
      2015-10-28T00:03:26.617-0400 [migrateThread] SyncClusterConnection connect fail to: spider:43045 errmsg: couldn't connect to server spider:43045 (xxx), connection attempt failed
      ...
      2015-10-28T00:03:26.635-0400 [migrateThread] not logging config change: shard10-2015-10-28T04:03:26-5630490e8d918836ed653d66 can't authenticate to server spider:43045,spider2:43045,spider3:43045
      2015-10-28T00:03:26.635-0400 [migrateThread] ERROR: migrate failed: waitForReplication called but not master anymore
      2015-10-28T00:03:26.635-0400 [migrateThread] warning: no need to forget pending chunk [{ appName: "AlertsAccumulator", ts: new Date(1445817600002) },{ appName: "CES_GI", ts: new Date(1445830500005) }) because the local metadata for prodAB.Instr_2015_10_26_IntervalRecord has changed
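
To check whether other metadata events were dropped the same way, the "about to log metadata event" lines can be paired with the "not logging config change" lines by event _id. The following is only a minimal sketch in plain JavaScript (Node.js), assuming the log has been read into a string; findUnloggedEvents is a hypothetical helper, not part of MongoDB, and its regular expressions are written against the message formats shown in the excerpt above.

```javascript
// Hypothetical helper (not a MongoDB API): pairs "about to log metadata event"
// lines with "not logging config change" lines that carry the same event _id.
// Assumption: the log lines follow the 2.6-style format quoted in this ticket.
function findUnloggedEvents(logText) {
  const announced = new Map(); // event _id -> value of the "what" field
  const dropped = [];
  for (const line of logText.split("\n")) {
    const about = line.match(
      /about to log metadata event: \{ _id: "([^"]+)".*?what: "([^"]+)"/
    );
    if (about) {
      announced.set(about[1], about[2]);
      continue;
    }
    const skipped = line.match(/not logging config change: (\S+) (.*)$/);
    if (skipped) {
      dropped.push({
        id: skipped[1],
        what: announced.get(skipped[1]) || null, // null if never announced
        reason: skipped[2].trim(),
      });
    }
  }
  return dropped;
}
```

Run over the excerpt above, this should report a single dropped event whose "what" is moveChunk.to, together with the "can't authenticate to server" reason.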
      
      

      In the changelog collection:

      mongos> db.changelog.find({ns: "prodAB.Instr_2015_10_26_IntervalRecord", what: /^moveChunk./, "details.min.appName": "AlertsAccumulator"})
       
      { "_id" : "shard09-2015-10-28T02:14:41-56302f917c97e85c20d69b55", "server" : "shard09", "clientAddr" : "xxx", "time" : ISODate("2015-10-28T02:14:41.265Z"), "what" : "moveChunk.start", "ns" : "prodAB.Instr_2015_10_26_IntervalRecord", "details" : { "min" : { "appName" : "AlertsAccumulator", "ts" : ISODate("2015-10-26T00:00:00.002Z") }, "max" : { "appName" : "CES_GI", "ts" : ISODate("2015-10-26T03:35:00.005Z") }, "from" : "rs10", "to" : "rs1" } }
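
The same check can be applied to every chunk at once rather than one appName at a time. The sketch below is plain JavaScript (Node.js) and assumes the changelog documents have already been fetched into an array, for example by exporting the query result. findUnfinishedMigrations is a hypothetical helper. It also assumes that a completed migration records a moveChunk.commit event for the same namespace and chunk range, so a range with only moveChunk.start is treated as unfinished.

```javascript
// Hypothetical helper (not a MongoDB API): groups moveChunk.* changelog
// documents by namespace and chunk range, then returns the ranges that have a
// moveChunk.start event but no moveChunk.commit event.
// Assumption: a completed migration always records moveChunk.commit.
function findUnfinishedMigrations(changelogDocs) {
  const byChunk = new Map(); // serialized [ns, min, max] -> list of "what" values
  for (const doc of changelogDocs) {
    // The dot is escaped here so that only "moveChunk." prefixes match.
    if (!/^moveChunk\./.test(doc.what)) continue;
    // Date values in min/max serialize to ISO strings, which is enough for grouping.
    const key = JSON.stringify([doc.ns, doc.details.min, doc.details.max]);
    if (!byChunk.has(key)) byChunk.set(key, []);
    byChunk.get(key).push(doc.what);
  }
  const unfinished = [];
  for (const [key, events] of byChunk) {
    if (events.includes("moveChunk.start") && !events.includes("moveChunk.commit")) {
      const [ns, min, max] = JSON.parse(key);
      unfinished.push({ ns, min, max, events });
    }
  }
  return unfinished;
}
```

For the single document returned above, this would flag the AlertsAccumulator to CES_GI range of prodAB.Instr_2015_10_26_IntervalRecord, since only moveChunk.start was recorded for it.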
      

      How do we recover from this? Do we manually insert the missing metadata? Will the balancer pick up on the metadata insert and release the lock?

          People

            Assignee: Unassigned
            Reporter: Carmen Spohn (cgspohn)