Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-19101

ERROR: writer worker caught exception: :: caused by :: 10287 insertDocument :: caused by :: 10287 btree: key+recloc already in index on

    • Type: Icon: Question Question
    • Resolution: Done
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: 2.6.1
    • Component/s: Admin
    • Labels:
      None

      two members of a replica set can't restart after unclean shut down(power loss), we use version 2.6.1

      at first, the primary power loss, then restart, we find logs :

      2015-05-30T14:31:54.231+0800 [repl writer worker 1] omm_spc.vcm_lpr_info  caught assertion addKeysToIndex omm_spc.vcm_lpr_info.$src_1_id: ObjectId('556955b9498ebd989bba8da3')
      2015-05-30T14:31:54.232+0800 [repl writer worker 1] ERROR: writer worker caught exception:  :: caused by :: 10287 insertDocument :: caused by :: 10287 btree: key+recloc already in index on: { ts: Timestamp 1432966585000|10, h: -2776666315254503545, v: 2, op: "i", ns: "omm_spc.vcm_lpr_info", o: { _id: ObjectId('556955b9498ebd989bba8da3'), ctime: 20150530141625, alarm: "637503ee069311e582bad8490bc947d4004588", cmrid: 14291, dflag: false, lpr: { bright: 1, color: 0, coords: { bottom: 0.983, left: 0.231, right: 0.282, top: 0.962 }, direc: 0, match: 91, pcolor: 6, pnr: "AT664T", ptype: 1 }, src: 2, stime: 20150530141620, vdid: "", vsid: 421 } }
      2015-05-30T14:31:54.232+0800 [repl writer worker 1] Fatal Assertion 16360
      2015-05-30T14:31:54.232+0800 [repl writer worker 1] 
      
      ***aborting after fassert() failure
      

      after restart 10 times, we find logs :

      2015-05-30T16:34:45.752+0800 [rsBackgroundSync] replSet syncing to: BH-DCMN:8888
      2015-05-30T16:34:45.754+0800 [rsBackgroundSync] replset setting syncSourceFeedback to BH-DCMN:8888
      2015-05-30T16:34:45.766+0800 [rsSync] replSet still syncing, not yet to minValid optime 556955bb:6
      2015-05-30T16:34:45.922+0800 [repl writer worker 1] ERROR: writer worker caught exception:  :: caused by :: 13440 bad offset:0 accessing file: /sre_storage/data1/mongo/omm_spc.0. See http://dochub.mongodb.org/core/data-recovery on: { ts: Timestamp 1432966583000|8, h: 6965683350597498871, v: 2, op: "i", ns: "omm_spc.omm_case_file", o: { _id: ObjectId('556955b7498ee6b560cd1c61'), _fileId: "aceaa504-9867-450b-a9f0-8dcf23d29fd8", _len: 449594, _host: [ { _m_id: "6", _s_inx: 4 } ], _objectType: "RESOURCE_TYPE", fdescr: "", _ctime: new Date(1432966583201), online: false, longi: "117.30741844508651", timestamp: 0, cmrId: 4655, height: 0, offset: "3728:42", vhflag: 0, _acls: [], msStatus: 1, _mtime: new Date(1432966583201), _resourceLastRead: new Date(1432966583201), vendor: 0, fps: 0, width: 0, fcount: 0, _musr: "admin", vusrId: 1, cmrdr: "", srvId: 2, oldSrvId: 0, lati: "31.875971894223834", ftype: 1, _cusr: "admin", _poid: "542af3e6ed501b43fe57e9e2", _utime: new Date(1432966583201), _name: "32_20150530141621_269.jpg" } }
      2015-05-30T16:34:45.922+0800 [repl writer worker 1] Fatal Assertion 16360
      2015-05-30T16:34:45.922+0800 [repl writer worker 1] 
      
      ***aborting after fassert() failure
      

      after 30 minutes, a secondary power loss, then restart, we find logs :

      2015-05-30T16:19:49.538+0800 [rsBackgroundSync] replSet syncing to: BH-DCMN:8888
      2015-05-30T16:19:49.539+0800 [rsSync] replSet still syncing, not yet to minValid optime 55695d7d:61
      2015-05-30T16:19:49.582+0800 [rsBackgroundSync] replset setting syncSourceFeedback to BH-DCMN:8888
      2015-05-30T16:19:49.640+0800 [repl writer worker 1] omm_spc.vcm_alarm_files  caught assertion addKeysToIndex omm_spc.vcm_alarm_files.$alarm_1_id: ObjectId('55695d7d498e2fa3545a614f')
      2015-05-30T16:19:49.641+0800 [repl writer worker 1] ERROR: writer worker caught exception:  :: caused by :: 10287 insertDocument :: caused by :: 10287 btree: key+recloc already in index on: { ts: Timestamp 1432968573000|97, h: 7186593903883142160, v: 2, op: "i", ns: "omm_spc.vcm_alarm_files", o: { _id: ObjectId('55695d7d498e2fa3545a614f'), imgid: ObjectId('55695d7d498ef202c3e00044'), alarm: "076f9c44069811e582b578d752971959006024", dflag: 0 } }
      2015-05-30T16:19:49.641+0800 [repl writer worker 1] Fatal Assertion 16360
      2015-05-30T16:19:49.641+0800 [repl writer worker 1] 
      
      ***aborting after fassert() failure
      

      finally, we clean up the dbpath of two members, and copy data from the other, this issue resolved.
      However, this process takes three hours, and our business has been cut off, because we have a large amount of data.
      Is there any other way to solve or evade the problem more quickly?

            Assignee:
            sam.kleinman Sam Kleinman (Inactive)
            Reporter:
            rujun rujun
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

              Created:
              Updated:
              Resolved: