  Core Server / SERVER-15798

Helpers::removeRange does not check if node is primary

    • Type: Bug
    • Resolution: Done
    • Priority: Critical - P2
    • Fix Version/s: 2.8.0-rc4
    • Affects Version/s: 2.6.4, 2.8.0-rc0
    • Component/s: Replication, Sharding
    • Labels: None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Environment:

      4 servers containing 2 replica sets, running Amazon Linux (~CentOS 6), all running MongoDB 2.6.4:

      [root@ip-10-60-18-179 mongo]# rpm -qa | grep mongo
      mongodb-org-mongos-2.6.4-1.x86_64
      mongodb-org-2.6.4-1.x86_64
      mongodb-org-shell-2.6.4-1.x86_64
      mongodb-org-tools-2.6.4-1.x86_64
      mongodb-org-server-2.6.4-1.x86_64


      Original title:

      rs.stepDown during migration causes fassert in logOp

      Original description:

      At 3am a cron job issues an rs.stepDown() as part of weekly compaction. Having read SERVER-14186, I assume the migrate chunk errors are not relevant, but I've left them in the logs just in case.

      2014-10-04T03:04:57.225+0000 [migrateThread] doc_metadata.metadata  caught assertion addKeysToIndex doc_metadata.metadata.$url_1_sourceKey_1_id: ObjectId('519239f8e4b0be945d6b9fe6')
      2014-10-04T03:04:57.225+0000 [migrateThread] scoped connection to replica_set1/10.40.131.234:27018,10.40.213.215:27018 not being returned to the pool
      2014-10-04T03:04:57.225+0000 [migrateThread] about to log metadata event: { _id: "ip-10-60-18-179-2014-10-04T03:04:57-542f63d9417de929885b7e2e", server: "ip-10-60-18-179", clientAddr: ":27017", time: new Date(1412391897225), what: "moveChunk.to", ns: "doc_metadata.metadata", details: { min: { _id: ObjectId('5191d4f3e4b07927e24302bc') }, max: { _id: ObjectId('51926d79e4b056bb2e3823fc') }, step 1 of 5: 2, step 2 of 5: 5296, note: "aborted" } }
      2014-10-04T03:04:57.258+0000 [migrateThread] ERROR: migrate failed: insertDocument :: caused by :: 17280 Btree::insert: key too large to index, failing doc_metadata.metadata.$url_1_sourceKey_1 1163 { : "http://www.shippingonline.cn/Yellowpages/world_list.asp?page=164&user_addr='''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''...", : "www.google.com.search.64.84." }
      2014-10-04T03:05:01.748+0000 [conn528628] end connection 10.226.114.59:33504 (272 connections now open)
      2014-10-04T03:05:01.748+0000 [initandlisten] connection accepted from 10.226.114.59:33525 #528632 (273 connections now open)
      2014-10-04T03:05:01.887+0000 [conn478926] remotely refreshing metadata for doc_metadata.metadata based on current shard version 660|1||53da748318304c3808174ba3, current metadata version is 660|7||53da748318304c3808174ba3
      2014-10-04T03:05:01.889+0000 [conn478926] metadata of collection doc_metadata.metadata already up to date (shard version : 660|1||53da748318304c3808174ba3, took 1ms)
      2014-10-04T03:05:01.903+0000 [migrateThread] starting receiving-end of migration of chunk { _id: ObjectId('5191d4f3e4b07927e24302bc') } -> { _id: ObjectId('51926d79e4b056bb2e3823fc') } for collection doc_metadata.metadata from replica_set1/10.40.131.234:27018,10.40.213.215:27018 at epoch 53da748318304c3808174ba3
      2014-10-04T03:05:01.985+0000 [conn528629] end connection 10.245.78.137:48391 (272 connections now open)
      2014-10-04T03:05:01.987+0000 [initandlisten] connection accepted from 10.245.78.137:48402 #528633 (273 connections now open)
      2014-10-04T03:05:05.819+0000 [initandlisten] connection accepted from 127.0.0.1:37058 #528634 (274 connections now open)
      2014-10-04T03:05:05.824+0000 [conn528634] end connection 127.0.0.1:37058 (273 connections now open)
      2014-10-04T03:05:05.900+0000 [initandlisten] connection accepted from 127.0.0.1:37059 #528635 (274 connections now open)
      2014-10-04T03:05:05.929+0000 [conn528635] 10.226.114.59:27019 is now closest at 542f63e1:37
      2014-10-04T03:05:05.929+0000 [conn528635] replSet info stepping down as primary secs=60
      2014-10-04T03:05:05.939+0000 [conn528635] replSet relinquishing primary state
      2014-10-04T03:05:05.939+0000 [conn528635] replSet SECONDARY
      
      
      (many lines like the following:)
      2014-10-04T03:05:05.940+0000 [conn478632] end connection 10.40.213.215:59486 (273 connections now open)
      
      
      2014-10-04T03:05:05.951+0000 [migrateThread] replSet error : logOp() but not primary
      2014-10-04T03:05:05.951+0000 [migrateThread] doc_metadata.metadata Fatal Assertion 17405
      2014-10-04T03:05:06.000+0000 [initandlisten] connection accepted from 10.40.131.234:33182 #528636 (17 connections now open)
      2014-10-04T03:05:06.008+0000 [initandlisten] connection accepted from 127.0.0.1:37060 #528637 (18 connections now open)
      2014-10-04T03:05:06.067+0000 [conn528635] SocketException handling request, closing client connection: 9001 socket exception [SEND_ERROR] server [127.0.0.1:37059]
      2014-10-04T03:05:06.081+0000 [initandlisten] connection accepted from 10.226.114.59:33553 #528638 (18 connections now open)
      2014-10-04T03:05:06.126+0000 [migrateThread] doc_metadata.metadata 0x11e6111 0x1187e49 0x116a97d 0xe577aa 0xe5000e 0xa4d139 0xfd5594 0xfda178 0xfc5839 0x122aaa9 0x7f72b0715851 0x7f72afab811d
       mongod(_ZN5mongo15printStackTraceERSo+0x21) [0x11e6111]
       mongod(_ZN5mongo10logContextEPKc+0x159) [0x1187e49]
       mongod(_ZN5mongo13fassertFailedEi+0xcd) [0x116a97d]
       mongod() [0xe577aa]
       mongod(_ZN5mongo5logOpEPKcS1_RKNS_7BSONObjEPS2_PbbPS3_+0xee) [0xe5000e]
       mongod(_ZN5mongo7Helpers11removeRangeERKNS_8KeyRangeEbbPNS0_11RemoveSaverEbb+0xb09) [0xa4d139]
       mongod(_ZN5mongo13MigrateStatus3_goEv+0x1644) [0xfd5594]
       mongod(_ZN5mongo13MigrateStatus2goEv+0x28) [0xfda178]
       mongod(_ZN5mongo13migrateThreadEv+0x59) [0xfc5839]
       mongod() [0x122aaa9]
       /lib64/libpthread.so.0(+0x7851) [0x7f72b0715851]
       /lib64/libc.so.6(clone+0x6d) [0x7f72afab811d]
      2014-10-04T03:05:06.126+0000 [migrateThread]
      
      ***aborting after fassert() failure
      
      
      2014-10-04T03:05:06.133+0000 [migrateThread] SEVERE: Got signal: 6 (Aborted).
      Backtrace:0x11e6111 0x11e54ee 0x7f72afa02920 0x7f72afa028a5 0x7f72afa04085 0x116a9ea 0xe577aa 0xe5000e 0xa4d139 0xfd5594 0xfda178 0xfc5839 0x122aaa9 0x7f72b0715851 0x7f72afab811d
       mongod(_ZN5mongo15printStackTraceERSo+0x21) [0x11e6111]
       mongod() [0x11e54ee]
       /lib64/libc.so.6(+0x32920) [0x7f72afa02920]
       /lib64/libc.so.6(gsignal+0x35) [0x7f72afa028a5]
       /lib64/libc.so.6(abort+0x175) [0x7f72afa04085]
       mongod(_ZN5mongo13fassertFailedEi+0x13a) [0x116a9ea]
       mongod() [0xe577aa]
       mongod(_ZN5mongo5logOpEPKcS1_RKNS_7BSONObjEPS2_PbbPS3_+0xee) [0xe5000e]
       mongod(_ZN5mongo7Helpers11removeRangeERKNS_8KeyRangeEbbPNS0_11RemoveSaverEbb+0xb09) [0xa4d139]
       mongod(_ZN5mongo13MigrateStatus3_goEv+0x1644) [0xfd5594]
       mongod(_ZN5mongo13MigrateStatus2goEv+0x28) [0xfda178]
       mongod(_ZN5mongo13migrateThreadEv+0x59) [0xfc5839]
       mongod() [0x122aaa9]
       /lib64/libpthread.so.0(+0x7851) [0x7f72b0715851]
       /lib64/libc.so.6(clone+0x6d) [0x7f72afab811d]
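
      For reference, the crash path in the trace above is MigrateStatus::_go -> Helpers::removeRange -> logOp -> fassertFailed (17405): the migrate thread is presumably cleaning up the partially received chunk range and logging each delete, but the stepDown has already demoted the node, so logOp() fasserts and takes mongod down. The standalone sketch below (hypothetical types and names, not the actual server code) illustrates the kind of primary check the title asks for: re-check replica-set state before each replicated delete and return an error so the migration can abort cleanly instead of crashing the process.

      // Standalone C++ sketch, hypothetical names only (not mongod source).
      #include <cstdlib>
      #include <iostream>
      #include <string>
      #include <vector>

      struct ReplState {
          bool primary = true;  // flips to false once rs.stepDown() takes effect
      };

      struct Status {
          bool ok;
          std::string reason;
      };

      // Stand-in for logOp(): the real server fasserts (17405) when asked to
      // write an oplog entry while not primary, which is what aborts mongod here.
      void logOpOrDie(const ReplState& rs, const std::string& docId) {
          if (!rs.primary) {
              std::cerr << "replSet error : logOp() but not primary\n";
              std::abort();  // models fassertFailed(17405)
          }
          std::cout << "oplog: delete " << docId << "\n";
      }

      // Guarded range delete: check primary state before every replicated delete
      // and hand an error back to the caller instead of crashing the process.
      // (In the real server the check and the write would need to happen under
      // the same lock so the state cannot change in between.)
      Status removeRangeGuarded(const ReplState& rs,
                                const std::vector<std::string>& docsInRange) {
          for (const auto& id : docsInRange) {
              if (!rs.primary) {
                  return {false, "node stepped down during removeRange, aborting chunk cleanup"};
              }
              logOpOrDie(rs, id);  // safe: still primary as of the check above
          }
          return {true, ""};
      }

      int main() {
          ReplState rs;
          std::vector<std::string> docs = {"doc1", "doc2", "doc3"};

          rs.primary = false;  // simulate rs.stepDown() arriving mid-migration
          Status s = removeRangeGuarded(rs, docs);
          if (!s.ok) {
              std::cout << "removeRange aborted: " << s.reason << "\n";
          }
          return 0;
      }

      Run standalone, this prints "removeRange aborted: ..." instead of reaching the abort() that models the fassert.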
      

      Other issue: the primary node was running at 200% CPU, repeating the same log messages over and over, roughly 10 times per second:

      2014-10-24T13:02:10.289+0000 [rsHealthPoll] warning: Failed to connect to 10.60.18.179:27019, reason: errno:111 Connection refused
      2014-10-24T13:02:10.289+0000 [rsHealthPoll] couldn't connect to 10.60.18.179:27019: couldn't connect to server 10.60.18.179:27019 (10.60.18.179) failed, connection attempt failed
      

            Assignee: Randolph Tan (randolph@mongodb.com)
            Reporter: Alex Piggott (apiggott@ikanow.com)
            Votes: 0
            Watchers: 7
