  1. Core Server
  2. SERVER-29293

Recipient shard fails to abort migration on stepdown

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 3.4.11, 3.5.11
    • Affects Version/s: 3.4.4, 3.5.10
    • Component/s: Sharding
    • Labels:
      None
    • Fully Compatible
    • ALL
    • v3.4

      No obvious steps to reproduce.
      I think the problem appeared during a migration.

      Environment:
      MongoDB cluster running version 3.4.4
      3 shards with 2 replication sets
      Symptom:
      The primary and secondary mongod processes of shard3 aborted unexpectedly.

      2017-05-18T23:28:26.240+0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Connecting to 10.11.123.241:22003
      2017-05-18T23:28:26.243+0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Failed to connect to 10.11.123.241:22003 - HostUnreachable: Connection refused
      2017-05-18T23:28:26.243+0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Dropping all pooled connections to 10.11.123.241:22003 due to failed operation on a connection
      2017-05-18T23:28:26.244+0800 I REPL     [ReplicationExecutor] Error in heartbeat request to 10.11.123.241:22003; HostUnreachable: Connection refused
      2017-05-18T23:28:26.244+0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Connecting to 10.11.123.241:22003
      2017-05-18T23:28:26.245+0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Failed to connect to 10.11.123.241:22003 - HostUnreachable: Connection refused
      2017-05-18T23:28:26.245+0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Dropping all pooled connections to 10.11.123.241:22003 due to failed operation on a connection
      2017-05-18T23:28:26.246+0800 I REPL     [ReplicationExecutor] Error in heartbeat request to 10.11.123.241:22003; HostUnreachable: Connection refused
      2017-05-18T23:28:26.252+0800 I SHARDING [migrateThread] about to log metadata event into changelog: { _id: "Juyuan202-01-2017-05-18T23:28:26.251+0800-591dbd9a22d91a38ec6bb40f", server: "Juyuan202-01", clientAddr: "", time: new Date(1495121306251), what: "moveChunk.to", ns: "SafetyAnalysis.dns.record", details: { min: { _id: "4bfb5160c15087" }, max: { _id: "4bfb52c2b2ea8a" }, step 1 of 6: 32, step 2 of 6: 2, step 3 of 6: 10026, step 4 of 6: 0, note: "aborted", errmsg: "Cannot go to critical section because secondaries cannot keep up" } }
      2017-05-18T23:28:26.266+0800 I -        [migrateThread] Invariant failure it != _receivingChunks.end() src/mongo/db/s/metadata_manager.cpp 234
      2017-05-18T23:28:26.273+0800 I -        [migrateThread]
      
      ***aborting after invariant() failure
      
      
      2017-05-18T23:28:26.325+0800 F -        [migrateThread] Got signal: 6 (Aborted).
      
       0x7f824a9ade61 0x7f824a9acf59 0x7f824a9ad43d 0x7f824810a7e0 0x7f8247d995e5 0x7f8247d9adc5 0x7f8249c41a12 0x7f824a51c9d8 0x7f824a52dabc 0x7f824a533e02 0x7f824a534550 0x7f824b41ee10 0x7f8248102aa1 0x7f8247e4faad
      ----- BEGIN BACKTRACE -----
      
      mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x7f824a9ade61]
       mongod(+0x157CF59) [0x7f824a9acf59]
       mongod(+0x157D43D) [0x7f824a9ad43d]
       libpthread.so.0(+0xF7E0) [0x7f824810a7e0]
       libc.so.6(gsignal+0x35) [0x7f8247d995e5]
       libc.so.6(abort+0x175) [0x7f8247d9adc5]
       mongod(_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j+0x0) [0x7f8249c41a12]
       mongod(_ZN5mongo15MetadataManager13forgetReceiveERKNS_10ChunkRangeE+0x2E8) [0x7f824a51c9d8]
       mongod(_ZN5mongo27MigrationDestinationManager14_forgetPendingEPNS_16OperationContextERKNS_15NamespaceStringERKNS_7BSONObjES8_RKNS_3OIDE+0x36C) [0x7f824a52dabc]
       mongod(_ZN5mongo27MigrationDestinationManager14_migrateThreadENS_7BSONObjES1_S1_NS_16ConnectionStringENS_3OIDENS_19WriteConcernOptionsE+0x172) [0x7f824a533e02]
       mongod(+0x1104550) [0x7f824a534550]
       mongod(+0x1FEEE10) [0x7f824b41ee10]
      libpthread.so.0(+0x7AA1) [0x7f8248102aa1]
       libc.so.6(clone+0x6D) [0x7f8247e4faad]
      -----  END BACKTRACE  -----
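
When checking other nodes for the same crash, the invariant line can be pulled out of a mongod log with a short script. The following is a minimal sketch, assuming the 3.4-style log layout shown above (timestamp, severity, component, thread in brackets, message). The names `INVARIANT_RE` and `find_invariant_failures` are hypothetical helpers for illustration, not part of any MongoDB tooling.

```python
import re

# Assumed layout, taken from the log excerpt in this ticket:
#   <timestamp> <severity> <component> [<thread>] Invariant failure <expr> <file> <line>
INVARIANT_RE = re.compile(
    r"^(?P<ts>\S+)\s+\S+\s+\S+\s+\[(?P<thread>[^\]]+)\]\s+"
    r"Invariant failure (?P<expr>.+) (?P<file>\S+) (?P<line>\d+)\s*$"
)


def find_invariant_failures(log_text):
    """Return one dict per 'Invariant failure' line found in log_text."""
    failures = []
    for raw in log_text.splitlines():
        match = INVARIANT_RE.match(raw.strip())
        if match:
            entry = match.groupdict()
            entry["line"] = int(entry["line"])  # source line number as an int
            failures.append(entry)
    return failures


# Sample line copied from the log in this ticket.
sample = (
    "2017-05-18T23:28:26.266+0800 I -        [migrateThread] "
    "Invariant failure it != _receivingChunks.end() "
    "src/mongo/db/s/metadata_manager.cpp 234"
)

for failure in find_invariant_failures(sample):
    print(failure["thread"], failure["expr"], failure["file"], failure["line"])
```

Lines that do not match the assumed layout are skipped silently, so a log in a different format would simply yield an empty list.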
      

            Assignee:
            nathan.myers Nathan Myers
            Reporter:
            wayne80 Wayne Wang
            Votes:
            0
            Watchers:
            7
