  1. Core Server
  2. SERVER-48033

remove shard failed: move chunk abort


Details

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • None
    • 4.0.10
    • Sharding
    • None
    • ALL

    Description

      The second call to `removeShard` failed because the chunk move was aborted.

      There are 4 shards in my sharded cluster: shard1, shard2, shard3, and shard4. Several sharded collections are distributed across these shards.

      First, I removed shard2 successfully. Then I called `removeShard` to remove shard4, but the removal never finished: `sh.status()` always reports shard4 in the "draining" state.

      After going through the shard4 log, I found this error: "Chunk move failed :: caused by :: ShardNotFound: Shard2 not found". I suspect the cached routing information was not refreshed after shard2 was removed.

      The error can be reproduced by running the "_recvChunkStart" command on shard4; a screenshot is included in the attachments.
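      For anyone trying to observe the stuck "draining" state described above, the following is a minimal sketch of polling `removeShard` until the drain finishes or a retry limit is reached. It assumes the command response carries a `state` field with the values "started", "ongoing", or "completed". The helper name `wait_for_drain` and its parameters are hypothetical, and the command runner is passed in as a callable so the sketch does not depend on a particular driver.

```python
import time
from typing import Any, Callable, Mapping


def wait_for_drain(
    run_remove_shard: Callable[[str], Mapping[str, Any]],
    shard: str,
    poll_seconds: float = 5.0,
    max_polls: int = 10,
) -> Mapping[str, Any]:
    """Re-issue removeShard until the response reports state "completed".

    run_remove_shard is a hypothetical callable that sends the removeShard
    admin command for the given shard and returns the response document.
    Raises TimeoutError if the shard is still draining after max_polls calls.
    """
    last_state = None
    for _ in range(max_polls):
        response = run_remove_shard(shard)
        last_state = response.get("state")
        if last_state == "completed":
            return response
        # "started" or "ongoing": chunks are still being migrated away.
        time.sleep(poll_seconds)
    raise TimeoutError(
        f"{shard} still draining after {max_polls} polls (last state: {last_state!r})"
    )
```

      With PyMongo, the callable could be something like `lambda s: client.admin.command("removeShard", s)`, where `client` is a `MongoClient` connected to a mongos (an assumption, not taken from this report). In the situation reported here, the loop would be expected to end in `TimeoutError`, since the shard never leaves the draining state.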

      Attachments

        1. _recvChunkStart_failed.png
          762 kB
        2. mongod.log.2020-04-30T00-15-05.tar.gz
          7.37 MB
        3. move_chunk_fail.png
          924 kB
        4. screenshot-1.png
          2.31 MB
        5. sh.status.png
          314 kB

        Activity

          People

            carl.champain@mongodb.com Carl Champain (Inactive)
            cvinllen@gmail.com vinllen chen
