Core Server / SERVER-32871

ReplicaSetMonitorRemoved and ShardNotFound errors on fanout query after removing a shard


Details

    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Backport Requested: v4.2, v4.0
    • Steps To Reproduce:
      • Shard a collection, test.test, across shard1, shard2, etc.
      • Ensure that no queries, inserts, or other operations are issued against test.test during or after the shard removal.
      • On mongos1, run {removeShard: "shard1"} against the admin database.
      • Wait until the removal finishes, i.e. until removeShard reports state "completed".
      • On each mongos, run db.test.count().
    • Sprint: Sharding 2019-09-09
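A minimal Python sketch of the reproduction steps above, assuming pymongo is installed and that the reader supplies the mongos connection strings. The helper names (remove_shard_and_wait, count_on_each_mongos, run_against_cluster) are hypothetical; this is not the attached remove4.js script. The command-sending functions are passed in so the polling logic can be exercised without a live cluster.

```python
import time
from typing import Any, Callable, Dict, List, Mapping

# A function that sends one command document to a server and returns the reply.
# With pymongo this is e.g. MongoClient(uri).admin.command (an assumption about
# how the reader wires it up, not part of the original report).
Command = Callable[[Dict[str, Any]], Mapping[str, Any]]


def remove_shard_and_wait(admin_command: Command, shard: str,
                          poll_seconds: float = 1.0,
                          max_polls: int = 600) -> Mapping[str, Any]:
    """Steps 3-4: re-issue removeShard until it reports state "completed"."""
    for _ in range(max_polls):
        reply = admin_command({"removeShard": shard})
        if reply.get("state") == "completed":
            return reply
        time.sleep(poll_seconds)
    raise TimeoutError(f"removeShard {shard!r} did not complete")


def count_on_each_mongos(test_db_commands: Dict[str, Command]) -> Dict[str, str]:
    """Step 5: run a count on test.test through every mongos, recording errors."""
    results: Dict[str, str] = {}
    for name, run in test_db_commands.items():
        try:
            reply = run({"count": "test"})
            results[name] = f"ok n={reply['n']}"
        except Exception as exc:  # with pymongo, typically OperationFailure
            results[name] = f"error: {exc}"
    return results


def run_against_cluster(mongos_uris: List[str], shard: str = "shard1") -> Dict[str, str]:
    """Wire the steps to real mongos processes (requires pymongo)."""
    from pymongo import MongoClient  # imported lazily so the helpers above need no driver

    clients = [MongoClient(uri) for uri in mongos_uris]
    try:
        # Only the first mongos issues removeShard, as in the steps above.
        remove_shard_and_wait(clients[0].admin.command, shard)
        return count_on_each_mongos(
            {uri: c.test.command for uri, c in zip(mongos_uris, clients)})
    finally:
        for c in clients:
            c.close()
```

For example, run_against_cluster(["mongodb://mongos1:27017", "mongodb://mongos2:27017"]) (hypothetical hosts) would return one "ok n=..." or "error: ..." entry per mongos, which makes it easy to see which routers hit ReplicaSetMonitorRemoved or ShardNotFound.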

    Description

      We've noticed that after a shard is removed, fanout queries (e.g. a count against a sharded collection) return ReplicaSetMonitorRemoved or ShardNotFound errors. While investigating, we found that the internal chunk cache on the affected mongos appears to hold stale routing information: getShardVersion on the collection returns an old version. As long as no targeted (non-fanout) queries, inserts, or removes are issued after the removal has completed, fanout queries on some mongos instances have a relatively high chance of failing consistently.
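To make the suspected mechanism concrete, here is a toy Python model, not MongoDB's actual routing code. It assumes that each mongos keeps a cached routing table, that a fanout count broadcasts to every shard in that cache without checking its version, and that a targeted write is what forces a refresh (standing in for a shard rejecting a stale shard version). All class and method names are hypothetical, and only the ShardNotFound case is modeled.

```python
from dataclasses import dataclass
from typing import Set


class ShardNotFound(Exception):
    """Raised when a mongos targets a shard the cluster no longer knows about."""


@dataclass
class RoutingTable:
    version: int      # bumped whenever chunk ownership changes
    shards: Set[str]  # shards that own at least one chunk of the collection


class ConfigServer:
    """Authoritative metadata: the current routing table and registered shards."""

    def __init__(self, shards: Set[str]) -> None:
        self.table = RoutingTable(version=1, shards=set(shards))
        self.registered = set(shards)

    def remove_shard(self, name: str) -> None:
        # Draining moves every chunk off `name` (new version), then the shard
        # is deregistered once the removal completes.
        self.table = RoutingTable(self.table.version + 1, self.table.shards - {name})
        self.registered.discard(name)


class Mongos:
    """Hypothetical router holding a possibly stale copy of the routing table."""

    def __init__(self, config: ConfigServer) -> None:
        self.config = config
        self.refresh()

    def refresh(self) -> None:
        t = self.config.table
        self.cache = RoutingTable(t.version, set(t.shards))

    def fanout_count(self) -> int:
        # Broadcast to every shard in the *cached* table, with no version check.
        for shard in sorted(self.cache.shards):
            if shard not in self.config.registered:
                raise ShardNotFound(shard)
        return len(self.cache.shards)

    def targeted_write(self) -> None:
        # Stand-in for a shard rejecting a stale shard version, which makes
        # the mongos refresh its cache before retrying.
        if self.cache.version != self.config.table.version:
            self.refresh()
```

In this model a mongos that only ever issues fanout counts fails on every attempt, because nothing refreshes its cache; a single targeted write is enough to bring it up to date, which mirrors the observation in the description.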

      Attachments

        1. logs.txt (284 kB)
        2. logs-3.6.txt (474 kB)
        3. remove4.js (3 kB)
        4. remove4-3.6.js (3 kB)


            People

              Assignee: Matthew Saltz (Inactive)
              Reporter: David Bartley
              Votes: 0
              Watchers: 12

              Dates

                Created:
                Updated:
                Resolved: