Core Server / SERVER-11509

movePrimary should error when database is not drained

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Affects Version/s: 2.4.6
    • Component/s: Sharding
    • Labels:
    • Environment:
      Ubuntu servers
    • Sharding
    • ALL
    • Steps to reproduce:

      Unsure. We're spending a little time trying to make a reproduction script, but essentially something along the lines of:

      1. Have multiple mongos running (we have 12 or so) and several populated shards (we have 8), with some collections in a database sharded and some collections unsharded.

      2. Start draining the shard that is the current primary for the database.

      3. While it is draining, run movePrimary to another shard.

      4. Query each mongos separately, looking for inconsistent results.
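Step 4 can be sketched as a small check that runs the same query through every mongos and groups the routers by the answer they return. This is only an illustration, not a reproduction script: the router names, the `mydb`/`mycoll` names, and the helper functions are hypothetical, and `pymongo_counter` assumes the PyMongo driver is installed (it is imported lazily so the grouping logic can be tried without a cluster).

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List


def group_routers_by_result(
    routers: Dict[str, Callable[[], Hashable]],
) -> Dict[Hashable, List[str]]:
    """Run the same query through every router and group router names by result."""
    groups: Dict[Hashable, List[str]] = defaultdict(list)
    for name, run_query in routers.items():
        groups[run_query()].append(name)
    return dict(groups)


def is_consistent(groups: Dict[Hashable, List[str]]) -> bool:
    """All routers agree when there is at most one distinct result."""
    return len(groups) <= 1


def pymongo_counter(uri: str, db: str, coll: str) -> Callable[[], int]:
    """Hypothetical helper: count documents in db.coll through one mongos."""
    def run() -> int:
        from pymongo import MongoClient  # assumption: PyMongo is available
        return MongoClient(uri)[db][coll].count_documents({})
    return run


if __name__ == "__main__":
    # Stubbed results standing in for three mongos processes; with a real
    # cluster, use pymongo_counter("mongodb://host:27017", "mydb", "mycoll").
    routers = {
        "mongos-a": lambda: 10,
        "mongos-b": lambda: 10,
        "mongos-c": lambda: 7,
    }
    groups = group_routers_by_result(routers)
    print("consistent" if is_consistent(groups) else f"inconsistent: {groups}")
```

More than one group means at least one mongos is answering from a different shard than the others, which matches the symptom described in this report.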

    • Sharding 2018-11-19

      We saw very confusing behavior: a collection returned one set of documents some of the time and a different set at other times. We deduced (and verified) that this was because non-sharded collections (in a sharded environment) were being accessed on two different shards. What appears to have happened is that at least one of our mongos did not get the movePrimary message, so it still believed the data lived on the old shard and kept reading and writing there.

      Now, in our situation, we admittedly committed a faux pas: we ran movePrimary while a shard was draining. I realize that the web docs explicitly state not to do this, but it happened.

      It seems that movePrimary either:

      A) Doesn't move collections atomically
      B) Isn't very forceful about having mongos update their routing tables
      C) Doesn't play nicely at all with the balancer when a shard is draining

      Or something else, I suppose. Either way, it seems reasonable that movePrimary should raise an error (rather than create inconsistencies) if it truly must be run only after all chunks have been moved off of a shard. If it does not actually need that, then there is clearly a bug somewhere that leads to very confusing and inconsistent results.
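The behavior requested here can be illustrated as a client-side pre-flight check. The sketch below is not the server's implementation: the function and exception names are hypothetical, and it assumes the caller has already read the database's entry from `config.databases` (with its `primary` field), the shard entries from `config.shards` (where a draining shard carries `draining: true`), and a per-shard count of chunks still owned by the database's sharded collections.

```python
from typing import Iterable, Mapping


class DatabaseNotDrainedError(RuntimeError):
    """Raised when movePrimary is requested before the primary shard is drained."""


def assert_safe_to_move_primary(
    database: Mapping,
    shards: Iterable[Mapping],
    chunk_counts: Mapping[str, int],
) -> None:
    """Refuse movePrimary while the current primary shard is still draining.

    database     -- document shaped like a config.databases entry
    shards       -- documents shaped like config.shards entries
    chunk_counts -- chunks of this database still on each shard (hypothetical input)
    """
    primary = database["primary"]
    shard = next((s for s in shards if s["_id"] == primary), None)
    if shard is None:
        raise LookupError(f"primary shard {primary!r} not found")

    remaining = chunk_counts.get(primary, 0)
    if shard.get("draining") and remaining > 0:
        raise DatabaseNotDrainedError(
            f"{database['_id']}: {remaining} chunk(s) still on draining "
            f"shard {primary!r}; wait for draining to finish"
        )


if __name__ == "__main__":
    db = {"_id": "mydb", "primary": "shard0000", "partitioned": True}
    shards = [{"_id": "shard0000", "draining": True}, {"_id": "shard0001"}]
    try:
        assert_safe_to_move_primary(db, shards, {"shard0000": 12})
    except DatabaseNotDrainedError as exc:
        print("refused:", exc)
```

A check like this only narrows the window; it does not address whether every mongos refreshes its routing information after movePrimary completes, which is the other possibility raised above.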

            Assignee:
            backlog-server-sharding [DO NOT USE] Backlog - Sharding Team
            Reporter:
            shadowman131 Walt Woods
            Votes:
            1
            Watchers:
            11
