We saw very confusing behavior where a collection returned one set of documents some of the time and a different set at other times. We deduced (and verified) that this was because non-sharded collections (in a sharded environment) were being accessed on two different shards. What appears to have happened is that at least one of our mongos did not get the movePrimary notification, meaning it still believed the data lived on the old shard and continued to read from and write to it.
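For anyone hitting the same symptom: you can check which shard the config servers consider a database's primary, and force a suspect mongos to reload its routing table. A minimal mongo shell sketch, run against each mongos (the database name mydb is a placeholder):

    // Which shard do the config servers say owns mydb's
    // non-sharded collections?
    db.getSiblingDB("config").databases.find({ _id: "mydb" })

    // Force this mongos to discard its cached routing table and
    // reload it from the config servers.
    db.adminCommand({ flushRouterConfig: 1 })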
Now, in our situation, we admittedly committed a faux pas: we ran movePrimary while a shard was draining. I realize that the web docs explicitly state not to do this, but it happened.
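For context, the sequence that got us into this state looked roughly like the following (shard and database names are illustrative):

    // 1. Begin draining the shard we wanted to decommission.
    db.adminCommand({ removeShard: "shard0000" })

    // 2. While chunks were still migrating off the draining shard,
    //    we moved the primary of a database that lived on it,
    //    which is exactly the step the docs warn against.
    db.adminCommand({ movePrimary: "mydb", to: "shard0001" })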
It seems that movePrimary either:
A) Doesn't move collections atomically
B) Isn't very forceful about having mongos update their routing tables
C) Doesn't play nicely at all with the balancer when a shard is draining
Or something else, I suppose. Either way, it seems reasonable that movePrimary should raise an error (rather than silently creating inconsistencies) if it truly must be run only after all chunks have been moved off the shard. If it does not actually need that, then there is clearly a bug somewhere that leads to very confusing and inconsistent results.
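In the meantime, the ordering the docs prescribe can be enforced by hand: poll removeShard until no chunks remain before issuing movePrimary. A sketch, assuming a draining shard shard0000 and a database mydb whose primary it is:

    // Start (or check on) the drain; rerunning removeShard reports progress.
    var status = db.adminCommand({ removeShard: "shard0000" })
    while (status.state !== "completed" &&
           (!status.remaining || status.remaining.chunks > 0)) {
        sleep(10000)  // poll every 10 seconds
        status = db.adminCommand({ removeShard: "shard0000" })
    }

    // Only once no chunks remain on the shard is it safe to move the
    // primary for each database that still lists it as primary.
    db.adminCommand({ movePrimary: "mydb", to: "shard0001" })

    // A final removeShard call completes the removal.
    db.adminCommand({ removeShard: "shard0000" })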
- depends on
  - SERVER-939 Ability to distribute collections in a single db (Closed)
- related to
  - SERVER-8059 After movePrimary, db.getCollectionNames() excludes previously existing one-chunk collections (Closed)