Due to the issue described in SERVER-31398, where the movePrimary command gets interrupted, the mongos_rs_shard_failure_tolerance.js test may fail in the sharding_continuous_config_stepdown.yml test suite. The test assumes that collUnsharded lives on shard #0 after it terminates the primaries of shards #1 and #2; however, it never wraps the movePrimary call in assert.commandWorked() and instead just logs the server's response.
// Create the unsharded database
assert.writeOK(collUnsharded.insert({some: "doc"}));
assert.writeOK(collUnsharded.remove({}));
printjson(
    admin.runCommand({movePrimary: collUnsharded.getDB().toString(), to: st.shard0.shardName}));

// Create the sharded database
assert.commandWorked(admin.runCommand({enableSharding: collSharded.getDB().toString()}));
printjson(
    admin.runCommand({movePrimary: collSharded.getDB().toString(), to: st.shard0.shardName}));
assert.commandWorked(
    admin.runCommand({shardCollection: collSharded.toString(), key: {_id: 1}}));
assert.commandWorked(admin.runCommand({split: collSharded.toString(), middle: {_id: 0}}));
assert.commandWorked(admin.runCommand(
    {moveChunk: collSharded.toString(), find: {_id: 0}, to: st.shard1.shardName}));
This obscures the root cause of the failure, which simply manifests as mongos repeatedly trying to read from one of the other (downed) shards.
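One possible way to surface the failure early is to check the movePrimary response instead of only printing it. The sketch below is an illustration, not the actual fix: movePrimaryOrThrow is a hypothetical helper name, and it assumes only that the admin handle exposes runCommand() and that a successful response carries ok: 1. Inside the test itself, wrapping the existing call in assert.commandWorked(), as the other setup commands already do, would likely achieve the same thing.

```javascript
// Hypothetical helper (not part of the existing test): run movePrimary and
// fail loudly instead of only logging the server's response.
function movePrimaryOrThrow(adminDB, dbName, toShardName) {
    const res = adminDB.runCommand({movePrimary: dbName, to: toShardName});

    // Assumption: a successful command response has ok equal to 1.
    if (!res || Number(res.ok) !== 1) {
        throw new Error("movePrimary of '" + dbName + "' to '" + toShardName +
                        "' failed: " + JSON.stringify(res));
    }
    return res;
}
```

With a check like this, an interrupted movePrimary would fail the test at setup time with the server's error in the message, rather than later as repeated reads against a downed shard.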