When testing sharding support (the test is described in http://jira.mongodb.org/browse/SHARDING-77) I noticed from the top that most of time is being spend in mongos instead of mongod what seems non-intuitive. On the first glance for simple queries mongos should doing next to nothing compared to mongod. I made a profile using google's perftool and it looks like a lot of time is being spend copying data what seems to be wrong as normally this type of application if correctly written should be spending time in IO (the profile in callgrind format is attached, it can be viewed with kcachegrind). I've looked into the code and it seems that there is a lot of waste coming from copying temporary objects again and again. Particularly use of ServerAndQuery objects inside ShardStrategy::queryOp() is very non-optimal. Even with simple refactoring by changing some constructors to pass data by const references instead of value (see the attached patch) I could speedup my test on 30% but there is a lot of room for improvement if ServerAndQuery copying eliminated completely.