In ordered mode there is currently one batch per op, which is very slow for a large number of ops. The correct behavior is to determine how many of the next ops target the same shard, then send all of them in a single batch. (In ordered mode, an op that targets multiple shards must still be sent on its own.)
The current workaround is to use unordered batches for performance testing.
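
The grouping described above could look roughly like the following Python sketch. It is not the server's implementation: `batch_ordered_ops` and the `target_shards` callback are hypothetical names used only for illustration, and the sketch assumes each op can be mapped to the set of shards it targets before it is sent.

```python
from typing import Callable, Hashable, Iterable, Iterator, List, Set, Tuple, TypeVar

Op = TypeVar("Op")


def batch_ordered_ops(
    ops: Iterable[Op],
    target_shards: Callable[[Op], Set[Hashable]],  # hypothetical targeting callback
) -> Iterator[Tuple[frozenset, List[Op]]]:
    """Yield (shards, batch) pairs in op order.

    Consecutive ops that target the same single shard share a batch.
    An op that does not target exactly one shard is sent on its own.
    """
    batch: List[Op] = []
    batch_shard = None
    for op in ops:
        shards = frozenset(target_shards(op))
        if len(shards) != 1:
            # Multi-shard op: flush what is pending, then send it alone.
            if batch:
                yield frozenset([batch_shard]), batch
                batch = []
            yield shards, [op]
            continue
        (shard,) = shards
        if batch and shard != batch_shard:
            # The run of ops for the previous shard ends here.
            yield frozenset([batch_shard]), batch
            batch = []
        batch_shard = shard
        batch.append(op)
    if batch:
        yield frozenset([batch_shard]), batch


# Usage: each op is a (name, shards) pair, purely for demonstration.
ops = [
    ("a", {"s1"}), ("b", {"s1"}), ("c", {"s2"}),
    ("d", {"s1", "s2"}), ("e", {"s2"}), ("f", {"s2"}),
]
for shards, batch in batch_ordered_ops(ops, lambda op: op[1]):
    print(sorted(shards), [name for name, _ in batch])
```

With this input the six ops go out as four batches instead of six: `a, b` to `s1`, `c` to `s2`, `d` alone to both shards, then `e, f` to `s2`. Because batches are emitted strictly in op order, a caller that stops at the first failed batch keeps ordered-mode semantics.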
- is duplicated by
SERVER-10723 Bulk insert is slow in sharded environment
SERVER-12787 Strange performance characteristics of write ops in shell