The current mergeChunks path is very inefficient because:
- It performs three sequential refreshes (router, shard-pre-merge and shard-post-merge)
- It performs a sequential scan of the merge bounds against the shard's cached routing info, only to generate a config server command whose size is proportional to the number of chunks being merged (which can theoretically exceed the max BSON size)
- The config server repeats the chunk scan that the shard already did, this time directly against config.chunks, just to check that all the bounds the shard sent match
All this makes the mergeChunks command very expensive, in terms of both latency and load on the config server.
It would be much better if:
- The router command:
  - Didn't do a refresh on entry, but relied on the cached information and the shardVersion (this has backwards compatibility implications)
- The shard command (in order of importance):
  - Sent just the ends of the range to be merged, instead of every chunk that falls within it, so that the size of the command to the config server is constant and the scan is done just once
  - Checked only the major shardVersion (for routing correctness, i.e., to make sure this shard owns that range)
  - Did a refresh only if the chunk bounds that the router sent didn't match the cached info (this can only happen if a previous merge committed against the config server, but failed to refresh)
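A rough sketch of the range-based idea, for illustration only. It is not the server's code: `Chunk` and `merge_range` are hypothetical names, integer keys stand in for shard key bounds, and a sorted in-memory list stands in for config.chunks. The point is that the caller passes only the two ends of the range, and ownership and contiguity are validated in the same single pass that performs the merge.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Chunk:
    """Hypothetical stand-in for a config.chunks entry."""
    min_key: int  # inclusive lower bound
    max_key: int  # exclusive upper bound
    shard: str


def merge_range(chunks: List[Chunk], shard: str,
                range_min: int, range_max: int) -> List[Chunk]:
    """Merge all chunks covering [range_min, range_max) in one pass.

    `chunks` is assumed to be sorted by min_key. Only the two range
    ends are supplied, so the request size does not depend on how
    many chunks fall inside the range.
    """
    result: List[Chunk] = []
    expected_min = range_min  # where the next in-range chunk must start
    merged_count = 0

    for chunk in chunks:
        # Chunks entirely outside the range are kept unchanged.
        if chunk.max_key <= range_min or chunk.min_key >= range_max:
            result.append(chunk)
            continue

        # Every chunk touching the range must belong to the requesting shard.
        if chunk.shard != shard:
            raise ValueError(f"chunk {chunk} is not owned by shard {shard}")

        # The range ends must coincide with chunk boundaries, with no gaps.
        if chunk.min_key != expected_min or chunk.max_key > range_max:
            raise ValueError("range does not align with chunk boundaries")

        expected_min = chunk.max_key
        merged_count += 1

        # Emit the merged chunk once the last in-range chunk is reached.
        if chunk.max_key == range_max:
            result.append(Chunk(range_min, range_max, shard))

    if expected_min != range_max:
        raise ValueError("range is not fully covered by chunks")
    if merged_count < 2:
        raise ValueError("fewer than two chunks in range, nothing to merge")
    return result
```

For example, with chunks `[0, 10)`, `[10, 20)` and `[20, 30)` on shard "A" and `[30, 40)` on shard "B", calling `merge_range(chunks, "A", 0, 30)` yields a single `[0, 30)` chunk on "A" followed by the untouched `[30, 40)` chunk, while a range that crosses into shard "B" or that starts in the middle of a chunk is rejected.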