- The AsyncRequestsSender holds a lock during construction and work scheduling.
- This lock prevents callbacks from running if their response comes back during scheduling.
- Scheduling can take a long time (up to 20 seconds per shard) if a read preference cannot be satisfied, because targeting makes a blocking call into the ReplicaSetMonitor.
The bad sequence of events is:
- A scatter-gather request to two shards is dispatched
- The first host succeeds in targeting and runs
- The second host cannot satisfy its read preference and blocks while holding the lock
- The first request succeeds, but its callback blocks in _handleResponse waiting for that lock
If enough of these requests are in flight, they can saturate all background networking workers, making your mongos completely unresponsive until targeting can succeed.
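
The sequence above can be reproduced in miniature. The following is only a sketch in Python, not the mongos code: `sender_lock`, `handle_response`, `unrelated_task`, and `simulate` are hypothetical names, the thread pool stands in for the background networking workers, and a short `time.sleep` stands in for the blocking ReplicaSetMonitor call. For simplicity all callbacks share one lock, whereas in practice each sender would presumably hold its own.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def simulate(targeting_delay_s=0.3, workers=2):
    """Return how long an unrelated task waited for a free worker (seconds)."""
    sender_lock = threading.Lock()  # stands in for the sender's lock
    pool = ThreadPoolExecutor(max_workers=workers)  # "networking workers"

    def handle_response():
        # Stand-in for _handleResponse: it needs the sender's lock to proceed.
        with sender_lock:
            pass

    def unrelated_task(submitted_at):
        # Any other work that needs a networking worker.
        return time.monotonic() - submitted_at

    with sender_lock:  # held for the whole scheduling pass
        # First shard: targeting succeeded and the responses are already back,
        # so one callback per worker is queued. Each blocks on sender_lock.
        callbacks = [pool.submit(handle_response) for _ in range(workers)]
        # Unrelated work arrives while every worker is stuck.
        unrelated = pool.submit(unrelated_task, time.monotonic())
        # Second shard: read preference cannot be satisfied (blocking wait).
        time.sleep(targeting_delay_s)

    latency = unrelated.result()
    for callback in callbacks:
        callback.result()
    pool.shutdown()
    return latency


if __name__ == "__main__":
    print(f"unrelated task waited {simulate():.2f}s for a worker")
```

In this sketch the unrelated task cannot start until the scheduling pass releases the lock, so its latency tracks the targeting delay rather than its own cost. Releasing the lock before the blocking wait, or making the wait asynchronous, would be the kind of change that breaks the chain.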