In an aggregation framework query, the first batch of results that the mergerPart requests from the shards is fetched asynchronously. However, when the mergerPart needs a second batch to fulfill the query, the process is effectively single-threaded, because the call to getNext is synchronous.
I think it would be good to overcome this limitation and perform all subsequent fetches in the background.
A visualization of this behavior is available here: https://github.com/ahom/jupyter-notebooks/blob/master/mongo_cs32044/notebook.ipynb
The Y axis shows the shards and the X axis shows time. From the second batch onward, fetching appears to be completely synchronous.
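The difference between the current behavior and the proposed one can be sketched with a toy model. This is not MongoDB's merger code. FakeShardCursor, fetch_batch, merge_synchronously, merge_with_prefetch and make_cursors are hypothetical names, each shard round trip is simulated with a fixed time.sleep delay, and the initial asynchronous batch is not modeled separately. The synchronous version asks one shard at a time for its next batch and waits for it, as described above for getNext. The prefetching version is one possible shape of the proposal: it schedules each shard's next request in a thread pool as soon as the previous batch has been consumed.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class FakeShardCursor:
    """Hypothetical stand-in for a remote cursor on one shard."""

    def __init__(self, shard_id, num_batches, batch_size, delay):
        self.shard_id = shard_id
        self.remaining = num_batches
        self.batch_size = batch_size
        self.delay = delay
        self.next_value = 0

    def fetch_batch(self):
        # Simulated network round trip; an empty list means the cursor is exhausted.
        time.sleep(self.delay)
        if self.remaining == 0:
            return []
        self.remaining -= 1
        batch = [(self.shard_id, self.next_value + i) for i in range(self.batch_size)]
        self.next_value += self.batch_size
        return batch


def merge_synchronously(cursors):
    # Models the reported behavior: each batch is requested only when needed,
    # and the merger blocks on one shard at a time.
    results = []
    active = list(cursors)
    while active:
        still_active = []
        for cursor in active:
            batch = cursor.fetch_batch()  # blocking call
            if batch:
                results.extend(batch)
                still_active.append(cursor)
        active = still_active
    return results


def merge_with_prefetch(cursors):
    # Models the proposal: as soon as a shard's batch is consumed, the request
    # for its next batch is submitted to a background thread pool.
    results = []
    with ThreadPoolExecutor(max_workers=max(1, len(cursors))) as pool:
        pending = {pool.submit(c.fetch_batch): c for c in cursors}
        while pending:
            for future in list(pending):  # snapshot, in shard order
                cursor = pending.pop(future)
                batch = future.result()  # often already complete
                if batch:
                    results.extend(batch)
                    # At most one outstanding request per cursor, so no races on its state.
                    pending[pool.submit(cursor.fetch_batch)] = cursor
    return results


def make_cursors():
    return [FakeShardCursor(s, num_batches=3, batch_size=2, delay=0.05) for s in range(4)]


if __name__ == "__main__":
    start = time.perf_counter()
    merge_synchronously(make_cursors())
    sync_time = time.perf_counter() - start

    start = time.perf_counter()
    merge_with_prefetch(make_cursors())
    prefetch_time = time.perf_counter() - start

    print(f"synchronous: {sync_time:.2f}s, prefetch: {prefetch_time:.2f}s")
```

With 4 shards, 3 batches per shard and a 0.05 s delay, the synchronous loop performs 16 blocking round trips in sequence (each shard's final empty fetch included), while the prefetching loop overlaps the shards, so its total time stays close to one shard's 4 round trips.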