During an exchange, one thread may check out the session catalog state for the session and then block while waiting for another thread in the same session to perform work. Because all consumers of the exchange are in the same session, the thread that the original thread is waiting on cannot proceed: it first needs to check out the session catalog state, which the original thread is holding.
For example, consider an exchange with two consumers opened within a session. A single mongod will have two cursors (ID X and ID Y) open, both pulling data from the same Exchange. If there are two active getMores running, one for X and one for Y, only one can hold the session state at a time, so only one can proceed into the exchange at once. Imagine the getMore for X wins the race and checks out the session, but then finds that the buffer for its part of the exchange output is empty. This thread will then iterate the input stage of the Exchange in an attempt to fill up its buffer. However, it might find that all the subsequent results belong in the buffer for the consumer feeding cursor Y, and that buffer is full. In this case, the thread for the getMore on X has to wait until the consumer for cursor Y consumes some results before it can proceed. Of course, the getMore on Y cannot proceed because it first needs to check out the session state, which the getMore on X is still holding. Thus we have a deadlock.
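The cycle above can be reproduced in a small model. This is only a sketch in Python, not server code: a `threading.Lock` stands in for the session catalog state, a bounded `queue.Queue` stands in for consumer Y's buffer, and the names `simulate_deadlock`, `get_more_x` and `get_more_y` are hypothetical. Timeouts are used so the script reports the stall instead of hanging.

```python
import queue
import threading


def simulate_deadlock(timeout=0.2):
    session = threading.Lock()         # stand-in for the session catalog state
    buffer_y = queue.Queue(maxsize=1)  # stand-in for consumer Y's exchange buffer
    buffer_y.put("doc0")               # Y's buffer is already full
    x_has_session = threading.Event()
    y_done = threading.Event()
    outcome = {}

    def get_more_x():
        with session:                  # X wins the race and checks out the session
            x_has_session.set()
            try:
                # X produced a result that belongs to Y, but Y's buffer is full.
                buffer_y.put("doc1", timeout=timeout)
                outcome["x"] = "progressed"
            except queue.Full:
                outcome["x"] = "stuck: buffer Y is full"
            # Keep holding the session until Y has reported, as a real
            # blocked thread would; this also keeps the outcome deterministic.
            y_done.wait()

    def get_more_y():
        x_has_session.wait()
        # Y must check out the session before it can drain its buffer.
        if session.acquire(timeout=timeout):
            try:
                buffer_y.get_nowait()
                outcome["y"] = "progressed"
            finally:
                session.release()
        else:
            outcome["y"] = "stuck: session is checked out"
        y_done.set()

    threads = [threading.Thread(target=get_more_x),
               threading.Thread(target=get_more_y)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcome


if __name__ == "__main__":
    print(simulate_deadlock())
```

Without the timeouts, neither thread would ever return: X waits on Y's buffer while holding the session, and Y waits on the session before it can touch its buffer.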
Two possible solutions we have thought of thus far:
1. Once the thread on cursor ID X begins to wait for another thread to consume results, it should check its session state back in. Only once it has been signaled to proceed should it re-acquire the session state.
2. We should somehow set it up so that threads which will consume output of an exchange do not check out the session catalog state (we don't think they will need it since they do not interact with the storage engine, and further such operations would always be banned from operating within a transaction). Only when the thread has been designated to generate input and partition it or otherwise distribute it among the buffers should it actually check out the session state.
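Option 1 can be sketched in the same kind of model. Again, this is a hedged illustration in Python rather than server code: a `threading.Lock` plays the role of the session catalog state, a bounded `queue.Queue` plays the role of consumer Y's buffer, and `run_with_check_in` and the inner function names are hypothetical. The only change from the deadlocking pattern is that the producing thread checks the session back in before it blocks.

```python
import queue
import threading


def run_with_check_in(join_timeout=5.0):
    session = threading.Lock()         # stand-in for the session catalog state
    buffer_y = queue.Queue(maxsize=1)  # stand-in for consumer Y's exchange buffer
    buffer_y.put("doc0")               # Y's buffer starts out full
    x_has_session = threading.Event()
    consumed = []

    def get_more_x():
        session.acquire()              # X checks out the session
        x_has_session.set()
        try:
            buffer_y.put_nowait("doc1")
        except queue.Full:
            session.release()          # check the session back in before waiting
            buffer_y.put("doc1")       # block until Y makes room
            session.acquire()          # re-acquire only once able to proceed
        session.release()

    def get_more_y():
        x_has_session.wait()
        with session:                  # succeeds once X has checked back in
            consumed.append(buffer_y.get())

    threads = [threading.Thread(target=get_more_x),
               threading.Thread(target=get_more_y)]
    for t in threads:
        t.start()
    for t in threads:
        t.join(join_timeout)
    finished = not any(t.is_alive() for t in threads)
    return consumed, list(buffer_y.queue), finished


if __name__ == "__main__":
    print(run_with_check_in())
```

Option 2 would correspond to `get_more_y` never taking `session` at all, which removes the cycle for the same reason: the consumer no longer depends on anything the producer is holding.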
Note that we have not yet observed such a scenario, and that there may be other possible remedies. This is closely related to the issue described in