There is a divergence in how we set the multikey flag on primary versus secondary. On primary, we go through this codepath, and we register an onCommit handler on the OperationContext to switch the in-memory multikey bit to true, which means we won't set it until the prepared transaction commits. This allows subsequent writes to try updating the multikey flag again, since they see that the bit is not set. These writes would hit a prepare conflict on the collection catalog entry document and block, which is OK.

On secondary, however, when we apply a prepared transaction, we appear to set the in-memory multikey bit immediately instead of waiting for the prepared transaction to commit. This happens because the OperationContext we use here doesn't appear to be correlated to the actual transaction we applied, and the WUOW commits right away. Because of this divergence, the following scenario is possible with two nodes, n0 and n1:
- A primary node (n0) prepares a transaction that did an insert that set the multikey flag. It does not update the in-memory multikey bit since the transaction has not committed yet.
- The secondary (n1) applies the prepare operation from the primary. It does set the in-memory multikey bit because of the behavior outlined above.
- Node n1 steps up and becomes primary, n0 steps down and becomes secondary.
- The new primary, n1, receives an insert operation that tries to set the multikey flag again. Because the in-memory bit has already been set, it does not try to write the flag again, and doesn't encounter a prepare conflict.
- The current secondary, n0, tries to apply the insert operation that n1 just executed. Since n0 did not previously set the in-memory multikey bit, it tries to write the flag again, but it hits a prepare conflict, since the transaction from the first step is still prepared. The secondary is now stuck in oplog application, unable to proceed.
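The steps above can be modeled with a small toy simulation. This is only a sketch of the described behavior, not server code: the `Node` class, its methods, and the `PrepareConflict` exception are hypothetical names invented for illustration, and a real prepare conflict blocks the writer rather than raising.

```python
class PrepareConflict(Exception):
    """Stands in for a write blocking on a document held by a prepared txn."""


class Node:
    """Hypothetical model of one replica set member (not MongoDB code)."""

    def __init__(self, name):
        self.name = name
        self.in_memory_multikey = False  # the in-memory multikey bit
        self.catalog_locked_by = None    # prepared txn holding the catalog entry
        self.on_commit = []              # deferred onCommit handlers

    def _set_bit(self):
        self.in_memory_multikey = True

    def prepare_as_primary(self, txn):
        # Primary: the flag write belongs to the prepared txn, and the
        # in-memory bit is only flipped later by an onCommit handler.
        self.catalog_locked_by = txn
        self.on_commit.append(self._set_bit)

    def apply_prepare_as_secondary(self, txn):
        # Secondary: the bit is set immediately while the txn stays prepared.
        self.catalog_locked_by = txn
        self._set_bit()

    def multikey_insert(self):
        """Return True if this insert had to write the multikey flag."""
        if self.in_memory_multikey:
            return False  # bit already set, so the catalog entry is not touched
        if self.catalog_locked_by is not None:
            raise PrepareConflict(f"{self.name} blocked by {self.catalog_locked_by}")
        self._set_bit()
        return True


def run_scenario():
    n0, n1 = Node("n0"), Node("n1")
    n0.prepare_as_primary("txn1")          # n0 prepares as primary
    n1.apply_prepare_as_secondary("txn1")  # n1 applies the prepare
    bits = (n0.in_memory_multikey, n1.in_memory_multikey)
    # Failover: n1 is now primary and n0 is secondary.
    n1_wrote_flag = n1.multikey_insert()   # n1 skips the flag write
    try:
        n0.multikey_insert()               # n0 applies the same insert
        n0_stuck = False
    except PrepareConflict:
        n0_stuck = True
    return {"bits_after_prepare": bits,
            "n1_wrote_flag": n1_wrote_flag,
            "n0_stuck": n0_stuck}
```

Calling `run_scenario()` shows the asymmetry: n1 never touches the catalog entry, while n0 conflicts with the still-prepared transaction.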
The attached repro (multikey_prepare_txns.js) demonstrates this scenario.

One possible fix is to do the multikey update in a side transaction on primary, so that it would not generate prepare conflicts at all. It should generally be safe to set the multikey flag on a collection earlier than is necessary, and it likely doesn't need to be atomic with respect to the original transaction that did the write.
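As a rough illustration of the side-transaction idea, the toy model below commits the flag write separately from the prepared transaction. It assumes the side transaction commits immediately and sets the in-memory bit at the same time. `SideTxnNode` and its methods are hypothetical names, not the server's API, and this is not a description of how an actual fix would be implemented.

```python
class SideTxnNode:
    """Hypothetical model of a node that sets multikey in a side transaction."""

    def __init__(self, name):
        self.name = name
        self.in_memory_multikey = False
        self.durable_multikey = False
        self.catalog_locked_by = None  # stays None: prepared txns skip the catalog

    def _set_multikey_in_side_txn(self):
        # Assumed behavior: a short independent transaction that commits
        # right away, so no prepared txn ever holds the catalog entry.
        self.durable_multikey = True
        self.in_memory_multikey = True

    def prepare(self, txn, makes_multikey):
        # Same path on primary and secondary: flag first, then prepare.
        if makes_multikey and not self.in_memory_multikey:
            self._set_multikey_in_side_txn()
        # The prepared txn itself no longer contains the catalog write.

    def multikey_insert(self):
        """Return True if this insert had to write the multikey flag."""
        if self.in_memory_multikey:
            return False
        if self.catalog_locked_by is not None:
            raise RuntimeError(f"{self.name}: prepare conflict")
        self._set_multikey_in_side_txn()
        return True


def run_fixed_scenario():
    n0, n1 = SideTxnNode("n0"), SideTxnNode("n1")
    n0.prepare("txn1", makes_multikey=True)  # primary prepares
    n1.prepare("txn1", makes_multikey=True)  # secondary applies the prepare
    # Failover, then the same insert runs on n1 and is applied on n0.
    return n1.multikey_insert(), n0.multikey_insert()
```

In this model both nodes already have the bit set when the later insert arrives, so neither writes the catalog entry and neither can conflict with the prepared transaction. The flag may end up set even if the transaction later aborts, which matches the observation that setting it early should generally be safe.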