Background: Updates to the config.transactions table do not generate oplog entries of their own and are replicated differently on the secondaries. The primary stores all the relevant information in the oplog entry of the write that would update the config.transactions table, and the secondaries reconstruct the table from it. Because oplog application is parallelized, the order in which entries are applied cannot be guaranteed. Fortunately, a simple rule resolves conflicts: the higher transaction number wins, and on a tie the higher lastWriteOpTime wins. So as an optimization, the secondary simply squashes all changes to the same session into a single update and applies it at the end of the batch.
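To make the squashing rule concrete, here is a minimal sketch in Python. It is a simplified model, not the server's implementation: the names SessionUpdate and squash_session_updates are hypothetical, and lastWriteOpTime is modeled as a single integer rather than a real optime.

```python
from typing import Dict, Iterable, NamedTuple


class SessionUpdate(NamedTuple):
    """Hypothetical, simplified view of one pending config.transactions update."""
    session_id: str
    txn_number: int
    last_write_optime: int  # assumption: optime reduced to one comparable integer


def squash_session_updates(updates: Iterable[SessionUpdate]) -> Dict[str, SessionUpdate]:
    """Keep one winning update per session.

    Rule from the description: higher transaction number wins,
    and if tied, higher lastWriteOpTime wins.
    """
    winners: Dict[str, SessionUpdate] = {}
    for update in updates:
        current = winners.get(update.session_id)
        candidate_key = (update.txn_number, update.last_write_optime)
        if current is None or candidate_key > (current.txn_number, current.last_write_optime):
            winners[update.session_id] = update
    return winners


# Updates may arrive in any order because application is parallelized.
batch = [
    SessionUpdate("s1", 5, 100),
    SessionUpdate("s1", 6, 90),
    SessionUpdate("s1", 6, 95),
    SessionUpdate("s2", 1, 80),
]
result = squash_session_updates(batch)
```

Because the rule compares only (transaction number, lastWriteOpTime), the result does not depend on the order in which the updates are seen, which is what lets the secondary defer them to the end of the batch.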
The issue arises when someone (like the TransactionReaper) deletes an entry in config.transactions, because this does generate an oplog entry for the delete. When the secondary applies this delete oplog entry, the transaction entry is correctly deleted. But if the same oplog batch also contains updates to the transactions table for that session, the squashed update applied at the end of the batch can "revive" the entry, creating an orphan entry and making the secondary inconsistent with the current primary.
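The failure mode can be reproduced with a small Python model. This is only an illustrative sketch under assumptions: apply_batch_buggy and the dict-based op format are hypothetical, optimes are plain integers, and the table is a dict keyed by session id.

```python
def apply_batch_buggy(table, batch):
    """Model of a secondary applying one oplog batch.

    Deletes are applied as they are encountered, while session updates
    are squashed and written only at the end of the batch.
    """
    pending = {}  # session_id -> (txn_number, last_write_optime)
    for op in batch:
        if op["kind"] == "delete":
            # The delete itself is applied correctly...
            table.pop(op["session_id"], None)
        else:
            key = (op["txn_number"], op["last_write_optime"])
            best = pending.get(op["session_id"])
            if best is None or key > best:
                pending[op["session_id"]] = key
    # ...but the deferred update is written afterwards and revives the entry.
    for session_id, key in pending.items():
        table[session_id] = key
    return table


# On the primary, the session was updated and then reaped, so it ends up absent.
primary_table = {}

batch = [
    {"kind": "update", "session_id": "s1", "txn_number": 7, "last_write_optime": 100},
    {"kind": "delete", "session_id": "s1"},
]
secondary_table = apply_batch_buggy({"s1": (6, 50)}, batch)
```

After the batch, secondary_table still contains an entry for "s1" while primary_table does not, which is the orphan entry described above.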
Note: the likelihood of this happening is low, since the reaper only cleans up entries that have not been active for more than 30 minutes.
- related to
SERVER-34651 Performance regression on secondary application with retryable batched writes
SERVER-33343 Ensure that transaction table is maintained on secondaries when transactions commit