|
After some investigation, the problem seems to be in the concurrency framework, not the server. The concurrency multi-statement transaction suites run existing FSM state functions inside transactions, using withTxnAndAutoRetry to retry the entire function on a transient error. To avoid prematurely modifying thread-local state, the state function is bound (as this) to a copy of the thread's data object, and the copy is swapped in for the real data after the function completes. The bug is that the swap happens after the state function finishes but before the transaction it ran inside commits. If the commit then fails with a transient error, the entire state function is retried, but the data has already been modified, which can cause failures in workloads that assert on its values, such as the indexed_insert*.js workloads.
Between the commit of the successful Evergreen run in the description and master, SERVER-37884 was committed, which significantly slowed down two-phase commit and led to more transient failures when committing transactions, exposing this bug.
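
To illustrate the ordering problem, here is a minimal standalone sketch, not the actual framework code. runInTxnWithRetry is a hypothetical stand-in for withTxnAndAutoRetry (its signature is an assumption), insertState is a made-up state function, and the shallow Object.assign copy is a simplification of however the framework really copies the data. The buggy variant swaps the copy into the real data before the simulated commit, the other variant swaps only after the commit has succeeded.

```javascript
// Hypothetical stand-in for a transient commit error.
class TransientError extends Error {}

// Hypothetical stand-in for withTxnAndAutoRetry: runs `body`, then "commits".
// The first `failures` commits throw a transient error, and the whole body
// is retried. Returns the number of attempts made.
function runInTxnWithRetry(body, failures) {
    let attempts = 0;
    while (true) {
        attempts++;
        try {
            body();
            if (attempts <= failures) {
                throw new TransientError("simulated transient commit failure");
            }
            return attempts;
        } catch (e) {
            if (!(e instanceof TransientError)) {
                throw e;
            }
            // Transient error: loop around and retry the entire body.
        }
    }
}

// Made-up state function that mutates the data object bound as `this`.
function insertState() {
    this.inserted += 1;
}

// Buggy ordering: the copy is swapped into the real data before the commit.
function runBuggy(data, failures) {
    runInTxnWithRetry(() => {
        const copy = Object.assign({}, data);
        insertState.call(copy);
        Object.assign(data, copy);  // swap happens before the commit
    }, failures);
    return data;
}

// Corrected ordering: the swap happens only after the commit succeeded.
function runFixed(data, failures) {
    let copy;
    runInTxnWithRetry(() => {
        copy = Object.assign({}, data);  // fresh copy on every attempt
        insertState.call(copy);
    }, failures);
    Object.assign(data, copy);  // swap happens after the commit
    return data;
}

console.log(runBuggy({inserted: 0}, 1).inserted);
console.log(runFixed({inserted: 0}, 1).inserted);
```

With one simulated transient commit failure, the buggy ordering applies the state function's effect to the real data twice, while the corrected ordering applies it once. That double application is the kind of drift that would trip a workload asserting on its own data values.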
|