UBSAN testing in v4.4 reported a possible null dereference in __wt_txn_user_active(). This looks like the problematic code:
    for (i = 0, session_in_list = conn->sessions; i < session_cnt; i++, session_in_list++) {
        /* Skip inactive sessions. */
        if (!session_in_list->active)
            continue;
        /* Check if a user session has a running transaction. Ignore prepared transactions. */
        if (F_ISSET(session_in_list->txn, WT_TXN_RUNNING) &&
          !F_ISSET(session_in_list, WT_SESSION_INTERNAL) &&
          !F_ISSET(session_in_list->txn, WT_TXN_PREPARE)) {
            txn_active = true;
            break;
        }
    }
UBSAN complains about the access to session_in_list->txn in the first use of the F_ISSET macro. I assume the danger is that we race with a thread that is closing a session. In the code above, session_in_list->active is still true, but by the time we start checking flags in the session's txn, the txn has been freed and cleared.
This window is somewhat larger than it looks since __wt_session_close_internal() does a bunch of work between when it frees and clears its transaction and when it clears the active flag.
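To illustrate the state the loop can observe during that window, here is a minimal single-threaded sketch. The types, flag names, and the function user_txn_active() are hypothetical stand-ins, not the WiredTiger definitions, and this is not necessarily how the issue was fixed. It only shows reading the txn pointer once and skipping a session whose txn is already NULL while active is still true. A NULL check by itself does not make the loop safe against a concurrent close; real code would still need appropriate synchronization or ordering around the read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the WiredTiger flags and types; not the real definitions. */
#define TXN_RUNNING 0x1u
#define TXN_PREPARE 0x2u
#define SESSION_INTERNAL 0x1u

struct txn {
    uint32_t flags;
};

struct session {
    bool active;
    uint32_t flags;
    struct txn *txn; /* May already be NULL while the session is being closed. */
};

/*
 * user_txn_active --
 *     Return true if any active, non-internal session has a running,
 *     non-prepared transaction. Hypothetical helper for illustration only.
 */
bool
user_txn_active(struct session *sessions, size_t session_cnt)
{
    struct txn *txn;
    size_t i;

    assert(sessions != NULL || session_cnt == 0);

    for (i = 0; i < session_cnt; i++) {
        /* Skip inactive sessions. */
        if (!sessions[i].active)
            continue;

        /* Read the pointer once so the NULL check and the flag checks use the same value. */
        txn = sessions[i].txn;

        /* A closing session can still be marked active after its txn is cleared. */
        if (txn == NULL)
            continue;

        if ((txn->flags & TXN_RUNNING) != 0 &&
          (sessions[i].flags & SESSION_INTERNAL) == 0 &&
          (txn->flags & TXN_PREPARE) == 0)
            return (true);
    }
    return (false);
}
```

A session with active set and txn equal to NULL models the window described above: the sketch skips it instead of dereferencing the pointer.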
It is not clear whether MongoDB code would actually trigger this race or if UBSAN is simply reporting that it is possible.
Note that although the test failure was in v4.4, the same code and the same race are present in the develop branch.
- is caused by: WT-7909 Create a new method to check for running user transactions before starting rollback-to-stable operation (Closed)
- is related to: WT-7983 Create a stress test to exercise opening and closing of sessions in parallel to rollback-to-stable (Closed)
- related to: WT-8543 Rollback-to-stable should detect a prepared transaction as an active transaction and fail (Closed)