|
I'm closing this in favor of SERVER-42922 (though I arguably could have rewritten the contents of this ticket). For your convenience, I've linked PYTHON-1948 as depending on this new ticket, though there might be workarounds available.
What's being observed is a general server bug regarding the use of failCommand with closeConnection. It attempts to close the client connection, but the server can also run command code internally (DBDirectClient) which doesn't have an actual network interface to close. What you're seeing is a null dereference here.
I suspect why this is only showing for you on mmapv1 tests is that 4.0 mmapv1 has bugs which fail the retryable reads tests. My guess is the python suite is leaving the mongod in a state where the failpoint is still engaged. When a background thread that uses DBDirectClient runs an iteration, it encounters the failpoint (that was not intended for it), crashing the mongod. My hypothesis is that the crash is only observed after a python test has already failed.
One thing worth trying is having the tests disable any outstanding failpoints on teardown. Technically there's still a race where the background thread can observe this failpoint state, even in a passing test. But it seems the window to hit the race is sufficiently small given you haven't observed this with WT (which I assume always passes the suite).
|