ISSUE DESCRIPTION AND IMPACT
A bug in the page-splitting algorithm of the WiredTiger storage engine may trigger a segmentation fault, causing a node to shut down defensively to protect user data.
This bug only affects nodes running the WiredTiger storage engine. MMAPv1 nodes are not affected.
DIAGNOSIS AND AFFECTED VERSIONS
The bug manifests itself with a message in the logs similar to the one below:
2017-06-23T19:03:29.043+0000 F - [thread1] Invalid access at address: 0x78
2017-06-23T19:03:29.073+0000 F - [thread1] Got signal: 11 (Segmentation fault).
----- BEGIN BACKTRACE -----
[...]
mongod(+0x160C2BB) [0x1a0c2bb]
mongod(__wt_split_multi+0x85) [0x1a105e5]
mongod(__wt_evict+0xA55) [0x1a5eac5]
[...]
----- END BACKTRACE -----
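As a rough aid for checking a log against the message above, the following Python sketch flags a segmentation fault whose backtrace contains a __wt_split_multi frame. The function name matches_split_segfault and the exact matching rules are assumptions made for illustration. Real mongod logs may format the backtrace differently, so treat a match as a hint rather than a confirmed diagnosis.

```python
# Markers copied from the sample log message in this ticket.
SIGNAL = "Got signal: 11 (Segmentation fault)"
BEGIN = "----- BEGIN BACKTRACE -----"
END = "----- END BACKTRACE -----"
FRAME = "__wt_split_multi"  # split frame seen in the sample backtrace


def matches_split_segfault(lines):
    """Return True if a SIGSEGV is followed by a backtrace with a split frame.

    Hypothetical helper: the checks run in sequence on each line, so the
    signature is also found when the log was flattened onto a single line.
    """
    saw_signal = False
    in_backtrace = False
    for line in lines:
        if SIGNAL in line:
            saw_signal = True
        if saw_signal and BEGIN in line:
            in_backtrace = True
        if in_backtrace and FRAME in line:
            return True
        if in_backtrace and END in line:
            # Backtrace ended without the split frame; start over.
            saw_signal = False
            in_backtrace = False
    return False
```

A typical use would be to pass an open log file, which yields one line at a time, for example matches_split_segfault(open("mongod.log")), where the file name is only a placeholder.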
This bug is present in all production versions of MongoDB that run with the WiredTiger storage engine. However, the bug is very unlikely to be hit in versions up to and including 3.2.13 and 3.4.4, or in MongoDB 3.0.
Users running MongoDB 3.2.14 or 3.4.5 are more likely to trigger this bug.
REMEDIATION AND WORKAROUNDS
Users affected by this bug who are running 3.2.14 or 3.4.5 can downgrade to 3.2.13 or 3.4.4, respectively, to lower the probability of hitting this bug.
A fix for this issue will be included in the 3.2.15 and 3.4.6 production releases.
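The version guidance in this ticket can be summarized in a small Python sketch. The names classify, ELEVATED, and FIXED_IN, as well as the returned labels, are hypothetical. The sketch assumes the node runs WiredTiger and that versions are plain "major.minor.patch" strings; it only encodes the releases named in this ticket and returns "unknown" for any other release series.

```python
# Releases named in this ticket as more likely to trigger the bug.
ELEVATED = {(3, 2, 14), (3, 4, 5)}
# First release in each series that is expected to contain the fix.
FIXED_IN = {(3, 2): (3, 2, 15), (3, 4): (3, 4, 6)}


def parse_version(text):
    """Turn '3.4.5' into (3, 4, 5). Suffixes such as '-rc0' are not handled."""
    return tuple(int(part) for part in text.split("."))


def classify(version_text):
    """Map a MongoDB version string to a coarse risk label (illustrative)."""
    version = parse_version(version_text)
    series = version[:2]
    if version in ELEVATED:
        return "elevated"
    fixed = FIXED_IN.get(series)
    if fixed is not None and version >= fixed:
        return "fixed"
    if series in FIXED_IN or series == (3, 0):
        # Bug is present but described as very unlikely to be hit.
        return "unlikely"
    return "unknown"
```

For example, classify("3.4.5") yields "elevated", while classify("3.4.4") yields "unlikely", which matches the downgrade advice above.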
Original description
The failure was first seen in Jenkins automated testing running the Python test_compact02 test case.
A description of the likely failure mode can be seen below.
Build failed with the following message:
[2017/06/17 03:40:31.282] test_compact02.test_compact02.test_compact02(table.1mb.8KB) (subunit.RemotedTestCase)
[2017/06/17 03:40:31.282] test_compact02.test_compact02.test_compact02(table.1mb.8KB) ... ERROR
[2017/06/17 03:46:18.368] ======================================================================
[2017/06/17 03:46:18.368] ERROR: test_compact02.test_compact02.test_compact02(table.1mb.8KB) (subunit.RemotedTestCase)
[2017/06/17 03:46:18.368] test_compact02.test_compact02.test_compact02(table.1mb.8KB)
[2017/06/17 03:46:18.368] ----------------------------------------------------------------------
[2017/06/17 03:46:18.368] _StringException: lost connection during test 'test_compact02.test_compact02.test_compact02(table.1mb.8KB)'
[2017/06/17 03:46:18.368] ----------------------------------------------------------------------
Checking the contents of the artifacts showed no indication of what caused the crash or failure.
- is duplicated by: SERVER-29850 Access violation due to a bug in internal page splitting in WiredTiger (Closed)
- related to: WT-3389 Restructure split code to hold a split generation for the entire operation. (Closed)