WiredTiger / WT-3373

Access violation due to a bug in internal page splitting

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: WT2.9.3, 3.2.15, 3.4.6, 3.5.10
    • Affects Version/s: None
    • Component/s: None
    • Labels: None
    • Sprint: Storage 2017-07-10

      Issue Status as of Jun 26, 2017

      ISSUE DESCRIPTION AND IMPACT
      A bug in the internal page-splitting algorithm of the WiredTiger storage engine may trigger a segmentation fault, causing the node to shut down defensively to protect user data.

      This bug only affects nodes running the WiredTiger storage engine. MMAPv1 nodes are not affected.

      DIAGNOSIS AND AFFECTED VERSIONS
      The bug manifests itself with a message in the logs similar to the one below:

      2017-06-23T19:03:29.043+0000 F -        [thread1] Invalid access at address: 0x78
      2017-06-23T19:03:29.073+0000 F -        [thread1] Got signal: 11 (Segmentation fault).
      
      ----- BEGIN BACKTRACE -----
      [...]
       mongod(+0x160C2BB) [0x1a0c2bb]
       mongod(__wt_split_multi+0x85) [0x1a105e5]
       mongod(__wt_evict+0xA55) [0x1a5eac5]
      [...]
      -----  END BACKTRACE  -----
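
A minimal sketch of how a log could be screened for this signature, assuming the log format matches the excerpt above. The function name `matches_wt3373_signature` is hypothetical and not part of any MongoDB tooling. A match is only a hint: other defects could produce a backtrace that also passes through `__wt_split_multi`.

```python
import re

# Patterns taken from the log excerpt above; spacing around the
# BEGIN/END markers varies, so whitespace is matched loosely.
SIGNAL_RE = re.compile(r"Got signal: 11 \(Segmentation fault\)")
BEGIN_RE = re.compile(r"-----\s*BEGIN BACKTRACE\s*-----")
END_RE = re.compile(r"-----\s*END BACKTRACE\s*-----")
FRAME = "__wt_split_multi"  # frame named in the backtrace above


def matches_wt3373_signature(log_lines):
    """Return True if a segfault is followed by a backtrace
    that contains a __wt_split_multi frame."""
    saw_signal = False
    in_backtrace = False
    for line in log_lines:
        if SIGNAL_RE.search(line):
            saw_signal = True
        elif BEGIN_RE.search(line):
            # Only inspect backtraces that follow a segfault report.
            in_backtrace = saw_signal
        elif END_RE.search(line):
            in_backtrace = False
            saw_signal = False
        elif in_backtrace and FRAME in line:
            return True
    return False
```

The function accepts any iterable of lines, so an open log file object can be passed to it directly.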
      

      This bug is present in all production versions of MongoDB that run with the WiredTiger storage engine. However, it is very unlikely to be hit in MongoDB 3.0, in 3.2.13 and earlier 3.2 releases, or in 3.4.4 and earlier 3.4 releases.

      Users running MongoDB 3.2.14 or 3.4.5 are more likely to trigger this bug.

      REMEDIATION AND WORKAROUNDS
      Users running 3.2.14 or 3.4.5 who are affected by this bug can downgrade to 3.2.13 or 3.4.4, respectively, to lower the probability of hitting it.

      A fix for this issue will be included in the 3.2.15 and 3.4.6 production releases.
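
The version guidance above can be summarized as a small lookup. This is a sketch only: the function name `wt3373_risk` and the returned labels are hypothetical, it assumes plain `X.Y.Z` version strings, and it assumes that releases after 3.2.15 and 3.4.6 in the same series also carry the fix. It applies only to nodes running WiredTiger.

```python
def wt3373_risk(version):
    """Classify a MongoDB 'X.Y.Z' version string per the guidance in this ticket."""
    parts = tuple(int(p) for p in version.split("."))
    if len(parts) != 3:
        raise ValueError("expected a version of the form X.Y.Z")
    series, patch = parts[:2], parts[2]

    if series == (3, 0):
        return "affected, unlikely"

    # series -> (patch release with elevated risk, first patch release with the fix)
    thresholds = {(3, 2): (14, 15), (3, 4): (5, 6)}
    if series not in thresholds:
        return "unknown"  # the ticket gives no guidance for other series

    risky, fixed = thresholds[series]
    if patch >= fixed:
        return "fixed"  # assumption: later patch releases keep the fix
    if patch == risky:
        return "affected, more likely"
    return "affected, unlikely"
```

For example, `wt3373_risk("3.4.5")` returns `"affected, more likely"`, while `wt3373_risk("3.4.4")` returns `"affected, unlikely"`.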

      Original description

      The failure was first seen in Jenkins automated testing running the Python test_compact02 test case.

      A description of the likely failure mode can be seen below.

      Test Run
      https://evergreen.mongodb.com/task/wiredtiger_ubuntu1404_unit_test_c455dcfd99c4311838a194df917b63ceb61876f3_17_06_16_17_09_07#/log/S

      Artifacts
      https://s3.amazonaws.com/build_external/wiredtiger/ubuntu1404/c455dcfd99c4311838a194df917b63ceb61876f3/artifacts/wiredtiger_ubuntu1404_c455dcfd99c4311838a194df917b63ceb61876f3_17_06_16_17_09_07.tgz

      Build failed with the following message:

       [2017/06/17 03:40:31.282] test_compact02.test_compact02.test_compact02(table.1mb.8KB) (subunit.RemotedTestCase)
       [2017/06/17 03:40:31.282] test_compact02.test_compact02.test_compact02(table.1mb.8KB) ... ERROR
       [2017/06/17 03:46:18.368] ======================================================================
       [2017/06/17 03:46:18.368] ERROR: test_compact02.test_compact02.test_compact02(table.1mb.8KB) (subunit.RemotedTestCase)
       [2017/06/17 03:46:18.368] test_compact02.test_compact02.test_compact02(table.1mb.8KB)
       [2017/06/17 03:46:18.368] ----------------------------------------------------------------------
       [2017/06/17 03:46:18.368] _StringException: lost connection during test 'test_compact02.test_compact02.test_compact02(table.1mb.8KB)'
       [2017/06/17 03:46:18.368] ----------------------------------------------------------------------
      

      Checking the contents of the artifacts showed no indication of what caused the crash.

            Assignee: keith.bostic@mongodb.com Keith Bostic (Inactive)
            Reporter: david.hows David Hows
            Votes: 0
            Watchers: 20
