Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical - P2
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.6
    • Component/s: WiredTiger
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL

      Description

      2015-08-17T03:25:32.776+0000 I NETWORK  [initandlisten] connection accepted from 10.20.0.164:45684 #56812 (20 connections now open)
      2015-08-17T03:25:40.707+0000 I NETWORK  [conn56799] end connection 10.200.1.61:38502 (19 connections now open)
      2015-08-17T03:25:40.741+0000 I NETWORK  [initandlisten] connection accepted from 10.200.1.61:38525 #56813 (20 connections now open)
      2015-08-17T03:25:45.821+0000 I NETWORK  [conn56800] end connection 10.20.0.170:37363 (19 connections now open)
      2015-08-17T03:25:45.824+0000 I NETWORK  [initandlisten] connection accepted from 10.20.0.170:37389 #56814 (20 connections now open)
      2015-08-17T03:25:58.477+0000 I NETWORK  [conn56802] end connection 10.10.0.169:45149 (19 connections now open)
      2015-08-17T03:25:58.478+0000 I NETWORK  [initandlisten] connection accepted from 10.10.0.169:45598 #56815 (20 connections now open)
      2015-08-17T03:26:02.793+0000 I NETWORK  [conn56812] end connection 10.20.0.164:45684 (19 connections now open)
      2015-08-17T03:26:02.794+0000 I NETWORK  [initandlisten] connection accepted from 10.20.0.164:45913 #56816 (20 connections now open)
      2015-08-17T03:26:10.901+0000 I NETWORK  [conn56813] end connection 10.200.1.61:38525 (19 connections now open)
      2015-08-17T03:26:10.912+0000 I NETWORK  [initandlisten] connection accepted from 10.200.1.61:38551 #56817 (20 connections now open)
      2015-08-17T03:26:12.839+0000 F -        Invalid access at address: 0
      2015-08-17T03:26:12.848+0000 F -        Got signal: 11 (Segmentation fault).
       
       0xf75569 0xf74be2 0xf74f3e 0x369ce0f710 0x136104c 0x369ce079d1 0x369cae88fd
      ----- BEGIN BACKTRACE -----
      {"backtrace":[{"b":"400000","o":"B75569"},{"b":"400000","o":"B74BE2"},{"b":"400000","o":"B74F3E"},{"b":"369CE00000","o":"F710"},{"b":"400000","o":"F6104C"},{"b":"369CE00000","o":"79D1"},{"b":"369CA00000","o":"E88FD"}],"processInfo":{ "mongodbVersion" : "3.0.6-rc1-pre-", "gitVersion" : "7397477c45986e2a9ed195f912afee3ed9ffcbc7", "uname" : { "sysname" : "Linux", "release" : "2.6.32-504.23.4.el6.x86_64", "version" : "#1 SMP Tue Jun 9 20:57:37 UTC 2015", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "77066BF13744F5BE02D2FD9E7C3776B7508C72BF" }, { "b" : "7FFFC42F3000", "elfType" : 3, "buildId" : "FF4CBAAE51A93124ED31C2B1386CE92FF24AEBC3" }, { "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "B8DFF8E53D9F2B80C3C382E83EC17C828B536A39" }, { "path" : "/usr/lib64/libssl.so.10", "elfType" : 3, "buildId" : "40BEA6554E64FC0C3D5C7D0CD91362730515102F" }, { "path" : "/usr/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "FC4EFD7502ACB3B9D213D28272D15A165857AD5A" }, { "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "583411D8786F86A1D6B8741C502831E6122445A7" }, { "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "454F8FC6CC6502C6401E5F9E221564D80665D277" }, { "path" : "/usr/lib64/libstdc++.so.6", "elfType" : 3, "buildId" : "F07F2E7CF4BFB393CC9BBE8CDC6463652E14DB07" }, { "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "C9A87F6A29ED1D3CB18F539845A45FE3A9877FF1" }, { "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "246C3BAB0AB093AFD59D34C8CBF29E786DE4BE97" }, { "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "1425CB3B4C2F49C8101ED9B8F1D289053B4DFA77" }, { "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "6F8E59B70E469F3A924A268911FF8FD0C37E7460" }, { "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "54BA6B78A9220344E77463947215E42F0EABCC62" }, { "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "6797403AA5F8FAD8ADFF683478B45F528CE4FB0E" }, { 
"path" : "/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "8CE28F280150E62296240E70ECAC64E4A57AB826" }, { "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "05733977F4E41652B86070B27A0CFC2C1EA7719D" }, { "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "5FA8E5038EC04A774AF72A9BB62DC86E1049C4D6" }, { "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "E3FA235F3BA3F776A01A18ECA737C9890F445923" }, { "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "AF374BAFB7F5B139A0B431D3F06D82014AFF3251" }, { "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "58B696478044E028A5970D48A4ED50E164B43B36" }, { "path" : "/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "E6798A06BEE17CF102BBA44FD512FF8B805CEAF1" } ] }}
       mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf75569]
       mongod(+0xB74BE2) [0xf74be2]
       mongod(+0xB74F3E) [0xf74f3e]
       libpthread.so.0(+0xF710) [0x369ce0f710]
       mongod(+0xF6104C) [0x136104c]
       libpthread.so.0(+0x79D1) [0x369ce079d1]
       libc.so.6(clone+0x6D) [0x369cae88fd]
      -----  END BACKTRACE  -----
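
The unsymbolized frames above (for example `mongod(+0xF6104C) [0x136104c]`) can be cross-checked against the JSON backtrace. As a minimal sketch, assuming `"b"` is the load base of the module and `"o"` is the offset within it (an assumption, though it matches the addresses printed in this log), the absolute address is their sum. The helper name `frame_address` is hypothetical and only for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: compute the absolute address of a stack frame
 * from the "b" (base) and "o" (offset) fields of the JSON backtrace.
 * Both fields are hexadecimal strings without a "0x" prefix.
 */
uint64_t
frame_address(const char *base_hex, const char *offset_hex)
{
    uint64_t base = (uint64_t)strtoull(base_hex, NULL, 16);
    uint64_t offset = (uint64_t)strtoull(offset_hex, NULL, 16);

    return base + offset;
}
```

For the frame `{"b":"400000","o":"F6104C"}`, `frame_address("400000", "F6104C")` gives `0x136104c`, which is the mongod frame immediately below the signal handler in the trace. The offset is what would be passed to a symbolizer run against a matching mongod binary.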
      
      

          Activity

          michael.cahill Michael Cahill added a comment -

          I've been running workloads to keep eviction busy while constantly creating and discarding handles for hours today without triggering this.

          Given the timing, my assumption was that it was introduced by WT-2038, but AFAICT, we are holding the handle list mutex at the point where this crash happened, and we are also holding that mutex at the point where we remove a handle from the list and free its name.

          The only other thought I've had is that we might somehow be seeing a new handle before the name field has been set (which would make more sense in the sweep server than here), so I've opened https://github.com/wiredtiger/wiredtiger/pull/2124 to add memory barriers before handles are put in shared lists.

          Keith Bostic, any other ideas?
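
The idea of adding a barrier before a handle is placed on a shared list can be sketched as follows. This is a minimal illustration using C11 atomics, not the actual WiredTiger change: `struct handle`, `handle_list_head`, `handle_publish`, and `handle_find` are hypothetical names, and a release store stands in for the write barrier. It assumes writers are serialized by the handle list mutex (not shown) and that handles are never removed while a lock-free reader is walking the list.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a data handle; not WiredTiger's real struct. */
struct handle {
    const char *name;    /* must be set before the handle is visible */
    struct handle *next;
};

/* Head of a shared list that some readers walk without the mutex. */
_Atomic(struct handle *) handle_list_head;

/*
 * Publish a handle. The caller is assumed to hold the list mutex, so
 * there is only one writer at a time.
 */
void
handle_publish(struct handle *h, const char *name)
{
    /* Fully initialize the handle first. */
    h->name = name;
    h->next = atomic_load_explicit(&handle_list_head, memory_order_relaxed);

    /*
     * Release store: acts as the write barrier. A reader that observes
     * the new head with an acquire load also observes name and next.
     */
    atomic_store_explicit(&handle_list_head, h, memory_order_release);
}

/* Look up a handle by name without taking the mutex. */
const struct handle *
handle_find(const char *name)
{
    const struct handle *h =
        atomic_load_explicit(&handle_list_head, memory_order_acquire);

    for (; h != NULL; h = h->next)
        if (strcmp(h->name, name) == 0)
            return h;
    return NULL;
}
```

Without the release ordering, a reader on a weakly ordered CPU could in principle see the new list head before the store to `name`, and dereference a NULL name. Whether that is what happened in this crash is not established by the ticket.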

          keith.bostic Keith Bostic added a comment -

          Michael Cahill, I've looked and I don't see anything else in/around this path.

          xgen-internal-githook Githook User added a comment -

          Author: Michael Cahill (michaelcahill) <michael.cahill@mongodb.com>

          Message: SERVER-19989 There are readers of the handle list that don't acquire the mutex: add a write barrier before handles are inserted.
          Branch: develop
          https://github.com/wiredtiger/wiredtiger/commit/85483b97b9517c552667e232e08396755a1e8d04

          xgen-internal-githook Githook User added a comment -

          Author: Michael Cahill (michaelcahill) <michael.cahill@mongodb.com>

          Message: Merge pull request #2126 from wiredtiger/dhandle-barrier-all

          SERVER-19989 Add a write barrier before data handles are added to shared lists
          Branch: develop
          https://github.com/wiredtiger/wiredtiger/commit/65abd20a5ec285483b43c16deed4b6e6561af71f

          xgen-internal-githook Githook User added a comment -

          Author: Michael Cahill (michaelcahill) <michael.cahill@mongodb.com>

          Message: Merge pull request #2124 from wiredtiger/dhandle-barrier

          SERVER-19989 Add a write barrier before handles become public
          Branch: develop
          https://github.com/wiredtiger/wiredtiger/commit/b1850e8108e9c7a815f2c8dbda52b1fa60d93f82

          michael.cahill Michael Cahill added a comment -

          This issue has been hit again while testing 3.0.6-rc1. That version doesn't include the write barrier change made here, but we have no reason to believe that change fixes the crash. We need a repro.

          michael.cahill Michael Cahill added a comment -

          Update: this could have been caused by not backporting https://github.com/wiredtiger/wiredtiger/commit/393344d5d4b436fe3519cb8ab541bab22663553d to 3.0. I have corrected that now by backporting it to the WT mongodb-3.0 branch ready for a new drop into MongoDB 3.0.

          I will try to trigger that condition in RC1 to verify that this change fixes it.


            People

            • Votes: 0
            • Watchers: 9
