Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.7, 3.1.8
    • Component/s: WiredTiger
    • Labels:
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Backport Completed:
    • Steps To Reproduce:

      Insert only workload (hammer.mongo)
      --replSet EitanRs3a --dbpath f:\data\db1 --logpath h:\data\rs3primary.txt --storageEngine wiredTiger --port 5002


      Description

      Issue Status as of Sep 10, 2015

      ISSUE SUMMARY
      MongoDB running with the WiredTiger storage engine, under high load with append-only workloads and no reads, may fail to find pages to evict from cache and hang.

      USER IMPACT
      mongod keeps running but becomes unresponsive.

      WORKAROUNDS
      Once the process becomes stuck, mongod must be restarted.

      AFFECTED VERSIONS
      MongoDB 3.0.0 through 3.0.6

      FIX VERSION
      The fix is included in the 3.0.7 production release.

      Configuration:

      3-member replica set
      db version v3.1.7-pre-
      git version: 4cf56d86a386039839dc10bb761bd28c829be426

      Two problems:

      1) The primary node is up and running but cannot perform any CRUD operations (mongostat and commands such as db. . insert({}) hang); however, failover did not occur.

      2) WiredTiger executes an endless loop in mongod!__wt_tree_walk and blocks CRUD operations, with no timeout or watchdog for robustness (see the debugger output below for the lock owner).

      0:460> !cs -l
      -----------------------------------------
      DebugInfo          = 0x000000de00a25740
      Critical section   = 0x000000de7fc780c0 (+0xDE7FC780C0)
      LOCKED
      LockCount          = 0x0
      WaiterWoken        = No
      OwningThread       = 0x000000000000097c
      RecursionCount     = 0x1
      LockSemaphore      = 0xD8C
      SpinCount          = 0x0000000000000fa0
       
        2  Id: 11d4.97c Suspend: 1 Teb: 00007ff7`4fe68000 Unfrozen
      Child-SP          RetAddr           Call Site
      000000de`01dafc30 00007ff7`51567749 mongod!__wt_tree_walk+0x1a8 [c:\data\mci\src\src\third_party\wiredtiger\src\btree\bt_walk.c @ 243]
      000000de`01dafcc0 00007ff7`515672e7 mongod!__evict_walk_file+0x329 [c:\data\mci\src\src\third_party\wiredtiger\src\evict\evict_lru.c @ 1154]
      000000de`01dafd60 00007ff7`51566764 mongod!__evict_walk+0x2b7 [c:\data\mci\src\src\third_party\wiredtiger\src\evict\evict_lru.c @ 1032]
      000000de`01dafdf0 00007ff7`51566d5b mongod!__evict_lru_walk+0x24 [c:\data\mci\src\src\third_party\wiredtiger\src\evict\evict_lru.c @ 789]
      000000de`01dafe20 00007ff7`51566f58 mongod!__evict_pass+0x25b [c:\data\mci\src\src\third_party\wiredtiger\src\evict\evict_lru.c @ 502]
      000000de`01dafe80 00007ffb`5e534f7f mongod!__evict_server+0x38 [c:\data\mci\src\src\third_party\wiredtiger\src\evict\evict_lru.c @ 169]
      000000de`01dafeb0 00007ffb`5e535126 MSVCR120!beginthreadex+0x107
      000000de`01dafee0 00007ffb`6d3f15dd MSVCR120!endthreadex+0x192
      000000de`01daff10 00007ffb`6d7343d1 KERNEL32!BaseThreadInitThunk+0xd
      000000de`01daff40 00000000`00000000 ntdll!RtlUserThreadStart+0x1d
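
The report notes that the hung operations had no timeout or watchdog. As a minimal sketch of what an external watchdog could look like (this is not a MongoDB or WiredTiger feature; `probe_with_timeout`, `healthy_probe`, and `hung_probe` are hypothetical names), a health probe can be run in a separate thread and declared failed if it does not return in time. In practice the probe would be a small insert or command against the node; here it is simulated with sleeps.

```python
import threading
import time


def probe_with_timeout(probe, timeout_s):
    """Run probe() in a daemon thread; return True only if it finishes in time."""
    done = threading.Event()

    def run():
        try:
            probe()
        except Exception:
            return  # a probe that raises counts as unhealthy
        done.set()

    threading.Thread(target=run, daemon=True).start()
    # Event.wait returns True if the flag was set, False on timeout.
    return done.wait(timeout_s)


def healthy_probe():
    # Stand-in for an operation that completes quickly.
    time.sleep(0.01)


def hung_probe():
    # Stand-in for an insert that does not return, as described in the report.
    time.sleep(2.0)


if __name__ == "__main__":
    print("healthy:", probe_with_timeout(healthy_probe, 1.0))
    print("hung:", probe_with_timeout(hung_probe, 0.2))
```

A watchdog like this only detects the hang; per the workaround above, recovery still requires restarting mongod.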
      

      RS.Status

      EitanRs3a:PRIMARY> rs.status()
      {
              "set" : "EitanRs3a",
              "date" : ISODate("2015-08-18T14:45:29.611Z"),
              "myState" : 1,
              "term" : NumberLong(0),
              "heartbeatIntervalMillis" : NumberLong(2000),
              "members" : [
                      {
                              "_id" : 0,
                              "name" : "eitan5:5002",
                              "health" : 1,
                              "state" : 1,
                              "stateStr" : "PRIMARY",
                              "uptime" : 66715,
                              "optime" : Timestamp(1439894846, 12455),
                              "optimeDate" : ISODate("2015-08-18T10:47:26Z"),
                              "electionTime" : Timestamp(1439842421, 2),
                              "electionDate" : ISODate("2015-08-17T20:13:41Z"),
                              "configVersion" : 3,
                              "self" : true
                      },
                      {
                              "_id" : 1,
                              "name" : "Eitan1:5002",
                              "health" : 1,
                              "state" : 1,
                              "stateStr" : "PRIMARY",
                              "uptime" : 66704,
                              "optime" : Timestamp(1439894723, 4673),
                              "optimeDate" : ISODate("2015-08-18T10:45:23Z"),
                              "lastHeartbeat" : ISODate("2015-08-18T14:45:29.030Z"),
                              "lastHeartbeatRecv" : ISODate("2015-08-18T14:45:28.841Z"),
                              "pingMs" : 2,
                              "electionTime" : Timestamp(1439906145, 1),
                              "electionDate" : ISODate("2015-08-18T13:55:45Z"),
                              "configVersion" : 3
                      },
                      {
                              "_id" : 2,
                              "name" : "Eitan6:5002",
                              "health" : 1,
                              "state" : 2,
                              "stateStr" : "SECONDARY",
                              "uptime" : 66663,
                              "optime" : Timestamp(1439893521, 7147),
                              "optimeDate" : ISODate("2015-08-18T10:25:21Z"),
                              "lastHeartbeat" : ISODate("2015-08-18T14:45:29.041Z"),
                              "lastHeartbeatRecv" : ISODate("2015-08-18T14:45:28.844Z"),
                              "pingMs" : 1,
                              "syncingTo" : "eitan5:5002",
                              "configVersion" : 3
                      }
              ],
              "ok" : 1
      }
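
Two things stand out in the output above: two members are reported in state 1 (PRIMARY), and every optimeDate is several hours older than the reported date. The sketch below shows one way to extract both facts from such a document. It is illustrative only: `summarize` and `parse_ts` are hypothetical helpers, and `STATUS` is a hand-copied subset of the fields shown above with the ISODate values written as plain strings.

```python
from datetime import datetime, timezone


def parse_ts(text):
    # Accepts the two timestamp shapes that appear in the output above.
    for fmt in ("%Y-%m-%dT%H:%M:%S.%fZ", "%Y-%m-%dT%H:%M:%SZ"):
        try:
            return datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError(f"unrecognised timestamp: {text}")


def summarize(status):
    """List members in state 1 and the age of each member's optime in seconds."""
    now = parse_ts(status["date"])
    primaries = [m["name"] for m in status["members"] if m["state"] == 1]
    optime_age_s = {
        m["name"]: int((now - parse_ts(m["optimeDate"])).total_seconds())
        for m in status["members"]
    }
    return {"primaries": primaries, "optime_age_s": optime_age_s}


# Subset of the rs.status() output above, as seen from eitan5:5002.
STATUS = {
    "date": "2015-08-18T14:45:29.611Z",
    "members": [
        {"name": "eitan5:5002", "state": 1, "optimeDate": "2015-08-18T10:47:26Z"},
        {"name": "Eitan1:5002", "state": 1, "optimeDate": "2015-08-18T10:45:23Z"},
        {"name": "Eitan6:5002", "state": 2, "optimeDate": "2015-08-18T10:25:21Z"},
    ],
}

if __name__ == "__main__":
    print(summarize(STATUS))
```

For this data the helper reports both eitan5:5002 and Eitan1:5002 as primaries, with the newest optime roughly four hours behind the status date.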
      

      1. 317.html
        4.52 MB
        Eitan Klein
      2. deadlockduringstress.txt
        1.51 MB
        Eitan Klein
      3. rs3primary.txt
        89.39 MB
        Eitan Klein
      1. Opspersec-duringstress.png
        28 kB
      2. timeseries-20008.png
        126 kB

          Activity

          xgen-internal-githook Githook User added a comment -

          Author: Alex Gorrod (agorrod) <alexander.gorrod@mongodb.com>

          Message: Merge pull request #2148 from wiredtiger/SERVER-20008

          SERVER-20008 Don't reset eviction walks when hitting a busy page.
          Branch: develop
          https://github.com/wiredtiger/wiredtiger/commit/38dad395053b3eca1998c6c1402adc74fc4cba61
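
The commit message above says the fix is to stop resetting eviction walks when a busy page is hit. The toy model below is only an assumption about why a reset can spin, not the WiredTiger code: `find_candidates`, the page labels, and `max_steps` are all hypothetical. If a walk restarts from the beginning every time it meets a busy page, and that page stays busy, the walk can never reach evictable pages beyond it. Skipping the busy page lets the walk make progress.

```python
def find_candidates(pages, reset_on_busy, max_steps=100):
    """Walk pages looking for evictable ones; give up after max_steps.

    Returns (sorted positions of evictable pages found, steps taken).
    max_steps exists only so the toy model terminates.
    """
    found, pos, steps = set(), 0, 0
    while pos < len(pages) and steps < max_steps:
        steps += 1
        if pages[pos] == "busy":
            if reset_on_busy:
                pos = 0  # start over; the same busy page is hit again
            else:
                pos += 1  # skip the busy page and keep walking
            continue
        if pages[pos] == "evictable":
            found.add(pos)
        pos += 1
    return sorted(found), steps


# Hypothetical tree: a permanently busy page ahead of two evictable ones.
PAGES = ["busy", "evictable", "evictable"]

if __name__ == "__main__":
    print("reset on busy:", find_candidates(PAGES, reset_on_busy=True))
    print("skip busy:", find_candidates(PAGES, reset_on_busy=False))
```

With the reset behaviour the walk exhausts its step budget without finding anything; with skipping it finds both evictable pages in three steps.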

          xgen-internal-githook Githook User added a comment -

          Author: Alex Gorrod (agorrod) <alexander.gorrod@mongodb.com>

          Message: SERVER-20008 Don't reset eviction walks when hitting a busy page.

          Merge pull request #2148 from wiredtiger/SERVER-20008

          (cherry picked from commit 38dad395053b3eca1998c6c1402adc74fc4cba61)
          Branch: mongodb-3.0
          https://github.com/wiredtiger/wiredtiger/commit/52feb5b11555f1b9535063388db45d540cf168ea

          xgen-internal-githook Githook User added a comment -

          Author: Michael Cahill (michaelcahill) <michael.cahill@mongodb.com>

          Message: SERVER-20008 Acquire the handle list lock when clearing walks to avoid a race with sweep.
          Branch: develop
          https://github.com/wiredtiger/wiredtiger/commit/33f5597916964a6b4956bccac15644b0d61bbb36

          xgen-internal-githook Githook User added a comment -

          Author: Michael Cahill (michaelcahill) <michael.cahill@mongodb.com>

          Message: SERVER-20008 Acquire the handle list lock when clearing walks to avoid a race with sweep.

          (cherry picked from commit 33f5597916964a6b4956bccac15644b0d61bbb36)
          Branch: mongodb-3.0
          https://github.com/wiredtiger/wiredtiger/commit/0d76fcd0976a55385d68ae5d5d389147f641976b


            People

            • Votes: 0
            • Watchers: 10
