Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-16179

lock manager timeout w/ stack trace

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 2.8.0-rc0
    • Fix Version/s: 2.8.0-rc4
    • Component/s: Storage
    • Labels:
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Steps To Reproduce:
      Hide

      Long-running, blocking write I/O. Issue is intermittent (estimate 1 out of 15 trials?) I'm doubtful the following instructions will reproduce consistently.

      A) single-node running YCSB load phase. client and server on same Linux box using either SSD or HD. Interrupt in the middle with sig 2 to mongod.

      or

      B) construct an elaborate 4 member replica set across different operating systems and different mongod versions. 2 linux, 1 solaris, 1 mac. Shard YCSB usertable but use a single shard. Take a 2.7.8, 2.7.1-pre and 2.7.0-pre replica set and add another 2.7.8 member. Add a floating point field to all YCSB usertable records (you should have 5 million of them). Index that field with one of the 2.7.8 member stopped. Upgrade that member to 2.8.0-rc0 and start mongod. mongod will begin indexing the new field and in my case encountered the 30 second lock timeout. Index build was about half complete at the time of the stack trace.

      Show
      Long-running, blocking write I/O. Issue is intermittent (estimate 1 out of 15 trials?) I'm doubtful the following instructions will reproduce consistently. A) single-node running YCSB load phase. client and server on same Linux box using either SSD or HD. Interrupt in the middle with sig 2 to mongod. or B) construct an elaborate 4 member replica set across different operating systems and different mongod versions. 2 linux, 1 solaris, 1 mac. Shard YCSB usertable but use a single shard. Take a 2.7.8, 2.7.1-pre and 2.7.0-pre replica set and add another 2.7.8 member. Add a floating point field to all YCSB usertable records (you should have 5 million of them). Index that field with one of the 2.7.8 member stopped. Upgrade that member to 2.8.0-rc0 and start mongod. mongod will begin indexing the new field and in my case encountered the 30 second lock timeout. Index build was about half complete at the time of the stack trace.

      Description

      Second time I've seen the timeout in normal activities. Captured the stack trace this time. Customers and users will no doubt be confused by the appearance of the stack trace. Assuming the diagnostics are wanted by the developers, one is provided in this ticket.

      The first time the 30 sec timeout was seen was during a heavy YCSB load stage. Hit ctrl-c to shut down the foreground mongod process and encountered the 30 second lock manager timeout while in-memory data was being flushed to disk.

      The second time was on mongod startup, as a newly-ugraded (to 2.8.0-rc0) member of a replica set. Upon startup you can see an index build on a 5 million document collection (sharded with a single shard). Index build came in on the oplog. Stack trace occurs around the 50% mark.

      Partial mongod log follows:

      2014-11-16T18:06:56.984+0000 I JOURNAL  [initandlisten] recover /data/shard1/journal/j._0
      2014-11-16T18:06:57.517+0000 I JOURNAL  [initandlisten] recover cleaning up
      2014-11-16T18:06:57.517+0000 I JOURNAL  [initandlisten] removeJournalFiles
      2014-11-16T18:06:57.616+0000 I JOURNAL  [initandlisten] recover done
      2014-11-16T18:06:57.617+0000 I JOURNAL  [initandlisten] preallocating a journal file /data/shard1/journal/prealloc.0
      2014-11-16T18:07:00.006+0000 I -        [initandlisten]   File Preallocator Progress: 807403520/1073741824 75%
      2014-11-16T18:07:07.835+0000 I INDEXING [initandlisten] found 1 index(es) that wasn't finished before shutdown
      2014-11-16T18:07:08.418+0000 I INDEXING [initandlisten] found 1 interrupted index build(s) on ycsb.usertable
      2014-11-16T18:07:08.418+0000 I INDEXING [initandlisten] note: restart the server with --noIndexBuildRetry to skip index rebuilds
      2014-11-16T18:07:08.419+0000 I INDEXING [initandlisten] build index on: ycsb.usertable properties: { v: 1, key: { n: 1.0 }, name: "n_1", ns: "ycsb.usertable" }
      2014-11-16T18:07:08.419+0000 I INDEXING [initandlisten]          building index using bulk method
      2014-11-16T18:07:11.004+0000 I -        [initandlisten]   Index Build: 172700/5001000 3%
      2014-11-16T18:07:14.004+0000 I -        [initandlisten]   Index Build: 461500/5001000 9%
      2014-11-16T18:07:17.004+0000 I -        [initandlisten]   Index Build: 774100/5001000 15%
      2014-11-16T18:07:20.004+0000 I -        [initandlisten]   Index Build: 987400/5001000 19%
      2014-11-16T18:07:23.004+0000 I -        [initandlisten]   Index Build: 1243600/5001000 24%
      2014-11-16T18:07:26.005+0000 I -        [initandlisten]   Index Build: 1526900/5001000 30%
      2014-11-16T18:07:29.004+0000 I -        [initandlisten]   Index Build: 1762000/5001000 35%
      2014-11-16T18:07:32.025+0000 I -        [initandlisten]   Index Build: 2016400/5001000 40%
      2014-11-16T18:07:35.014+0000 I -        [initandlisten]   Index Build: 2343900/5001000 46%
      2014-11-16T18:07:38.014+0000 I -        [initandlisten]   Index Build: 2625600/5001000 52%
      2014-11-16T18:07:42.411+0000 I -        [clientcursormon] LockerId 56 has been waiting to acquire lock for more than 30 seconds. MongoDB will print the lock manager state and the stack of the thread that has been waiting, for diagnostic purposes. This message does not necessary imply that the server is experiencing an outage, but might be an indication of an overloaded server.
      2014-11-16T18:07:42.412+0000 I -        [clientcursormon] Dumping LockManager @ 0x1b562e0
      2014-11-16T18:07:42.412+0000 I -        [clientcursormon] Lock @ 0x2AE27E0: {9: Global, 1}
      GRANTED:
              LockRequest 20 @ 0x23C6600: Mode = IX; ConvertMode = NONE; 
              LockRequest 56 @ 0x23C6C00: Mode = IS; ConvertMode = NONE; 
       
      PENDING:
      2014-11-16T18:07:42.412+0000 I -        [clientcursormon] Lock @ 0x2AE28A0: {18: MMAPV1Flush, 2}
      GRANTED:
              LockRequest 20 @ 0x23C6600: Mode = IX; ConvertMode = NONE; 
       
      PENDING:
              LockRequest 310 @ 0x23C7200: Mode = S; ConvertMode = NONE; 
      2014-11-16T18:07:42.412+0000 I -        [clientcursormon] Lock @ 0x16E9AD00: {8527796682173025563: Database, 1065974585271628195}
      GRANTED:
              LockRequest 20 @ 0x23C6600: Mode = X; ConvertMode = NONE; 
       
      PENDING:
              LockRequest 56 @ 0x23C6C00: Mode = IS; ConvertMode = NONE; 
      2014-11-16T18:07:42.413+0000 I -        [clientcursormon] 
      2014-11-16T18:07:42.502+0000 I -        [clientcursormon] 
       0x1022c36 0x9f998c 0xa00dbb 0x9f1687 0x9f319d 0x94f55e 0x94f952 0x934894 0x955468 0xfaf38d 0x107126b 0x7f1f027e6e9a 0x7f1f018f53fd
      ----- BEGIN BACKTRACE -----
      {"backtrace":[{"b":"400000","o":"C22C36"},{"b":"400000","o":"5F998C"},{"b":"400000","o":"600DBB"},{"b":"400000","o":"5F1687"},{"b":"400000","o":"5F319D"},{"b":"400000","o":"54F55E"},{"b":"400000","o":"54F952"},{"b":"400000","o":"534894"},{"b":"400000","o":"555468"},{"b":"400000","o":"BAF38D"},{"b":"400000","o":"C7126B"},{"b":"7F1F027DF000","o":"7E9A"},{"b":"7F1F01801000","o":"F43FD"}],"processInfo":{ "mongodbVersion" : "2.8.0-rc0", "gitVersion" : "b6c4e2491c1442b05a160acda0d78001f56a2ade", "uname" : { "sysname" : "Linux", "release" : "3.6.11-gentoo", "version" : "#3 SMP Wed Dec 18 20:52:24 CST 2013", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "C6D8BFF1AC987C07A66AD78250664B9EDA577DA2" }, { "b" : "7FFF9ACEA000", "elfType" : 3 }, { "b" : "7F1F027DF000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "C340AF9DEE97C17C730F7D03693286C5194A46B8" }, { "b" : "7F1F025D7000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "352C5B373A50E6C4AB881A5DB6F5766FDF81EEE0" }, { "b" : "7F1F023D3000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "D181AF551DBBC43E9D55913D532635FDE18E7C4E" }, { "b" : "7F1F020D3000", "path" : "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "elfType" : 3, "buildId" : "B534DA725D06A04267EB2FEB92B9CC14C838B57B" }, { "b" : "7F1F01DD7000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "817AA99B3DD02501F8BC04A3E9A9358A08F20D7D" }, { "b" : "7F1F01BC1000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "ECF322A96E26633C5D10F18215170DD4395AF82C" }, { "b" : "7F1F01801000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "53514F7C9AD6614703AC01FD7CBCC9D3C6F621CD" }, { "b" : "7F1F029FC000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "83B40B93BD2FD266F12E008E61309FC03EB406A5" } ] }}
       mongod(_ZN5mongo15printStackTraceERSo+0x26) [0x1022c36]
       mongod(+0x5F998C) [0x9f998c]
       mongod(_ZN5mongo10LockerImplILb1EE4lockERKNS_10ResourceIdENS_8LockModeEjb+0x1BB) [0xa00dbb]
       mongod(_ZN5mongo4Lock6DBLock6lockDBEv+0x57) [0x9f1687]
       mongod(_ZN5mongo4Lock6DBLockC1EPNS_6LockerERKNS_10StringDataENS_8LockModeE+0x10D) [0x9f319d]
       mongod(_ZN5mongo9AutoGetDbC1EPNS_16OperationContextERKNS_10StringDataENS_8LockModeE+0x3E) [0x94f55e]
       mongod(_ZN5mongo24AutoGetCollectionForReadC2EPNS_16OperationContextERKSs+0xB2) [0x94f952]
       mongod(_ZN5mongo19GlobalCursorIdCache14timeoutCursorsEPNS_16OperationContextEi+0x1B4) [0x934894]
       mongod(_ZN5mongo19ClientCursorMonitor3runEv+0xA8) [0x955468]
       mongod(_ZN5mongo13BackgroundJob7jobBodyEv+0x11D) [0xfaf38d]
       mongod(+0xC7126B) [0x107126b]
       libpthread.so.0(+0x7E9A) [0x7f1f027e6e9a]
       libc.so.6(clone+0x6D) [0x7f1f018f53fd]
      -----  END BACKTRACE  -----
      2014-11-16T18:07:48.769+0000 I -        [initandlisten]   Index Build: 2688700/5001000 53%
      2014-11-16T18:07:51.016+0000 I -        [initandlisten]   Index Build: 2759100/5001000 55%
      2014-11-16T18:07:54.014+0000 I -        [initandlisten]   Index Build: 2924600/5001000 58%
      2014-11-16T18:07:57.015+0000 I -        [initandlisten]   Index Build: 3088800/5001000 61%

        Attachments

          Activity

            People

            Assignee:
            kaloian.manassiev Kaloian Manassiev
            Reporter:
            quentin.conner Quentin Conner
            Participants:
            Votes:
            1 Vote for this issue
            Watchers:
            11 Start watching this issue

              Dates

              Created:
              Updated:
              Resolved: