  1. Core Server
  2. SERVER-22482

Cache growing to 100% followed by crash

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • None
    • Affects Version/s: 3.2.1
    • Component/s: WiredTiger
    • Labels:
      None
    • ALL
      Start the primary shard server during a busy day and wait a couple of hours.


      The primary server on my primary shard has crashed repeatedly in production. mongostat reports the cache used % growing to 100% (and sometimes 101%) and the dirty % to over 90%. Once this happens, it is only a matter of time until the server crashes. Memory size and res do not grow far enough for the crash to be explained by the server running out of memory.
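
For readers who want to track the same two numbers outside mongostat, here is a minimal sketch of how the used % and dirty % can be derived from the `wiredTiger.cache` section of `serverStatus`. It assumes the percentages are the ratios of "bytes currently in the cache" and "tracked dirty bytes in the cache" to "maximum bytes configured". The function name `cache_percentages` and the sample values are hypothetical and are not taken from this report.

```python
def cache_percentages(cache_stats):
    """Return (used_pct, dirty_pct) from a dict shaped like the
    wiredTiger.cache section of serverStatus."""
    max_bytes = cache_stats["maximum bytes configured"]
    used = cache_stats["bytes currently in the cache"]
    dirty = cache_stats["tracked dirty bytes in the cache"]
    return 100.0 * used / max_bytes, 100.0 * dirty / max_bytes


# Hypothetical sample values for illustration only (a 10 GiB cache).
# On a live server the dict could be fetched with pymongo, for example
# client.admin.command("serverStatus")["wiredTiger"]["cache"].
sample = {
    "maximum bytes configured": 10 * 1024**3,
    "bytes currently in the cache": 10 * 1024**3,
    "tracked dirty bytes in the cache": 9 * 1024**3 + 512 * 1024**2,
}

used_pct, dirty_pct = cache_percentages(sample)
print(f"used {used_pct:.1f}%  dirty {dirty_pct:.1f}%")
```

Polling these values periodically and alerting well before the used % approaches 100% is one way to catch the condition before the crash.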

      In the log file of either server (this happened to both the primary and the promoted secondary), I find thousands of lines with these error messages:

      2016-02-04T23:46:00.871+0000 I STORAGE  [conn40] WTIndex::updatePosition -- the new key ( 2E15D5ECA4877FFFFEB27DF76BB90446AE2462) is less than the previous key (2E4B485BC8877FFFFEAD53C76DEB044B83D4DA), which is a bug.
      2016-02-04T23:46:00.871+0000 I STORAGE  [conn40] WTIndex::updatePosition -- the new key ( 140402C0) is less than the previous key (3C0A3238324B204A4F45203A536F43616C4A4F4542000447951242), which is a bug.
      2016-02-04T23:46:00.871+0000 I STORAGE  [conn40] WTIndex::updatePosition -- the new key ( 2E15D5F308EB0442B6EA52) is less than the previous key (2E4B485BC8877FFFFEAD526FF19B044B83B3EA), which is a bug.
      2016-02-04T23:46:00.871+0000 I STORAGE  [conn40] WTIndex::updatePosition -- the new key ( 2E15D5F3082E2A7DC42278800001464CDE2D0678800001464CDE2D060442B6EA52) is less than the previous key (2E4B485BC82E01D860A07880000152AC3892147880000152AC389214044B83D492), which is a bug.
      2016-02-04T23:46:00.871+0000 I STORAGE  [conn40] WTIndex::updatePosition -- the new key ( 2E15D5F308877FFFFEB9B321D2F90442B6EA52) is less than the previous key (2E4B485BC8877FFFFEAD53C76DEB044B83D4DA), which is a bug.
      2016-02-04T23:46:00.871+0000 I STORAGE  [conn40] WTIndex::updatePosition -- the new key ( 140402E8) is less than the previous key (3C0A3238324B204A4F45203A536F43616C4A4F4542000447951242), which is a bug.
      2016-02-04T23:46:00.871+0000 I STORAGE  [conn40] WTIndex::updatePosition -- the new key ( 2E15D5F478EB04445670AA) is less than the previous key (2E4B485BC8877FFFFEAD526FF19B044B83B3EA), which is a bug.
      2016-02-04T23:46:00.871+0000 I STORAGE  [conn40] WTIndex::updatePosition -- the new key ( 2E15D5F4782D317A2E7880000149E9C7C8D27880000149E9C7C8D204445670AA) is less than the previous key (2E4B485BC82E01D860A07880000152AC3892147880000152AC389214044B83D492), which is a bug.
      2016-02-04T23:46:00.871+0000 I STORAGE  [conn40] WTIndex::updatePosition -- the new key ( 2E15D5F478877FFFFEB61638372D04445670AA) is less than the previous key (2E4B485BC8877FFFFEAD53C76DEB044B83D4DA), which is a bug.
      2016-02-04T23:46:00.871+0000 I STORAGE  [conn40] WTIndex::updatePosition -- the new key ( 14040300) is less than the previous key (3C0A3238324B204A4F45203A536F43616C4A4F4542000447951242), which is a bug.
      2016-02-04T23:46:00.871+0000 I STORAGE  [conn40] WTIndex::updatePosition -- the new key ( 2E15D5F5FA877FFFFEAD520BCC4D0445D35EFA) is less than the previous key (2E4B485BC8877FFFFEAD526FF19B044B83B3EA), which is a bug.
      2016-02-04T23:46:00.871+0000 I STORAGE  [conn40] WTIndex::updatePosition -- the new key ( 2E15D5F5FA2D27E306788000014BFD075515788000014BFD07551504453ACEF2) is less than the previous key (2E4B485BC82E01D860A07880000152AC3892147880000152AC389214044B83D492), which is a bug.
      2016-02-04T23:46:00.871+0000 I STORAGE  [conn40] WTIndex::updatePosition -- the new key ( 2E15D5F5FA877FFFFEB06628460F044880E17A) is less than the previous key (2E4B485BC8877FFFFEAD53C76DEB044B83D4DA), which is a bug.
      2016-02-04T23:46:00.871+0000 I STORAGE  [conn40] WTIndex::updatePosition -- the new key ( 14040308) is less than the previous key (3C0A3238324B204A4F45203A536F43616C4A4F4542000447951242), which is a bug.
      2016-02-04T23:46:00.871+0000 I STORAGE  [conn40] WTIndex::updatePosition -- the new key ( 2E15D5F5FA877FFFFEAD5271F85C04475D1322) is less than the previous key (2E4B485BC8877FFFFEAD526FF19B044B83B3EA), which is a bug.
      2016-02-04T23:46:00.871+0000 I STORAGE  [conn40] WTIndex::updatePosition -- the new key ( 2E15D5F5FA2E013EB10478800001497BF3CA7378800001497BF3CA73044431CE52) is less than the previous key (2E4B485BC82E01D860A07880000152AC3892147880000152AC389214044B83D492), which is a bug.
      2016-02-04T23:46:00.870+0000 I STORAGE  [conn42] WTIndex::updatePosition -- the new key ( 2E15D5F5FAEB0442EEBC72) is less than the previous key (2E4B485BC8877FFFFEAD526FF19B044B83B41A), which is a bug.
      2016-02-04T23:46:00.872+0000 I STORAGE  [conn40] WTIndex::updatePosition -- the new key ( 2E15D5F5FA877FFFFEB1AD52F4D804475D1322) is less than the previous key (2E4B485BC8877FFFFEAD53C76DEB044B83D4DA), which is a bug.
      2016-02-04T23:46:00.872+0000 I STORAGE  [conn42] WTIndex::updatePosition -- the new key ( 2E15D5F5FA2EC2DA5340788000014E52AD0B27788000014E52AD0B2704475D1322) is less than the previous key (2E4B485BC82E01DE84C27880000152AA3D0BAB7880000152AA3D0BAB044B8272B2), which is a bug.
      2016-02-04T23:46:00.872+0000 I STORAGE  [conn42] WTIndex::updatePosition -- the new key ( 2E15D5F5FA877FFFFEB9156C2AD90442EEBC72) is less than the previous key (2E4B485BC8877FFFFEAD53C76DEB044B83D4EA), which is a bug.
      

      The end of the log has no crash information at all.
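
With thousands of such lines, a summary can be easier to read than the raw log. The sketch below counts the `WTIndex::updatePosition` messages per connection and per "previous key". The regular expression is an assumption based only on the line format shown above, and the names `PATTERN` and `summarize` are hypothetical. The three sample lines are copied from the excerpt.

```python
import re
from collections import Counter

# Assumed line format, inferred from the log excerpt in this report.
PATTERN = re.compile(
    r"^(?P<ts>\S+)\s+I\s+STORAGE\s+\[(?P<conn>conn\d+)\]\s+"
    r"WTIndex::updatePosition -- the new key \(\s*(?P<new>[0-9A-F]+)\) "
    r"is less than the previous key \((?P<prev>[0-9A-F]+)\)"
)


def summarize(lines):
    """Count matching messages per connection and per previous key."""
    per_conn = Counter()
    prev_keys = Counter()
    for line in lines:
        m = PATTERN.match(line.strip())
        if m:
            per_conn[m.group("conn")] += 1
            prev_keys[m.group("prev")] += 1
    return per_conn, prev_keys


sample = [
    "2016-02-04T23:46:00.871+0000 I STORAGE  [conn40] WTIndex::updatePosition -- the new key ( 14040300) is less than the previous key (3C0A3238324B204A4F45203A536F43616C4A4F4542000447951242), which is a bug.",
    "2016-02-04T23:46:00.871+0000 I STORAGE  [conn40] WTIndex::updatePosition -- the new key ( 14040308) is less than the previous key (3C0A3238324B204A4F45203A536F43616C4A4F4542000447951242), which is a bug.",
    "2016-02-04T23:46:00.870+0000 I STORAGE  [conn42] WTIndex::updatePosition -- the new key ( 2E15D5F5FAEB0442EEBC72) is less than the previous key (2E4B485BC8877FFFFEAD526FF19B044B83B41A), which is a bug.",
]

per_conn, prev_keys = summarize(sample)
for conn, count in sorted(per_conn.items()):
    print(conn, count)
print("distinct previous keys:", len(prev_keys))
```

To run it against a full log, pass an open file object to `summarize` instead of `sample`. A small number of distinct previous keys repeated many times would suggest that a few cursor positions account for most of the messages.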

        1. diagnostics.zip (87.55 MB)
        2. ftdc.png (194 kB)
        3. shard2-crash.log (17 kB)

            Assignee:
            kelsey.schubert@mongodb.com Kelsey Schubert
            Reporter:
            miketempleman Mike Templeman
            Votes:
            0
            Watchers:
            7

              Created:
              Updated:
              Resolved: