  1. Core Server
  2. SERVER-34027

Production issue under high load.

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • None
    • Affects Version/s: 3.6.1
    • Component/s: Stability
    • ALL
    • Dev Tools 2019-05-06, Dev Tools 2019-04-22

      We had an issue in production yesterday:

      • At ~16:31 UTC, unidentified queries started to read cold data intensively. Reads from disk increased greatly, as did eviction from mongo's page cache. This unexpected load lasted until ~17:12.
        The load by itself is not remarkable enough to report here, but it led to the following bad behaviour:
      • At 16:42:44 mongod literally froze. It did not respond to anything, wrote nothing to the log, and sent no statistics until 16:43:06.
        I believe this is a bug.
        I'm attaching a diagnostic data file that covers that period, and monitoring screenshots for the long period (16:20-17:30) and for a short period focused on the issue (16:40-16:45).

      Environment: AWS i3.16xlarge; the mongo data path is placed on an LVM volume over two NVMe devices.

        1. decommit.png
          146 kB
          Bruce Lucas
        2. diagnostic.tar.gz
          10.05 MB
          funny_falcon
        3. henrik-include-centralcache.png
          168 kB
          Henrik Ingo
        4. MongoStat-2018-03-21-long.png
          457 kB
          funny_falcon
        5. MongoStat-2018-03-21-short.png
          391 kB
          funny_falcon
        6. SystemStat-2018-03-21-long.png
          492 kB
          funny_falcon
        7. SystemStat-2018-03-21-short.png
          325 kB
          funny_falcon

            Assignee:
            david.daly@mongodb.com David Daly
            Reporter:
            funny.falcon@gmail.com Юрий Соколов
            Votes:
            1
            Watchers:
            21

              Created:
              Updated:
              Resolved: