deadlock during LVM snapshot


    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Major - P3
    • Affects Version/s: 3.2.9
    • Component/s: None
    • ALL
    • Difficult, happened just once.

      I am testing backups of a standalone MongoDB instance using LVM snapshots. Journaling is enabled, so mongod keeps running and processing requests while the snapshot is taken.
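      For context, a snapshot backup of this kind usually follows the sequence sketched below. This is only an illustration, not the exact commands used here: the function name and all of its arguments (volume group, logical volume, snapshot name and size, mount point, archive path) are hypothetical, and the read-only mount option is an assumption. The helper only builds the command strings and does not run anything.

```python
import shlex


def snapshot_backup_commands(vg, lv, snap_name, snap_size, mount_point, archive):
    """Return the commands for one snapshot backup cycle, in order.

    All arguments are hypothetical placeholders; nothing is executed.
    """
    origin = f"/dev/{vg}/{lv}"
    snap = f"/dev/{vg}/{snap_name}"
    steps = [
        # 1. Create a copy-on-write snapshot of the volume holding the data files.
        ["lvcreate", "--snapshot", "--size", snap_size, "--name", snap_name, origin],
        # 2. Mount the snapshot (read-only is an assumption of this sketch).
        ["mount", "-o", "ro", snap, mount_point],
        # 3. Archive the snapshot contents with tar.
        ["tar", "-czf", archive, "-C", mount_point, "."],
        # 4. Unmount and drop the snapshot once the archive is complete.
        ["umount", mount_point],
        ["lvremove", "-f", snap],
    ]
    return [shlex.join(step) for step in steps]
```

      In the incident described here, the sequence stalled at the tar step, so the unmount and snapshot removal could never run.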

      I ran into a deadlock; so far it has happened only once. The syslog (attached, with backtraces) reports one jbd2 thread and three mongod threads as blocked, followed exactly two minutes later by another jbd2 report and two more mongod threads.

      Sep 15 17:03:01 dss2 kernel: [719046.057796] EXT4-fs (dm-5): mounted filesystem with ordered data mode. Opts: (null)
      Sep 15 17:06:53 dss2 kernel: [719277.567904] INFO: task jbd2/dm-2-8:9048 blocked for more than 120 seconds.
      Sep 15 17:06:53 dss2 kernel: [719277.568130] INFO: task mongod:23239 blocked for more than 120 seconds.
      Sep 15 17:06:53 dss2 kernel: [719277.568267] INFO: task mongod:23242 blocked for more than 120 seconds.
      Sep 15 17:06:53 dss2 kernel: [719277.568350] INFO: task mongod:23243 blocked for more than 120 seconds.
      Sep 15 17:06:53 dss2 kernel: [719277.568397] INFO: task dmeventd:12427 blocked for more than 120 seconds.
      Sep 15 17:06:53 dss2 kernel: [719277.568523] INFO: task kworker/u16:2:31890 blocked for more than 120 seconds.
      Sep 15 17:06:53 dss2 kernel: [719277.568713] INFO: task tar:12446 blocked for more than 120 seconds.
      Sep 15 17:08:53 dss2 kernel: [719397.567614] INFO: task jbd2/dm-2-8:9048 blocked for more than 120 seconds.
      Sep 15 17:08:53 dss2 kernel: [719397.567731] INFO: task mongod:23239 blocked for more than 120 seconds.
      Sep 15 17:08:53 dss2 kernel: [719397.567870] INFO: task mongod:23240 blocked for more than 120 seconds.
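
      To see which tasks were reported in each batch, the hung-task lines can be grouped by kernel uptime. The following is a minimal sketch that assumes the line format shown in the excerpt above; `group_blocked_tasks` and the regular expression are illustrative helpers, not part of any tool.

```python
import re
from collections import defaultdict

# Matches kernel hung-task lines of the form quoted in the syslog excerpt.
HUNG_TASK = re.compile(
    r"kernel: \[\s*(?P<uptime>\d+\.\d+)\] INFO: task (?P<name>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds\."
)


def group_blocked_tasks(lines):
    """Group (task name, pid) pairs by whole-second kernel uptime."""
    batches = defaultdict(list)
    for line in lines:
        match = HUNG_TASK.search(line)
        if match:  # lines that are not hung-task reports are skipped
            second = int(float(match.group("uptime")))
            batches[second].append((match.group("name"), int(match.group("pid"))))
    return dict(batches)
```

      Feeding the excerpt to this function gives two batches 120 seconds apart, which makes it easy to see that the blocked tasks also include dmeventd, a kworker and the tar process.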
      

      The last activity logged in mongod.log was at 2016-09-15T17:08:27.808+0200; after that, only the number of open connections keeps growing.

      I was not able to terminate the tar process or unmount the snapshot. mongod could not be stopped either, and the whole system had to be rebooted.

      I am running Ubuntu 16.04.1 with mongod 3.2.9 on a 64-bit system.

        1. 17.log
          46 kB
        2. diagnostic.data.tgz
          105.68 MB

            Assignee:
            Kelsey Schubert
            Reporter:
            Tomaz Beltram
            Votes:
            0
            Watchers:
            6
