  1. Core Server
  2. SERVER-40

Graceful handling of "Disk Full"

    • Type: Improvement
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: 0.9.3
    • Affects Version/s: None
    • Component/s: Stability
    • Labels: None

      When I run out of disk space, I currently see this:

      Sat May 9 13:12:52 allocating new datafile /ram/mongo_1/test.5, filling with zeroes...
      Sat May 9 13:12:55 Assertion: write failed
      Sat May 9 13:12:55 Got signal: 11 (Segmentation fault).
      Sat May 9 13:12:55 Last op: { opid: 227, active: 0, op: "update", ns: "test.data", query: "{ _id: 207 }", inLock: 0 }
      Sat May 9 13:12:55 Backtrace:
      0x52545f 0x2b5d4f13294d 0x2b5d4f12ffab 0x2b5d4f1334e0 0x2b5d4f12febe 0x2b5d4e4b9d00 0x2b5d4fc8e400 0x2b5d4e65312a 0x4e6662 0x418c06 0x529f9c 0x2b5d4e7cdcac 0x2b5d4e4b30fa 0x2b5d4fcd966e
      mongod(_ZN5mongo10abruptQuitEi+0x7af) [0x52545f]
      /opt/m1/sys/jdk/jre/lib/amd64/server/libjvm.so [0x2b5d4f13294d]
      /opt/m1/sys/jdk/jre/lib/amd64/server/libjvm.so [0x2b5d4f12ffab]
      /opt/m1/sys/jdk/jre/lib/amd64/server/libjvm.so(JVM_handle_linux_signal+0x170) [0x2b5d4f1334e0]
      /opt/m1/sys/jdk/jre/lib/amd64/server/libjvm.so [0x2b5d4f12febe]
      /lib/libpthread.so.0 [0x2b5d4e4b9d00]
      /lib/libc.so.6(strlen+0x10) [0x2b5d4fc8e400]
      /usr/lib/gcc/x86_64-pc-linux-gnu/4.1.1/libstdc++.so.6(_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc+0x5a) [0x2b5d4e65312a]
      mongod [0x4e6662]
      mongod [0x418c06]
      mongod [0x529f9c]
      /usr/lib/libboost_thread-mt.so(thread_proxy+0x6c) [0x2b5d4e7cdcac]
      /lib/libpthread.so.0 [0x2b5d4e4b30fa]
      /lib/libc.so.6(__clone+0x6e) [0x2b5d4fcd966e]
      Sat May 9 13:12:55 dbexit:
      Sat May 9 13:12:55 MessagingPort recv() error 9 127.0.0.1:52142
      Sat May 9 13:12:55 end connection 127.0.0.1:52142
      Sat May 9 13:12:55 connection accepted from 127.0.0.1:53934
      Sat May 9 13:12:55 Listener on port 5001 aborted

      Further observations:

      • The mongod process keeps running
      • All active connections appear to be deadlocked (not timing out)
      • SIGTERM triggers this msg: "got kill or ctrl c signal 2 (Interrupt), will terminate after current cmd ends"

      This doesn't feel right.
      "Disk full" is a corner case but a critical one. It can and does happen in production.

      Questions:

      • What happens to operations that were in flight when the "disk full" happened?
      • What is our on-disk state? Are we corrupted now?

      Wishes:

      • "Disk full" should be handled gracefully. DB corruption must be avoided. mongod should shut down cleanly.
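
      To make the wish concrete, here is a rough sketch of how datafile preallocation could report "disk full" as an ordinary error instead of asserting. This is not mongod's actual code: `FillResult`, `zeroFill` and `allocateDataFile` are hypothetical names, and the sketch assumes POSIX `open`/`write`/`fsync`, where a full disk shows up as `errno == ENOSPC`.

```cpp
#include <algorithm>
#include <cerrno>
#include <cstddef>
#include <string>
#include <vector>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical result type for illustration; not part of mongod.
enum class FillResult { Ok, DiskFull, IoError };

// Map the current errno to a result, singling out ENOSPC.
static FillResult classifyErrno() {
    return errno == ENOSPC ? FillResult::DiskFull : FillResult::IoError;
}

// Write `total` zero bytes to fd. A full disk is returned as
// FillResult::DiskFull so the caller can react instead of crashing.
FillResult zeroFill(int fd, std::size_t total) {
    std::vector<char> buf(64 * 1024, 0);
    std::size_t done = 0;
    while (done < total) {
        std::size_t want = std::min(buf.size(), total - done);
        ssize_t n = ::write(fd, buf.data(), want);
        if (n < 0) {
            if (errno == EINTR) continue;   // interrupted, retry
            return classifyErrno();
        }
        if (n == 0) return FillResult::IoError;  // no progress, give up
        done += static_cast<std::size_t>(n);     // short writes are retried
    }
    return FillResult::Ok;
}

// Create and zero-fill a new datafile. On any failure the partial
// file is removed so no half-written datafile is left behind.
FillResult allocateDataFile(const std::string& path, std::size_t size) {
    int fd = ::open(path.c_str(), O_CREAT | O_EXCL | O_WRONLY, 0600);
    if (fd < 0) return classifyErrno();
    FillResult r = zeroFill(fd, size);
    if (r == FillResult::Ok && ::fsync(fd) != 0) r = classifyErrno();
    ::close(fd);
    if (r != FillResult::Ok) ::unlink(path.c_str());
    return r;
}
```

      On `FillResult::DiskFull` the caller could log the condition, fail the pending write with an error to the client, and start an orderly shutdown rather than hitting an assertion. Whether in-flight operations can be rolled back safely is a separate question that this sketch does not answer.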

            Assignee: Aaron Staple (aaron)
            Reporter: Jim Jones (raz)