[SERVER-40] Graceful handling of "Disk Full" Created: 09/May/09  Updated: 12/Jul/16  Resolved: 12/May/09

Status: Closed
Project: Core Server
Component/s: Stability
Affects Version/s: None
Fix Version/s: 0.9.3

Type: Improvement Priority: Major - P3
Reporter: Jim Jones Assignee: Aaron Staple
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Participants:

 Description   

When I run out of disk space, I currently see this:

Sat May 9 13:12:52 allocating new datafile /ram/mongo_1/test.5, filling with zeroes...
Sat May 9 13:12:55 Assertion: write failed
Sat May 9 13:12:55 Got signal: 11 (Segmentation fault).
Sat May 9 13:12:55 Last op: { opid: 227, active: 0, op: "update", ns: "test.data", query: "{ _id: 207 }", inLock: 0 }
Sat May 9 13:12:55 Backtrace:
0x52545f 0x2b5d4f13294d 0x2b5d4f12ffab 0x2b5d4f1334e0 0x2b5d4f12febe 0x2b5d4e4b9d00 0x2b5d4fc8e400 0x2b5d4e65312a 0x4e6662 0x418c06 0x529f9c 0x2b5d4e7cdcac 0x2b5d4e4b30fa 0x2b5d4fcd966e
mongod(_ZN5mongo10abruptQuitEi+0x7af) [0x52545f]
/opt/m1/sys/jdk/jre/lib/amd64/server/libjvm.so [0x2b5d4f13294d]
/opt/m1/sys/jdk/jre/lib/amd64/server/libjvm.so [0x2b5d4f12ffab]
/opt/m1/sys/jdk/jre/lib/amd64/server/libjvm.so(JVM_handle_linux_signal+0x170) [0x2b5d4f1334e0]
/opt/m1/sys/jdk/jre/lib/amd64/server/libjvm.so [0x2b5d4f12febe]
/lib/libpthread.so.0 [0x2b5d4e4b9d00]
/lib/libc.so.6(strlen+0x10) [0x2b5d4fc8e400]
/usr/lib/gcc/x86_64-pc-linux-gnu/4.1.1/libstdc++.so.6(_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc+0x5a) [0x2b5d4e65312a]
mongod [0x4e6662]
mongod [0x418c06]
mongod [0x529f9c]
/usr/lib/libboost_thread-mt.so(thread_proxy+0x6c) [0x2b5d4e7cdcac]
/lib/libpthread.so.0 [0x2b5d4e4b30fa]
/lib/libc.so.6(__clone+0x6e) [0x2b5d4fcd966e]
Sat May 9 13:12:55 dbexit:
Sat May 9 13:12:55 MessagingPort recv() error 9 127.0.0.1:52142
Sat May 9 13:12:55 end connection 127.0.0.1:52142
Sat May 9 13:12:55 connection accepted from 127.0.0.1:53934
Sat May 9 13:12:55 Listener on port 5001 aborted

Further observations:

  • The mongod process keeps running
  • All active connections appear to be deadlocked (not timing out)
  • SIGTERM triggers this message: "got kill or ctrl c signal 2 (Interrupt), will terminate after current cmd ends"

This doesn't feel right.
"Disk full" is a corner case but a critical one. It can and does happen in production.

Questions:

  • What happens to operations that were in flight when the disk filled up?
  • What is our on-disk state? Are we corrupted now?

Wishes:

  • Disk full should be handled gracefully. DB corruption must be avoided. mongod should shut down cleanly.
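
To make the wish above concrete, here is a minimal Python sketch of the general pattern, not mongod's actual code. It preallocates a zero-filled datafile and turns an out-of-space failure into a dedicated exception that the caller can use to shut down in an orderly way. The names `preallocate`, `DiskFullError`, and the `opener` parameter are hypothetical and exist only for illustration.

```python
import errno
import os


class DiskFullError(Exception):
    """Hypothetical error: a datafile could not be allocated (disk full)."""


def preallocate(path, size, chunk_size=1 << 20, opener=open):
    """Create `path` and fill it with `size` zero bytes.

    If the write fails with ENOSPC, the partial file is removed and
    DiskFullError is raised, so the caller can stop accepting writes and
    exit cleanly instead of crashing. Other OS errors propagate unchanged.
    `opener` is an injection point for tests (an assumption of this sketch).
    """
    zeroes = b"\0" * chunk_size
    remaining = size
    try:
        with opener(path, "wb") as f:
            while remaining > 0:
                n = min(chunk_size, remaining)
                f.write(zeroes[:n])
                remaining -= n
            f.flush()
            os.fsync(f.fileno())  # surface delayed write errors here
    except OSError as exc:
        if exc.errno != errno.ENOSPC:
            raise
        # Do not leave a half-allocated datafile behind.
        try:
            os.remove(path)
        except FileNotFoundError:
            pass
        raise DiskFullError(f"disk full while allocating {path}") from exc
```

A caller would catch `DiskFullError` at the top of its write path, log it, and begin a normal shutdown; whether in-flight operations can be preserved depends on details this sketch does not model.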


 Comments   
Comment by Aaron Staple [ 12/May/09 ]

Ok, now when we run out of disk space we just exit cleanly. Tested in diskfull.js.

Comment by Aaron Staple [ 11/May/09 ]

I've worked on fixing this a couple of times already. I'd like to write a test for running out of disk space, but I think it would be difficult to make that portable. Can I just do a single-platform test (OS X or Linux; I'll choose whichever OS makes this easiest unless there's a preference)?
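
One way around the portability concern, sketched here as an assumption and not a description of what diskfull.js does, is to simulate the full disk rather than fill a real one. The Python sketch below uses a hypothetical `CappedFile` stand-in that raises ENOSPC once a fixed capacity is exceeded, plus a hypothetical helper `fill_until_full` that exercises it. Real file systems may accept a partial write before failing; this stand-in rejects the whole chunk for simplicity.

```python
import errno


class CappedFile:
    """In-memory stand-in for a file on a disk with `capacity` bytes free."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = bytearray()

    def write(self, b):
        # Reject any chunk that would exceed the simulated free space.
        if len(self.data) + len(b) > self.capacity:
            raise OSError(errno.ENOSPC, "No space left on device")
        self.data += b
        return len(b)


def fill_until_full(f, chunk=b"\0" * 4096):
    """Write chunks until ENOSPC; return the number of bytes written."""
    written = 0
    while True:
        try:
            written += f.write(chunk)
        except OSError as exc:
            if exc.errno == errno.ENOSPC:
                return written
            raise  # any other error is unexpected in this test
```

Because nothing touches the real file system, a test built this way behaves the same on every platform, at the cost of not exercising the operating system's own disk-full path.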
