-
Type: Bug
-
Resolution: Done
-
Priority: Minor - P4
-
Affects Version/s: None
-
Component/s: None
-
None
-
Environment:Linux
-
ALL
When running under --directoryperdb, if you drop a database, then there is a
race condition between the code that deletes the directory, and any request
for more storage enqueued in the FileAllocator.
This is what I see in the log file (without my patch for SERVER-2416 applied,
hence the unexpected form of error messages on allocation failure)
Tue Jan 25 17:09:13 [conn415] end connection 127.0.0.1:33278
Tue Jan 25 17:09:13 [initandlisten] connection accepted from 127.0.0.1:33279 #416
Tue Jan 25 17:09:13 allocating new datafile /export/home/netbanx/var/mongodb/data/db/test_velocity_complex_rules/test_velocity_complex_rules.ns, filling with zeroes...
Tue Jan 25 17:09:13 done allocating datafile /export/home/netbanx/var/mongodb/data/db/test_velocity_complex_rules/test_velocity_complex_rules.ns, size: 16MB, took 0.027 secs
Tue Jan 25 17:09:13 allocating new datafile /export/home/netbanx/var/mongodb/data/db/test_velocity_complex_rules/test_velocity_complex_rules.0, filling with zeroes...
Tue Jan 25 17:09:13 done allocating datafile /export/home/netbanx/var/mongodb/data/db/test_velocity_complex_rules/test_velocity_complex_rules.0, size: 64MB, took 0.114 secs
Tue Jan 25 17:09:13 [conn416] building new index on
for test_velocity_complex_rules.transactions
Tue Jan 25 17:09:13 [conn416] done for 0 records 0secs
Tue Jan 25 17:09:13 [conn416] insert test_velocity_complex_rules.transactions 152ms
Tue Jan 25 17:09:13 [conn416] dropDatabase test_velocity_complex_rules
Tue Jan 25 17:09:13 [conn416] end connection 127.0.0.1:33279
Tue Jan 25 17:09:13 allocating new datafile /export/home/netbanx/var/mongodb/data/db/test_velocity_complex_rules/test_velocity_complex_rules.1, filling with zeroes...
Tue Jan 25 17:09:13 warning: posix_fadvise fails /export/home/netbanx/var/mongodb/data/db/test_velocity_complex_rules/test_velocity_complex_rules.1 errno:2 No such file or directory
Tue Jan 25 17:09:13 posix_fallocate failed: errno:9 Bad file descriptor falling back
Tue Jan 25 17:09:14 Assertion: 10440:failure creating new datafile; lseek failed for fd -1 with errno: errno:9 Bad file descriptor
0x500278 0x7e49ff 0x7e523f 0x2b9758095c3f 0x2b97580956bf 0x3b4a80673d 0x3b4a0d3d1d
/export/home/netbanx/base/usr/mongodb/bin/mongod(_ZN5mongo11msgassertedEiPKc+0x1c8) [0x500278]
/export/home/netbanx/base/usr/mongodb/bin/mongod(_ZN5mongo13FileAllocator12ensureLengthEil+0x34f) [0x7e49ff]
/export/home/netbanx/base/usr/mongodb/bin/mongod(_ZN5mongo13FileAllocator6RunnerclEv+0x3bf) [0x7e523f]
/usr/lib64/libboost_thread.so.2(_ZNK5boost9function0IvSaINS_13function_baseEEEclEv+0xbf) [0x2b9758095c3f]
/usr/lib64/libboost_thread.so.2 [0x2b97580956bf]
/lib64/libpthread.so.0 [0x3b4a80673d]
/lib64/libc.so.6(clone+0x6d) [0x3b4a0d3d1d]
Tue Jan 25 17:09:14 Failed to allocate new file: /export/home/netbanx/var/mongodb/data/db/test_velocity_complex_rules/test_velocity_complex_rules.1, size: 134217728, aborting.
The race is between this code in FileOp:
void _deleteDataFiles(const char *database) {
if ( directoryperdb )
and the loop in FileAllocator::run, and goes badly wrong if the sequence of
events is
1: mongod decides to pre-allocate more storage for this database, and enqueues
a new file name on the FileAllocator's pending queue
2: the client instructs mongod to drop the database, and the server calls
_deleteDataFiles() above
3: the queue runner thread reaches the enqueued name, and attempts to create
the file
At that point the open fails, and the allocator sets the failed status to true,
and no more allocation is possible.
I see that in master there is now this code:
ensureParentDirCreated(name);
long fd = open(name.c_str(), O_CREAT | O_RDWR | O_NOATIME, S_IRUSR | S_IWUSR);
That's still a race, albeit a much smaller race. It's still possible that the
boost::filesystem::remove_all hits between the two lines above.
I believe that the only correct race-free solution is for the delete code to
purge the pending queue of any allocations for this directory about to be
deleted. I see that this would require a new method added to FileAllocator to
do this, as the pending queue is private to it.
Nicholas Clark