[SERVER-15120] disk/diskfull.js: Mongod fails to start up when disk is full due to failure creating local.me collection Created: 29/Aug/14  Updated: 08/Sep/14  Resolved: 03/Sep/14

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.7.6
Fix Version/s: 2.7.6

Type: Bug Priority: Minor - P4
Reporter: Benety Goh Assignee: Spencer Brody (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-13961 Move LockState under OperationContext Closed
is related to SERVER-15031 Switch over fully to the new Replicat... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:

 Description   

https://mci.10gen.com/ui/task/mongodb_mongo_master_rhel55_eca928dd6fdbca3b5a4e773af8f48333fee39409_14_08_29_16_00_07_disk_rhel55

eca928dd6f SSL RHEL 5.5 64-bit disk

MongoDB shell version: 2.7.6-pre-
connecting to: 127.0.0.1:27999/test
/data/db/
2014-08-29T13:28:57.664-0400 I -        shell: started program /srv/10gen/mci-exec/mci/shell/src/mongod --port 31000 --dbpath /data/db/diskfulltest --nohttpinterface --bind_ip 127.0.0.1 --nojournal --setParameter enableTestCommands=1
2014-08-29T13:28:57.665-0400 W NETWORK  Failed to connect to 127.0.0.1:31000, reason: errno:111 Connection refused
 m31000| 2014-08-29T13:28:57.681-0400 I -        [initandlisten] MongoDB starting : pid=17877 port=31000 dbpath=/data/db/diskfulltest 64-bit host=build9.nj1.10gen.cc
 m31000| 2014-08-29T13:28:57.681-0400 I -        [initandlisten] 
 m31000| 2014-08-29T13:28:57.681-0400 I -        [initandlisten] ** NOTE: This is a development version (2.7.6-pre-) of MongoDB.
 m31000| 2014-08-29T13:28:57.681-0400 I -        [initandlisten] **       Not recommended for production.
 m31000| 2014-08-29T13:28:57.681-0400 I -        [initandlisten] 
 m31000| 2014-08-29T13:28:57.681-0400 I -        [initandlisten] ** WARNING: You are running on a NUMA machine.
 m31000| 2014-08-29T13:28:57.681-0400 I -        [initandlisten] **          We suggest launching mongod like this to avoid performance problems:
 m31000| 2014-08-29T13:28:57.681-0400 I -        [initandlisten] **              numactl --interleave=all mongod [other options]
 m31000| 2014-08-29T13:28:57.681-0400 I -        [initandlisten] 
 m31000| 2014-08-29T13:28:57.681-0400 I -        [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/enabled is 'always'.
 m31000| 2014-08-29T13:28:57.681-0400 I -        [initandlisten] **        We suggest setting it to 'never'
 m31000| 2014-08-29T13:28:57.681-0400 I -        [initandlisten] 
 m31000| 2014-08-29T13:28:57.682-0400 I -        [initandlisten] db version v2.7.6-pre-
 m31000| 2014-08-29T13:28:57.682-0400 I -        [initandlisten] git version: eca928dd6fdbca3b5a4e773af8f48333fee39409
 m31000| 2014-08-29T13:28:57.682-0400 I -        [initandlisten] OpenSSL version: OpenSSL 0.9.8e-fips-rhel5 01 Jul 2008
 m31000| 2014-08-29T13:28:57.682-0400 I -        [initandlisten] build info: Linux build5.nj1.10gen.cc 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Jan 3 21:39:27 UTC 2014 x86_64 BOOST_LIB_VERSION=1_56
 m31000| 2014-08-29T13:28:57.682-0400 I -        [initandlisten] allocator: tcmalloc
 m31000| 2014-08-29T13:28:57.682-0400 I -        [initandlisten] options: { net: { bindIp: "127.0.0.1", http: { enabled: false }, port: 31000 }, setParameter: { enableTestCommands: "1" }, storage: { dbPath: "/data/db/diskfulltest", journal: { enabled: false } } }
 m31000| 2014-08-29T13:28:57.683-0400 I STORAGE  [FileAllocator] allocating new datafile /data/db/diskfulltest/local.ns, filling with zeroes...
 m31000| 2014-08-29T13:28:57.683-0400 I STORAGE  [FileAllocator] creating directory /data/db/diskfulltest/_tmp
 m31000| 2014-08-29T13:28:57.725-0400 I STORAGE  [FileAllocator] done allocating datafile /data/db/diskfulltest/local.ns, size: 16MB,  took 0.041 secs
 m31000| 2014-08-29T13:28:57.726-0400 I STORAGE  [FileAllocator] allocating new datafile /data/db/diskfulltest/local.0, filling with zeroes...
 m31000| 2014-08-29T13:28:57.776-0400 I STORAGE  [FileAllocator] FileAllocator: posix_fallocate failed: errno:28 No space left on device falling back
 m31000| 2014-08-29T13:28:57.776-0400 I STORAGE  [FileAllocator] error: failed to allocate new file: /data/db/diskfulltest/local.0 size: 67108864 failure creating new datafile; lseek failed for fd 13 with errno: errno:28 No space left on device.  will try again in 10 seconds
2014-08-29 13:29:07 EDT	
 m31000| 2014-08-29T13:29:07.784-0400 I STORAGE  [FileAllocator] allocating new datafile /data/db/diskfulltest/local.0, filling with zeroes...
 m31000| 2014-08-29T13:29:07.784-0400 I -        [initandlisten] Assertion: 12520:new file allocation failure
 m31000| 2014-08-29T13:29:07.784-0400 I QUERY    [initandlisten] command local.$cmd command: create { create: "startup_log", size: 10485760, capped: true } ntoreturn:1 keyUpdates:0 numYields:0  reslen:98 10101ms
 m31000| 2014-08-29T13:29:07.785-0400 I STORAGE  [initandlisten] exception in initAndListen: 17507 Can't take a write lock while out of disk space, terminating
2014-08-29T13:29:07.785-0400 I NETWORK  Socket recv() errno:104 Connection reset by peer 127.0.0.1:31000
 m31000| 2014-08-29T13:29:07.785-0400 I COMMANDS [initandlisten] now exiting
 m31000| 2014-08-29T13:29:07.785-0400 I NETWORK  [initandlisten] shutdown: going to close listening sockets...
 m31000| 2014-08-29T13:29:07.785-0400 I NETWORK  [initandlisten] closing listening socket: 9
 m31000| 2014-08-29T13:29:07.785-0400 I NETWORK  [initandlisten] closing listening socket: 11
 m31000| 2014-08-29T13:29:07.785-0400 I NETWORK  [initandlisten] removing socket file: /tmp/mongodb-31000.sock
 m31000| 2014-08-29T13:29:07.785-0400 I NETWORK  [initandlisten] shutdown: going to flush diaglog...
2014-08-29T13:29:07.785-0400 I NETWORK  SocketException: remote: 127.0.0.1:31000 error: 9001 socket exception [RECV_ERROR] server [127.0.0.1:31000] 
2014-08-29T13:29:07.785-0400 I NETWORK  DBClientCursor::init call() failed
 m31000| 2014-08-29T13:29:07.785-0400 I NETWORK  [initandlisten] shutdown: going to close sockets...
 m31000| 2014-08-29T13:29:07.785-0400 I STORAGE  [initandlisten] shutdown: waiting for fs preallocator...
 m31000| 2014-08-29T13:29:07.785-0400 I STORAGE  [initandlisten] shutdown: closing all files...
 m31000| 2014-08-29T13:29:07.785-0400 I STORAGE  [initandlisten] closeAllFiles() finished
 m31000| 2014-08-29T13:29:07.785-0400 I STORAGE  [initandlisten] shutdown: removing fs lock...
 m31000| 2014-08-29T13:29:07.785-0400 I COMMANDS [initandlisten] dbexit:  rc: 100
2014-08-29T13:29:07.790-0400 I QUERY    Error: error doing query: failed
    at DBQuery._exec (src/mongo/shell/query.js:83:36)
    at DBQuery.hasNext (src/mongo/shell/query.js:114:10)
    at DBCollection.findOne (src/mongo/shell/collection.js:185:19)
    at DB.runCommand (src/mongo/shell/db.js:58:41)
    at Mongo.hasWriteCommands (src/mongo/shell/mongo.js:171:44)
    at Mongo.writeMode (src/mongo/shell/mongo.js:199:15)
    at DBCollection.insert (src/mongo/shell/collection.js:230:26)
    at /srv/10gen/mci-exec/mci/shell/src/jstests/disk/diskfull.js:25:25 at src/mongo/shell/query.js:83
failed to load: /srv/10gen/mci-exec/mci/shell/src/jstests/disk/diskfull.js



 Comments   
Comment by Githook User [ 03/Sep/14 ]

Author:

{u'username': u'stbrody', u'name': u'Spencer T Brody', u'email': u'spencer@mongodb.com'}

Message: SERVER-15120 Early return from startReplication when no replication is enabled
Branch: master
https://github.com/mongodb/mongo/commit/e53157779c08fc5f851bd17440d65cf596ba1f14

Comment by Eric Milkie [ 02/Sep/14 ]

I was mistaken in that comment; it was local.slaves that is going away. Sorry for the confusion.

Comment by Benety Goh [ 30/Aug/14 ]

Correction: startup_log was created without any issues - startReplication() is attempting to access local.me after the failed file allocation.

spencer, can you determine what's the correct behavior here? From milkie's comment in MMSP-1549, I'm guessing local.me will be going away sometime in 2014.

Comment by Benety Goh [ 29/Aug/14 ]

5486963a35 SSL RHEL 5.5 64-bit disk

https://mci.10gen.com/ui/task/mongodb_mongo_master_rhel55_5486963a3549b69190a2e3adc584db312111abaf_14_08_28_21_02_06_disk_rhel55

http://buildlogs.mongodb.org/mci_0.9_rhel55/builds/23168/test/disk_0/diskfull.js

 m31000| 2014-08-28T19:40:04.383-0400 I -        [initandlisten] Assertion: 12520:new file allocation failure
 m31000| 2014-08-28T19:40:04.384-0400 I QUERY    [initandlisten] command local.$cmd command: create { create: "startup_log", size: 10485760, capped: true } ntoreturn:1 keyUpdates:0 numYields:0  reslen:98 10103ms
 m31000| 2014-08-28T19:40:04.384-0400 I NETWORK  [initandlisten] waiting for connections on port 31000
 m31000| 2014-08-28T19:40:04.384-0400 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:46623 #1 (1 connection now open)

In both failing and successful test runs, at server startup, file allocation failed before attempting to create the startup_log collection. I would expect to see the same "17507 Can't take a write lock while out of disk space, terminating" assertion in both cases.

Generated at Thu Feb 08 03:37:00 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.