[SERVER-22074] mongod does not exit with --nojournal or --smallfiles Created: 06/Jan/16  Updated: 06/Apr/23  Resolved: 12/Jan/16

Status: Closed
Project: Core Server
Component/s: MMAPv1
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Claudio Bley Assignee: Unassigned
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Steps To Reproduce:

1. docker run mongo:3 --storageEngine mmapv1 --nojournal --smallfiles
2. hit Ctrl-c

Participants:

 Description   

We're using mongo inside Docker for our integration tests.

Uncertain what the root cause is, on some machines (all running Ubuntu 15.10 64bit), the mongod process does not exit when sent a kill signal consuming 100% CPU when starting with either the smallfiles or nojournal options:

$ docker run mongo:2.6.11 -vvv --nojournal
 
016-01-06T13:46:31.976+0000 shardKeyTest passed
2016-01-06T13:46:31.976+0000 isInRangeTest passed
2016-01-06T13:46:31.976+0000 shardObjTest passed
2016-01-06T13:46:31.977+0000 [initandlisten] MongoDB starting : pid=1 port=27017 dbpath=/data/db 64-bit host=447b926c1a51
2016-01-06T13:46:31.977+0000 [initandlisten] db version v2.6.11
2016-01-06T13:46:31.977+0000 [initandlisten] git version: d00c1735675c457f75a12d530bee85421f0c5548
2016-01-06T13:46:31.977+0000 [initandlisten] build info: Linux build4.ny.cbi.10gen.cc 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Jan 3 21:39:27 UTC 2014 x86_64 BOOST_LIB_VERSION=1_49
2016-01-06T13:46:31.977+0000 [initandlisten] allocator: tcmalloc
2016-01-06T13:46:31.977+0000 [initandlisten] options: { storage: { smallFiles: true }, systemLog: { verbosity: 3 } }
2016-01-06T13:46:31.977+0000 [DataFileSync] BackgroundJob starting: DataFileSync
2016-01-06T13:46:31.980+0000 [initandlisten] flushing directory /data/db
2016-01-06T13:46:31.981+0000 [initandlisten] journal dir=/data/db/journal
2016-01-06T13:46:31.981+0000 [initandlisten] recover : no journal files present, no recovery needed
2016-01-06T13:46:31.981+0000 [initandlisten] flushing directory /data/db/journal
2016-01-06T13:46:32.113+0000 [initandlisten] flushing directory /data/db/journal
2016-01-06T13:46:32.171+0000 [initandlisten] flushing directory /data/db/journal
2016-01-06T13:46:32.180+0000 [initandlisten] opening db:  local
2016-01-06T13:46:32.181+0000 [initandlisten] enter repairDatabases (to check pdfile version #)
2016-01-06T13:46:32.181+0000 [initandlisten] done repairDatabases
2016-01-06T13:46:32.181+0000 [initandlisten] opening db:  admin
2016-01-06T13:46:32.181+0000 [initandlisten] Running query: query: {} sort: {} projection: {} skip: 0 limit: 0
2016-01-06T13:46:32.181+0000 [initandlisten] query admin.system.roles planSummary: EOF ntoreturn:0 ntoskip:0 nscanned:0 nscannedObjects:0 keyUpdates:0 numYields:0 locks(micros) W:72 r:69 nreturned:0 reslen:20 0ms
2016-01-06T13:46:32.181+0000 [initandlisten] Collection admin.system.version does not exist. Using EOF runner: query: { _id: "authSchema" } sort: {} projection: {} skip: 0 limit: 0
2016-01-06T13:46:32.181+0000 [initandlisten] Collection admin.system.users does not exist. Using EOF runner: query: {} sort: {} projection: {} skip: 0 limit: 0
2016-01-06T13:46:32.181+0000 [initandlisten] fd limit hard:1048576 soft:1048576 max conn: 838860
2016-01-06T13:46:32.181+0000 [initandlisten] run command local.$cmd { create: "startup_log", size: 10485760, capped: true }
2016-01-06T13:46:32.182+0000 [initandlisten] create collection local.startup_log { size: 10485760, capped: true }
2016-01-06T13:46:32.182+0000 [initandlisten] allocating new ns file /data/db/local.ns, filling with zeroes...
2016-01-06T13:46:32.191+0000 [IndexRebuilder] BackgroundJob starting: IndexRebuilder
2016-01-06T13:46:32.192+0000 [TTLMonitor] BackgroundJob starting: TTLMonitor
2016-01-06T13:46:32.192+0000 [PeriodicTaskRunner] BackgroundJob starting: PeriodicTaskRunner
2016-01-06T13:46:32.192+0000 [ClientCursorMonitor] BackgroundJob starting: ClientCursorMonitor
2016-01-06T13:46:32.245+0000 [initandlisten] mmf create /data/db/local.ns
2016-01-06T13:46:32.245+0000 [initandlisten] mmf finishOpening 0x7f107af15000 /data/db/local.ns len:16777216
2016-01-06T13:46:32.245+0000 [initandlisten] Database::_addNamespaceToCatalog ns: local.startup_log
2016-01-06T13:46:32.245+0000 [initandlisten] Database::_addNamespaceToCatalog ns: local.system.namespaces
2016-01-06T13:46:32.247+0000 [initandlisten] mmf create /data/db/local.0
2016-01-06T13:46:32.247+0000 [FileAllocator] allocating new datafile /data/db/local.0, filling with zeroes...
2016-01-06T13:46:32.248+0000 [FileAllocator] creating directory /data/db/_tmp
2016-01-06T13:46:32.248+0000 [FileAllocator] flushing directory /data/db
2016-01-06T13:46:32.250+0000 [FileAllocator] flushing directory /data/db
2016-01-06T13:46:32.252+0000 [FileAllocator] done allocating datafile /data/db/local.0, size: 16MB,  took 0.001 secs
2016-01-06T13:46:32.252+0000 [initandlisten] mmf finishOpening 0x7f1078f15000 /data/db/local.0 len:16777216
2016-01-06T13:46:32.252+0000 [initandlisten] ExtentManager::increaseStorageSize ns:local.system.namespaces desiredSize:8192 fromFreeList: 0 eloc: 0:2000
2016-01-06T13:46:32.252+0000 [initandlisten] _reuse extent was: now:local.system.namespaces
2016-01-06T13:46:32.253+0000 [initandlisten] ExtentManager::increaseStorageSize ns:local.startup_log desiredSize:10485760 fromFreeList: 0 eloc: 0:4000
2016-01-06T13:46:32.253+0000 [initandlisten] _reuse extent was: now:local.startup_log
2016-01-06T13:46:32.253+0000 [initandlisten] Database::_addNamespaceToCatalog ns: local.system.indexes
2016-01-06T13:46:32.254+0000 [initandlisten] ExtentManager::increaseStorageSize ns:local.system.indexes desiredSize:8192 fromFreeList: 0 eloc: 0:a04000
2016-01-06T13:46:32.254+0000 [initandlisten] _reuse extent was: now:local.system.indexes
2016-01-06T13:46:32.254+0000 [initandlisten] Database::_addNamespaceToCatalog ns: local.startup_log.$_id_
2016-01-06T13:46:32.254+0000 [initandlisten] build index on: local.startup_log properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "local.startup_log" }
2016-01-06T13:46:32.255+0000 [initandlisten] local.startup_log: clearing plan cache - collection info cache reset
2016-01-06T13:46:32.255+0000 [initandlisten] allocating new extent
2016-01-06T13:46:32.255+0000 [initandlisten] ExtentManager::increaseStorageSize ns:local.startup_log.$_id_ desiredSize:131072 fromFreeList: 0 eloc: 0:a06000
2016-01-06T13:46:32.255+0000 [initandlisten] _reuse extent was: now:local.startup_log.$_id_
2016-01-06T13:46:32.255+0000 [initandlisten] 	 added index to empty collection
2016-01-06T13:46:32.255+0000 [initandlisten] local.startup_log: clearing plan cache - collection info cache reset
2016-01-06T13:46:32.256+0000 [initandlisten] command local.$cmd command: create { create: "startup_log", size: 10485760, capped: true } ntoreturn:1 keyUpdates:0 numYields:0  reslen:37 74ms
2016-01-06T13:46:32.256+0000 [initandlisten] info PageFaultRetryableSection will not yield, already locked upon reaching
2016-01-06T13:46:32.265+0000 [initandlisten] insert local.startup_log ninserted:1 keyUpdates:0 numYields:0  9ms
2016-01-06T13:46:32.273+0000 [IndexRebuilder] IndexRebuilder::checkNS: local.startup_log
2016-01-06T13:46:32.274+0000 [IndexRebuilder] IndexRebuilder::checkNS: local.system.indexes
2016-01-06T13:46:32.274+0000 [IndexRebuilder] IndexRebuilder::checkNS: local.system.namespaces
2016-01-06T13:46:32.274+0000 [IndexRebuilder] checking complete
2016-01-06T13:46:32.275+0000 [initandlisten] waiting for connections on port 27017
2016-01-06T13:46:32.311+0000 [journal] journal WRITETODATAFILES 1
2016-01-06T13:46:32.312+0000 [journal] journal WRITETODATAFILES 2
2016-01-06T13:46:32.312+0000 [journal] journal WRITETODATAFILES 0.44ms
2016-01-06T13:46:33.233+0000 [journal] journal REMAPPRIVATEVIEW done startedAt: 0 n:3 1ms
2016-01-06T13:46:34.255+0000 [journal] journal REMAPPRIVATEVIEW done startedAt: 0 n:1 0ms
2016-01-06T13:46:35.276+0000 [journal] journal REMAPPRIVATEVIEW done startedAt: 1 n:1 0ms
2016-01-06T13:46:36.298+0000 [journal] journal REMAPPRIVATEVIEW done startedAt: 2 n:1 0ms
2016-01-06T13:46:37.320+0000 [journal] journal REMAPPRIVATEVIEW done startedAt: 0 n:1 0ms
2016-01-06T13:46:38.342+0000 [journal] journal REMAPPRIVATEVIEW done startedAt: 1 n:1 0ms
^C2016-01-06T13:46:38.453+0000 [signalProcessingThread] got signal 2 (Interrupt), will terminate after current cmd ends
2016-01-06T13:46:38.453+0000 [signalProcessingThread] now exiting
2016-01-06T13:46:38.453+0000 [signalProcessingThread] dbexit: 
2016-01-06T13:46:38.453+0000 [signalProcessingThread] shutdown: going to close listening sockets...
2016-01-06T13:46:38.453+0000 [signalProcessingThread] closing listening socket: 7
2016-01-06T13:46:38.453+0000 [signalProcessingThread] closing listening socket: 8
2016-01-06T13:46:38.453+0000 [signalProcessingThread] removing socket file: /tmp/mongodb-27017.sock
2016-01-06T13:46:38.454+0000 [signalProcessingThread] shutdown: going to flush diaglog...
2016-01-06T13:46:38.454+0000 [signalProcessingThread] shutdown: going to close sockets...
2016-01-06T13:46:38.454+0000 [signalProcessingThread] shutdown: waiting for fs preallocator...
2016-01-06T13:46:38.454+0000 [signalProcessingThread] shutdown: lock for final commit...
2016-01-06T13:46:38.454+0000 [signalProcessingThread] shutdown: final commit...
2016-01-06T13:46:38.454+0000 [signalProcessingThread] journal REMAPPRIVATEVIEW done startedAt: 2 n:1 0ms
2016-01-06T13:46:38.458+0000 [signalProcessingThread] shutdown: closing all files...
2016-01-06T13:46:38.458+0000 [signalProcessingThread] mmf close /data/db/local.0
2016-01-06T13:46:38.458+0000 [signalProcessingThread] journal REMAPPRIVATEVIEW done startedAt: 0 n:1 0ms
2016-01-06T13:46:38.458+0000 [signalProcessingThread] mmf close /data/db/local.ns
2016-01-06T13:46:38.458+0000 [signalProcessingThread] journal REMAPPRIVATEVIEW done startedAt: 1 n:1 0ms
2016-01-06T13:46:38.458+0000 [signalProcessingThread] mmf close 
2016-01-06T13:46:38.458+0000 [signalProcessingThread] closeAllFiles() finished
2016-01-06T13:46:38.458+0000 [signalProcessingThread] journalCleanup...
2016-01-06T13:46:38.458+0000 [signalProcessingThread] removeJournalFiles
2016-01-06T13:46:38.458+0000 [signalProcessingThread] flushing directory /data/db/journal
2016-01-06T13:46:38.460+0000 [signalProcessingThread] removeJournalFiles end
2016-01-06T13:46:38.460+0000 [signalProcessingThread] shutdown: removing fs lock...
2016-01-06T13:46:38.460+0000 [signalProcessingThread] shutdown: groupCommitMutex
2016-01-06T13:46:38.460+0000 [signalProcessingThread] dbexit: really exiting now

$ docker version
Client:
 Version:      1.9.1
 API version:  1.21
 Go version:   go1.4.2
 Git commit:   a34a1d5
 Built:        Fri Nov 20 13:20:08 UTC 2015
 OS/Arch:      linux/amd64
 
Server:
 Version:      1.9.1
 API version:  1.21
 Go version:   go1.4.2
 Git commit:   a34a1d5
 Built:        Fri Nov 20 13:20:08 UTC 2015
 OS/Arch:      linux/amd64

We also tried to use 2.6.1, 2.6.9 and 2.6.10 – all exhibiting the same issue.

Trying mongod version 3.2.0 seemed to work at first, but only for the WiredTiger storage engine, when using the MMAPv1 engine we ran into the same problem.

Trying to kill the process as root user or using docker kill had no effect, only rebooting the machine.



 Comments   
Comment by Kelsey Schubert [ 12/Jan/16 ]

Hi cbley,

I was unable to reproduce this behavior.

Please note that SERVER project is for reporting bugs or feature suggestions for the MongoDB server. For MongoDB-related support discussion please post on the mongodb-users group or Stack Overflow with the mongodb tag.

For support running MongoDB with docker and apparmor. Please consider posting on the mongodb-users group

Thank you,
Thomas

Comment by Claudio Bley [ 07/Jan/16 ]

I tried to narrow it down further and was trying to rule out apparmor as a cause of the problem.

Unfortunately, I could not get the darn thing to stop. Using aa-complain /etc/apparmor.d/docker complained about an "invalid regex" error (https://github.com/docker/docker/issues/17783) and editing the file had not much impact as docker re-generates the file again.

Just guessing, but maybe the syntax error has something to do with this problem?!

Comment by Claudio Bley [ 06/Jan/16 ]

FTR, kernel version: 4.2.0-22-generic #27-Ubuntu SMP Thu Dec 17 22:57:08 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Generated at Thu Feb 08 03:59:19 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.