Description
Hi,
We're running a single dev server mongodb (almost every parameter is default, except directoryPerDB), no replication, just for testing (we'll definitely put replication, even on dev, after those problems encountered...).
We regularly run very long process in order to generate on database cicr_results about 200 GB of "aggregated" datas with "custom" process (PHP / Javascript / Shell).
On May the 21st, we had to restart mongo.
A command launched using mongodb.admin was running for too long, and we we're unable to locate which process to kill in order to stop all the update this process was doing.
Just after Mongo restarted itself, data stored we're a lot different than before the restart.
In fact, we jumped for cicr_results database back in time at about 5 months before. Datas we're the same as on January, all inserts, updates done during those 5 months we're gone.
Here is the log surrounding the server restart :
2015-05-21T11:22:05.505+0200 I NETWORK [initandlisten] connection accepted from 127.0.0.1:38527 #416368 (140 connecti
|
ons now open)
|
2015-05-21T11:22:07.998+0200 I COMMAND [conn416368] going to kill op: op: 1934153368
|
2015-05-21T11:22:07.998+0200 I NETWORK [conn416368] end connection 127.0.0.1:38527 (139 connections now open)
|
2015-05-21T11:22:14.678+0200 I NETWORK [initandlisten] connection accepted from 127.0.0.1:38531 #416369 (140 connecti
|
ons now open)
|
2015-05-21T11:22:17.703+0200 I NETWORK [conn416369] end connection 127.0.0.1:38531 (139 connections now open)
|
2015-05-21T11:22:32.930+0200 I NETWORK [conn416365] end connection 127.0.0.1:38485 (138 connections now open)
|
2015-05-21T11:22:33.431+0200 I NETWORK [initandlisten] connection accepted from 127.0.0.1:38543 #416370 (139 connecti
|
ons now open)
|
2015-05-21T11:22:48.560+0200 I NETWORK [initandlisten] connection accepted from 127.0.0.1:38558 #416371 (140 connecti
|
ons now open)
|
2015-05-21T11:23:01.565+0200 I NETWORK [initandlisten] connection accepted from 127.0.0.1:38568 #416372 (141 connecti
|
ons now open)
|
2015-05-21T11:23:03.567+0200 I CONTROL [signalProcessingThread] got signal 15 (Terminated), will terminate after curr
|
ent cmd ends
|
2015-05-21T11:23:03.567+0200 E QUERY [conn416202] JavaScript execution terminated
|
2015-05-21T11:23:03.578+0200 I CONTROL [signalProcessingThread] now exiting
|
2015-05-21T11:23:03.578+0200 I NETWORK [signalProcessingThread] shutdown: going to close listening sockets...
|
2015-05-21T11:23:03.578+0200 I COMMAND [conn416202] command cicr_results.$cmd command: eval { $eval: CodeWScope( db.c
|
r.find({}).forEach(function (cur) {
|
db.cr.update({_id: cur._id}, {$set: {"s.ra": Math.floor(Math.random() * (1000 - 0)) + 0}})
|
}), {}), args: [] } keyUpdates:0 writeConflicts:0 numYields:0 reslen:102 locks:{ Global: { acquireCount: { r:
|
2161, w: 9335668, W: 1 } }, MMAPV1Journal: { acquireCount: { w: 9326394 }, acquireWaitCount: { w: 33532 }, timeAcquiri
|
ngMicros: { w: 40647604 } }, Database: { acquireCount: { r: 2161, w: 9335668 } }, Collection: { acquireCount: { R: 216
|
1, W: 9335668 } } } 3496045ms
|
2015-05-21T11:23:03.578+0200 I NETWORK [signalProcessingThread] closing listening socket: 6
|
2015-05-21T11:23:03.578+0200 I NETWORK [signalProcessingThread] closing listening socket: 7
|
2015-05-21T11:23:03.578+0200 I NETWORK [conn416202] SocketException handling request, closing client connection: 9001
|
socket exception [SEND_ERROR] server [127.0.0.1:34736]
|
2015-05-21T11:23:03.579+0200 I NETWORK [signalProcessingThread] removing socket file: /tmp/mongodb-27017.sock
|
2015-05-21T11:23:03.579+0200 I NETWORK [signalProcessingThread] shutdown: going to flush diaglog...
|
2015-05-21T11:23:03.579+0200 I NETWORK [signalProcessingThread] shutdown: going to close sockets...
|
2015-05-21T11:23:03.579+0200 I STORAGE [signalProcessingThread] shutdown: waiting for fs preallocator...
|
2015-05-21T11:23:03.579+0200 I STORAGE [signalProcessingThread] shutdown: final commit...
|
2015-05-21T11:23:03.579+0200 I NETWORK [conn416254] end connection 127.0.0.1:35853 (139 connections now open)
|
2015-05-21T11:23:03.580+0200 I NETWORK [conn333980] end connection 127.0.0.1:50058 (139 connections now open)
|
2015-05-21T11:23:03.580+0200 I NETWORK [conn416371] end connection 127.0.0.1:38558 (139 connections now open)
|
2015-05-21T11:23:03.614+0200 I JOURNAL [signalProcessingThread] journalCleanup...
|
2015-05-21T11:23:03.615+0200 I JOURNAL [signalProcessingThread] removeJournalFiles
|
2015-05-21T11:23:03.639+0200 I JOURNAL [signalProcessingThread] Terminating durability thread ...
|
2015-05-21T11:23:03.682+0200 I JOURNAL [journal writer] Journal writer thread stopped
|
2015-05-21T11:23:03.711+0200 I JOURNAL [durability] Durability thread stopped
|
2015-05-21T11:23:03.727+0200 I STORAGE [signalProcessingThread] shutdown: closing all files...
|
2015-05-21T11:23:05.086+0200 I - [signalProcessingThread] File Closing Progress: 36/248 14% (files)
|
2015-05-21T11:23:07.017+0200 I - [signalProcessingThread] File Closing Progress: 176/248 70% (files)
|
2015-05-21T11:23:09.094+0200 I - [signalProcessingThread] File Closing Progress: 215/248 86% (files)
|
2015-05-21T11:23:10.516+0200 I STORAGE [signalProcessingThread] closeAllFiles() finished
|
2015-05-21T11:23:10.517+0200 I STORAGE [signalProcessingThread] shutdown: removing fs lock...
|
2015-05-21T11:23:10.517+0200 I CONTROL [signalProcessingThread] dbexit: rc: 0
|
2015-05-21T11:23:13.519+0200 I CONTROL ***** SERVER RESTARTED *****
|
2015-05-21T11:23:13.573+0200 I JOURNAL [initandlisten] journal dir=/home/mongodb/journal
|
2015-05-21T11:23:13.574+0200 I JOURNAL [initandlisten] recover : no journal files present, no recovery needed
|
2015-05-21T11:23:13.600+0200 I JOURNAL [durability] Durability thread started
|
2015-05-21T11:23:13.600+0200 I JOURNAL [journal writer] Journal writer thread started
|
2015-05-21T11:23:13.602+0200 I CONTROL [initandlisten] MongoDB starting : pid=10206 port=27017 dbpath=/home/mongodb 64-bit host=dev-01-mngr.pcomperf.com
|
2015-05-21T11:23:13.603+0200 I CONTROL [initandlisten] db version v3.0.2
|
2015-05-21T11:23:13.603+0200 I CONTROL [initandlisten] git version: 6201872043ecbbc0a4cc169b5482dcf385fc464f
|
2015-05-21T11:23:13.603+0200 I CONTROL [initandlisten] OpenSSL version: OpenSSL 1.0.1e 11 Feb 2013
|
2015-05-21T11:23:13.603+0200 I CONTROL [initandlisten] build info: Linux ip-10-171-120-232 3.2.0-4-amd64 #1 SMP Debian 3.2.46-1 x86_64 BOOST_LIB_VERSION=1_49
|
2015-05-21T11:23:13.603+0200 I CONTROL [initandlisten] allocator: tcmalloc
|
2015-05-21T11:23:13.603+0200 I CONTROL [initandlisten] options: { config: "/etc/mongod.conf", net: { bindIp: "127.0.0.1" }, storage: { dbPath: "/home/mongodb", directoryPerDB: true }, systemLog: { destination: "file", logAppend: true, path: "/home/mongodb/mongod.log" } }
|
2015-05-21T11:23:14.921+0200 I NETWORK [initandlisten] waiting for connections on port 27017
|
2015-05-21T11:23:15.096+0200 I NETWORK [initandlisten] connection accepted from 127.0.0.1:38630 #1 (1 connection now open)
|
2015-05-21T11:23:17.441+0200 I NETWORK [initandlisten] connection accepted from 127.0.0.1:38631 #2 (2 connections now open)
|
2015-05-21T11:23:18.485+0200 I NETWORK [initandlisten] connection accepted from 127.0.0.1:38633 #3 (3 connections now open)
|
2015-05-21T11:23:24.054+0200 I NETWORK [conn3] end connection 127.0.0.1:38633 (2 connections now open)
|
2015-05-21T11:23:38.070+0200 I NETWORK [initandlisten] connection accepted from 127.0.0.1:38640 #4 (3 connections now open)
|
2015-05-21T11:24:01.209+0200 I NETWORK [initandlisten] connection accepted from 127.0.0.1:38663 #5 (4 connections now open)
|
2015-05-21T11:24:01.289+0200 I NETWORK [conn5] end connection 127.0.0.1:38663 (3 connections now open)
|
2015-05-21T11:24:01.364+0200 I NETWORK [initandlisten] connection accepted from 127.0.0.1:38664 #6 (4 connections now open)
|
2015-05-21T11:24:01.449+0200 I NETWORK [conn6] end connection 127.0.0.1:38664 (3 connections now open)
|
2015-05-21T11:24:01.535+0200 I NETWORK [initandlisten] connection accepted from 127.0.0.1:38665 #7 (4 connections now open)
|
2015-05-21T11:24:01.616+0200 I NETWORK [conn7] end connection 127.0.0.1:38665 (3 connections now open)
|
2015-05-21T11:24:15.646+0200 I NETWORK [conn4] end connection 127.0.0.1:38640 (2 connections now open)
|
2015-05-21T11:24:17.651+0200 I NETWORK [initandlisten] connection accepted from 127.0.0.1:38679 #8 (3 connections now open)
|
2015-05-21T11:24:23.115+0200 I NETWORK [conn8] end connection 127.0.0.1:38679 (2 connections now open)
|
2015-05-21T11:24:46.586+0200 I NETWORK [initandlisten] connection accepted from 127.0.0.1:38700 #9 (3 connections now open)
|
2015-05-21T11:25:01.823+0200 I NETWORK [initandlisten] connection accepted from 127.0.0.1:38714 #10 (4 connections now open)
|
2015-05-21T11:25:01.901+0200 I NETWORK [conn10] end connection 127.0.0.1:38714 (3 connections now open)
|
2015-05-21T11:25:01.997+0200 I NETWORK [initandlisten] connection accepted from 127.0.0.1:38715 #11 (4 connections now open)
|
2015-05-21T11:25:02.084+0200 I NETWORK [conn11] end connection 127.0.0.1:38715 (3 connections now open)
|
2015-05-21T11:25:02.172+0200 I NETWORK [initandlisten] connection accepted from 127.0.0.1:38716 #12 (4 connections now open)
|
2015-05-21T11:25:02.251+0200 I NETWORK [conn12] end connection 127.0.0.1:38716 (3 connections now open)
|
2015-05-21T11:25:24.216+0200 I COMMAND [conn9] CMD: drop segments_results.sr_555a15a04a620d867e52897e
|
2015-05-21T11:25:30.807+0200 I COMMAND [conn9] CMD: drop segments_results.sr_555c85f84a620d1279e70b06
|
2015-05-21T11:25:32.923+0200 I NETWORK [initandlisten] connection accepted from 127.0.0.1:38745 #13 (4 connections now open)
|
As complementary information, cicr_results database, which is here the database in problem, is located on a 300 GB SSD drive, while all the other databases are actually stored on a 3TB Hard Disk.
At the moment where Mongo was restarted, Free Space on SSD was at about 4GB.
cicr_results is stored on SSD using directory symlinks :
lrwxrwxrwx 1 mongodb mongodb 31 mai 24 21:49 cicr_results -> /home/dbs/mongodb/cicr_results/
|
drwxr-xr-x 2 mongodb mongodb 4096 mai 30 09:59 journal
|
drwxr-xr-x 3 mongodb mongodb 4096 janv. 15 14:13 journal_buffer
|
-rwxr-xr-x 1 mongodb mongodb 6 mai 26 10:14 mongod.lock
|
-rwxr-xr-x 1 mongodb mongodb 1307059367 mai 30 10:04 mongod.log
|
We check the files written on disk, they seem to be up to date (some of the files were updated just before the crash happened).
We tried restarting the database once more, launching repairDatabase and validate, everything ran smooth but data were still the same as January's one.
Of course, replication may have prevented that, but it seems rather alarming that, even with journaling enabled, we've lost this much data.
Has anyone encountered something similar ?