[SERVER-20552] MongoDB server crashed suddenly Created: 22/Sep/15  Updated: 18/Nov/15  Resolved: 18/Nov/15

Status: Closed
Project: Core Server
Component/s: Stability
Affects Version/s: 2.6.3
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Yang Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: Windows
Steps To Reproduce:

Unfortunately, I'm not aware of the cause/steps of this issue.

Participants:

 Description   

I'm running a single instance of MongoDB 2.6 in Win2012SP2 64bit, 40GB memory, a few hundred GB of free disk space.

I have experienced sudden server crash twice in last one month. The mongodb instance just exits all of a sudden and have to be started again in service console to be available.

I'm accessing the database using native node.js drive (Mongo). It serves a web application with 50 connections pooled, and X number of connections open and closed for from various job processes that kicks off and exit from time to time.

Here is a section of the log before and after the server crash. Kindly advise.

2015-09-22T18:00:34.424+0800 [conn5532287] command DB1.$cmd command: aggregate { aggregate: "Collection1", pipeline: [ { $match: { id: { $ne: null }, agent: null } }, { $group: { _id: "$status", total: { $sum: 1 } } } ] } keyUpdates:0 numYields:2722 locks(micros) r:21611137 reslen:233 17627ms
2015-09-22T18:00:34.471+0800 [conn5594217] remove DB2.Collection2 query: { id: "ID1", run_date: { $gte: new Date(1442910900000), $lt: new Date(1442911200000) } } ndeleted:3 keyUpdates:0 numYields:7 locks(micros) w:1667820 1002ms
2015-09-22T18:00:34.471+0800 [conn5594217] command DB2.$cmd command: delete { delete: "Collection2", ordered: true, deletes: [ { q: { id: "ID1", run_date: { $gte: new Date(1442910900000), $lt: new Date(1442911200000) } }, limit: 0 } ] } keyUpdates:0 numYields:0  reslen:80 1002ms
2015-09-22T18:00:34.487+0800 [initandlisten] connection accepted from 192.168.X.X1:50497 #5594218 (1310 connections now open)
2015-09-22T18:00:34.487+0800 [initandlisten] connection accepted from 192.168.X.X1:50498 #5594219 (1311 connections now open)
2015-09-22T18:00:34.768+0800 [conn5594219] query DB3.Collection3 query: { created_time: { $gte: new Date(1442910900000), $lt: new Date(1442911200000) }, id: "ID1" } planSummary: COLLSCAN ntoreturn:0 ntoskip:0 nscanned:159028 nscannedObjects:159028 keyUpdates:0 numYields:0 locks(micros) r:279074 nreturned:2 reslen:2318 279ms
2015-09-22T18:00:34.768+0800 [conn5594219] end connection 192.168.X.X1:50498 (1310 connections now open)
2015-09-22T18:00:34.768+0800 [conn5594218] end connection 192.168.X.X1:50497 (1309 connections now open)
2015-09-22T18:00:34.815+0800 [initandlisten] connection accepted from 192.168.X.X1:50503 #5594220 (1310 connections now open)
2015-09-22T18:00:34.815+0800 [initandlisten] connection accepted from 192.168.X.X1:50504 #5594221 (1311 connections now open)
2015-09-22T18:00:34.815+0800 [conn5594221] end connection 192.168.X.X1:50504 (1310 connections now open)
2015-09-22T18:00:34.815+0800 [conn5594220] end connection 192.168.X.X1:50503 (1309 connections now open)
2015-09-22T18:00:34.815+0800 [conn5594217] end connection 192.168.X.X1:50496 (1309 connections now open)
2015-09-22T18:00:34.815+0800 [conn5594216] end connection 192.168.X.X1:50495 (1307 connections now open)
2015-09-22T18:00:34.831+0800 [initandlisten] connection accepted from 192.168.X.X1:50505 #5594222 (1308 connections now open)
2015-09-22T18:00:34.831+0800 [initandlisten] connection accepted from 192.168.X.X1:50506 #5594223 (1309 connections now open)
2015-09-22T18:00:35.418+0800 [conn5594223] remove DB2.Collection2 query: { id: "ID2", run_date: { $gte: new Date(1442910900000), $lt: new Date(1442911200000) } } ndeleted:0 keyUpdates:0 numYields:5 locks(micros) w:959777 572ms
2015-09-22T18:00:35.418+0800 [conn5594223] command DB2.$cmd command: delete { delete: "Collection2", ordered: true, deletes: [ { q: { id: "ID2", run_date: { $gte: new Date(1442910900000), $lt: new Date(1442911200000) } }, limit: 0 } ] } keyUpdates:0 numYields:0  reslen:80 572ms
2015-09-22T18:00:35.422+0800 [initandlisten] connection accepted from 192.168.X.X1:50507 #5594224 (1310 connections now open)
2015-09-22T18:00:35.426+0800 [initandlisten] connection accepted from 192.168.X.X1:50508 #5594225 (1311 connections now open)
2015-09-22T18:00:35.740+0800 [conn5594225] query DB3.Collection3 query: { created_time: { $gte: new Date(1442910900000), $lt: new Date(1442911200000) }, id: "ID2" } planSummary: COLLSCAN ntoreturn:0 ntoskip:0 nscanned:159028 nscannedObjects:159028 keyUpdates:0 numYields:1211 locks(micros) r:599760 nreturned:0 reslen:20 318ms
2015-09-22T18:00:35.740+0800 [conn5582872] update DB3.Collection3 query: { _id: "ID3" } update: { $set: { #SOME_JSON_DOCUMENT_OF_SMALL_SIZE# } nscanned:1 nscannedObjects:1 nMatched:1 nModified:1 fastmod:1 keyUpdates:0 numYields:0 locks(micros) w:176 241ms
2015-09-22T18:00:35.740+0800 [conn5594225] end connection 192.168.X.X1:50508 (1310 connections now open)
2015-09-22T18:00:35.740+0800 [conn5594224] end connection 192.168.X.X1:50507 (1310 connections now open)
2015-09-22T18:00:35.740+0800 [initandlisten] connection accepted from 192.168.X.X1:50509 #5594226 (1310 connections now open)
2015-09-22T18:00:35.756+0800 [initandlisten] connection accepted from 192.168.X.X1:50510 #5594227 (1311 connections now open)
2015-09-22T18:00:35.772+0800 [conn5582872] command DB3.$cmd command: update { $msg: "query not recording (too large)" } keyUpdates:0 numYields:0  reslen:95 337ms
2015-09-22T18:00:35.928+0800 [conn5321938] CMD: dropIndexes DB1.Collection1
2015-09-22T18:00:36.053+0800 [conn5582874] command DB4.$cmd command: update { $msg: "query not recording (too large)" } keyUpdates:0 numYields:0  reslen:95 266ms
2015-09-22T18:00:36.053+0800 [conn5582878] command DB4.$cmd command: update { $msg: "query not recording (too large)" } keyUpdates:0 numYields:0  reslen:95 212ms
2015-09-22T18:00:36.490+0800 [conn5594227] remove DB2.Collection2 query: { id: "ID4", run_date: { $gte: new Date(1442910900000), $lt: new Date(1442911200000) } } ndeleted:0 keyUpdates:0 numYields:22 locks(micros) w:826032 721ms
2015-09-22T18:00:36.490+0800 [conn5594227] command DB2.$cmd command: delete { delete: "Collection2", ordered: true, deletes: [ { q: { id: "ID4", run_date: { $gte: new Date(1442910900000), $lt: new Date(1442911200000) } }, limit: 0 } ] } keyUpdates:0 numYields:0  reslen:80 721ms
2015-09-22T18:00:36.490+0800 [initandlisten] connection accepted from 192.168.X.X1:50511 #5594228 (1312 connections now open)
2015-09-22T18:00:36.490+0800 [initandlisten] connection accepted from 192.168.X.X1:50512 #5594229 (1313 connections now open)
2015-09-22T18:00:36.787+0800 [conn5594229] query DB3.Collection3 query: { created_time: { $gte: new Date(1442910900000), $lt: new Date(1442911200000) }, id: "ID4" } planSummary: COLLSCAN ntoreturn:0 ntoskip:0 nscanned:159028 nscannedObjects:159028 keyUpdates:0 numYields:2 locks(micros) r:391576 nreturned:0 reslen:20 302ms
2015-09-22T18:00:36.787+0800 [conn5594229] end connection 192.168.X.X1:50512 (1312 connections now open)
2015-09-22T18:00:36.787+0800 [conn5594228] end connection 192.168.X.X1:50511 (1311 connections now open)
2015-09-22T18:00:36.803+0800 [initandlisten] connection accepted from 192.168.X.X1:50513 #5594230 (1312 connections now open)
2015-09-22T18:00:36.803+0800 [initandlisten] connection accepted from 192.168.X.X1:50514 #5594231 (1313 connections now open)
2015-09-22T18:00:43.638+0800 [conn5525704] mongod.exe  ...\src\mongo\util\stacktrace.cpp(169)                      mongo::printStackTrace+0x43
2015-09-22T18:00:43.638+0800 [conn5525704] mongod.exe  ...\src\mongo\util\signal_handlers.cpp(107)                 mongo::`anonymous namespace'::abruptQuit+0xf2
2015-09-22T18:00:43.638+0800 [conn5525704] mongod.exe  f:\dd\vctools\crt_bld\self_64_amd64\crt\src\winsig.c(593)   raise+0x1ed
2015-09-22T18:00:43.638+0800 [conn5525704] mongod.exe  f:\dd\vctools\crt_bld\self_64_amd64\crt\src\abort.c(81)     abort+0x18
2015-09-22T18:00:43.638+0800 [conn5525704] mongod.exe  f:\dd\vctools\crt_bld\self_64_amd64\crt\src\purevirt.c(54)  _purecall+0x18
2015-09-22T18:00:43.638+0800 [conn5525704] mongod.exe  ...\src\mongo\db\index\btree_interface.cpp(162)             mongo::BtreeInterfaceImpl<mongo::BtreeData_V1>::keyAt+0x65
2015-09-22T18:00:43.638+0800 [conn5525704] mongod.exe  ...\src\mongo\db\index\btree_index_cursor.cpp(260)          mongo::BtreeIndexCursor::isSavedPositionValid+0x38
2015-09-22T18:00:43.638+0800 [conn5525704] mongod.exe  ...\src\mongo\db\index\btree_index_cursor.cpp(203)          mongo::BtreeIndexCursor::restorePosition+0x79
2015-09-22T18:00:43.638+0800 [conn5525704] mongod.exe  ...\src\mongo\db\exec\index_scan.cpp(229)                   mongo::IndexScan::recoverFromYield+0x55
2015-09-22T18:00:43.638+0800 [conn5525704] mongod.exe  ...\src\mongo\db\query\plan_executor.cpp(88)                mongo::PlanExecutor::getNext+0x133
2015-09-22T18:00:43.638+0800 [conn5525704] mongod.exe  ...\src\mongo\db\pipeline\document_source_cursor.cpp(89)    mongo::DocumentSourceCursor::loadBatch+0x129
2015-09-22T18:00:43.638+0800 [conn5525704] mongod.exe  ...\src\mongo\db\pipeline\document_source_cursor.cpp(57)    mongo::DocumentSourceCursor::getNext+0x6c
2015-09-22T18:00:43.638+0800 [conn5525704] mongod.exe  ...\src\mongo\db\pipeline\document_source_unwind.cpp(129)   mongo::DocumentSourceUnwind::getNext+0xbe
2015-09-22T18:00:43.638+0800 [conn5525704] mongod.exe  ...\src\mongo\db\pipeline\document_source_group.cpp(359)    mongo::DocumentSourceGroup::populate+0x7c
2015-09-22T18:00:43.638+0800 [conn5525704] mongod.exe  ...\src\mongo\db\pipeline\document_source_group.cpp(53)     mongo::DocumentSourceGroup::getNext+0x76
2015-09-22T18:00:43.638+0800 [conn5525704] mongod.exe  ...\src\mongo\db\pipeline\pipeline.cpp(501)                 mongo::Pipeline::run+0xf3
2015-09-22T18:00:43.638+0800 [conn5525704] mongod.exe  ...\src\mongo\db\commands\pipeline_command.cpp(359)         mongo::PipelineCommand::run+0x834
2015-09-22T18:00:43.638+0800 [conn5525704] mongod.exe  ...\src\mongo\db\dbcommands.cpp(1357)                       mongo::_execCommand+0x5e
2015-09-22T18:00:43.638+0800 [conn5525704] mongod.exe  ...\src\mongo\db\dbcommands.cpp(1545)                       mongo::Command::execCommand+0xd45
2015-09-22T18:00:43.638+0800 [conn5525704] mongod.exe  ...\src\mongo\db\dbcommands.cpp(1677)                       mongo::_runCommands+0x4a7
2015-09-22T18:00:43.638+0800 [conn5525704] SEVERE: Got signal: 22 (SIGABRT).
Backtrace:
2015-09-22T18:02:27.350+0800 ***** SERVER RESTARTED *****
2015-09-22T18:02:27.412+0800 Trying to start Windows service 'MongoDBDataInstance1'
2015-09-22T18:02:27.412+0800 [DataFileSync] flushing diag log
2015-09-22T18:02:27.412+0800 Service running
2015-09-22T18:02:27.616+0800 [initandlisten] MongoDB starting : pid=10508 port=27017 dbpath=E:\mongodb\D1\data\ 64-bit host=BISHOP
2015-09-22T18:02:27.616+0800 [initandlisten] targetMinOS: Windows 7/Windows Server 2008 R2
2015-09-22T18:02:27.616+0800 [initandlisten] db version v2.6.3
2015-09-22T18:02:27.616+0800 [initandlisten] git version: 255f67a66f9603c59380b2a389e386910bbb52cb
2015-09-22T18:02:27.616+0800 [initandlisten] build info: windows sys.getwindowsversion(major=6, minor=1, build=7601, platform=2, service_pack='Service Pack 1') BOOST_LIB_VERSION=1_49
2015-09-22T18:02:27.616+0800 [initandlisten] allocator: system
2015-09-22T18:02:27.616+0800 [initandlisten] options: { config: "C:\Program Files\MongoDB 2.6 Standard\mongod1.cfg", diaglog: 3, net: { http: { RESTInterfaceEnabled: true, enabled: true }, port: 27017 }, processManagement: { windowsService: { displayName: "MongoDB Data Instance 1", serviceName: "MongoDBDataInstance1" } }, replication: { oplogSizeMB: 128, replSet: "rs0" }, service: true, storage: { dbPath: "E:\mongodb\D1\data\", journal: { enabled: true }, smallFiles: true }, systemLog: { destination: "file", logAppend: true, path: "E:\mongodb\D1\log\mongo.log" } }
2015-09-22T18:02:28.030+0800 [initandlisten] journal dir=E:\mongodb\D1\data\journal
2015-09-22T18:02:28.030+0800 [initandlisten] recover begin
2015-09-22T18:02:28.030+0800 [initandlisten] recover lsn: 225358909
2015-09-22T18:02:28.030+0800 [initandlisten] recover E:\mongodb\D1\data\journal\j._91
2015-09-22T18:02:28.040+0800 [initandlisten] recover skipping application of section seq:224456101 < lsn:225358909
2015-09-22T18:02:28.243+0800 [initandlisten] recover skipping application of section seq:224494782 < lsn:225358909
2015-09-22T18:02:28.399+0800 [initandlisten] recover skipping application of section seq:224534391 < lsn:225358909
2015-09-22T18:02:28.540+0800 [initandlisten] recover skipping application of section seq:224574097 < lsn:225358909
2015-09-22T18:02:28.774+0800 [initandlisten] recover skipping application of section seq:224612900 < lsn:225358909
2015-09-22T18:02:29.008+0800 [initandlisten] recover skipping application of section seq:224651718 < lsn:225358909
2015-09-22T18:02:29.133+0800 [initandlisten] recover skipping application of section seq:224691165 < lsn:225358909
2015-09-22T18:02:29.337+0800 [initandlisten] recover skipping application of section seq:224730870 < lsn:225358909
2015-09-22T18:02:29.524+0800 [initandlisten] recover skipping application of section seq:224770773 < lsn:225358909
2015-09-22T18:02:30.257+0800 [initandlisten] recover skipping application of section more...
2015-09-22T18:02:30.960+0800 [initandlisten] recover E:\mongodb\D1\data\journal\j._92
2015-09-22T18:02:32.578+0800 [initandlisten] recover cleaning up
2015-09-22T18:02:32.578+0800 [initandlisten] removeJournalFiles
2015-09-22T18:02:32.611+0800 [initandlisten] recover done



 Comments   
Comment by Ramon Fernandez Marina [ 18/Nov/15 ]

Apologies for the long delay in getting back to you xuguoming@nexlabs.com. My understanding of the error you're getting is that it could be triggered by data corruption in an index, which may be caused by a faulty storage layer. You may want to run validate() on your collections to make sure your data is healthy, as well as check the integrity of your storage layer.

I'd also recommend you upgrade MongoDB to 2.6.11 (which is the latest version in the v2.6 branch at the time of this writing), which contains many improvements over 2.6.3.

Regards,
Ramón.

Comment by Yang [ 28/Sep/15 ]

@Sam Kleinman, any advice?

Generated at Thu Feb 08 03:54:34 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.