[SERVER-14814] [slaveTracking] dbexception in groupCommit causing immediate shutdown: 0 assertion src/mongo/util/alignedbuilder.cpp:104 Created: 07/Aug/14  Updated: 25/Sep/14  Resolved: 25/Sep/14

Status: Closed
Project: Core Server
Component/s: Stability
Affects Version/s: 2.6.0, 2.6.3
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Aram Hakobyan Assignee: Ramon Fernandez Marina
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File Mongo_crush.rtf    
Operating System: ALL
Steps To Reproduce:

Unknown

Participants:

 Description   

All Replica set members(1 master and 2 slaves, but not hidden/delay slave) crashing, one after each other(during 1-2 min), with following exception

---- SOME WARNINGS WHICH PROBABLY NOT RELATED TO CRUSH ----

2014-08-07T01:03:54.927+0000 [conn5543] warning: DR102 too much data written uncommitted 315.277MB
2014-08-07T01:03:54.927+0000 [conn5543] warning: DR102 too much data written uncommitted 315.286MB
2014-08-07T01:03:54.927+0000 [conn5543] warning: DR102 too much data written uncommitted 315.294MB
2014-08-07T01:03:54.927+0000 [conn5543] warning: DR102 too much data written uncommitted 315.302MB
2014-08-07T01:03:54.927+0000 [conn5543] warning: DR102 too much data written uncommitted 315.31MB
2014-08-07T01:03:54.938+0000 [conn5543] 0x11da801 0xa5f13e 0xa5f36b 0xa5f3e2 0xa5f513 0xa5f587 0xa583ca 0xf0b99d 0xb6c803 0xb63c55 0xb64788 0x8d5506 0x8d585b 0x8b6905 0xc44d87 0xc504b6 0xb957b9 0xb9f9d8 0x76f2bf 0x1190afb 
 /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0x11da801]
 /usr/bin/mongod(_ZN5mongo3dur9CommitJob4noteEPvi+0x28e) [0xa5f13e]
 /usr/bin/mongod(_ZN5mongo3dur18ThreadLocalIntents8_unspoolEv+0x4b) [0xa5f36b]
 /usr/bin/mongod(_ZN5mongo3dur18ThreadLocalIntents7unspoolEv+0x52) [0xa5f3e2]
 /usr/bin/mongod(_ZN5mongo3dur18ThreadLocalIntents4pushERKNS0_11WriteIntentE+0x83) [0xa5f513]
 /usr/bin/mongod(_ZN5mongo3dur11DurableImpl18declareWriteIntentEPvj+0x67) [0xa5f587]
 /usr/bin/mongod(_ZN5mongo3dur11DurableImpl10writingPtrEPvj+0xa) [0xa583ca]
 /usr/bin/mongod(_ZNK5mongo11BtreeBucketINS_12BtreeData_V1EE7unindexEPNS_17IndexCatalogEntryENS_7DiskLocERKNS_7BSONObjES5_+0xad) [0xf0b99d]
 /usr/bin/mongod(_ZN5mongo18BtreeInterfaceImplINS_12BtreeData_V1EE7unindexEPNS_17IndexCatalogEntryENS_7DiskLocERKNS_7BSONObjES5_+0x53) [0xb6c803]
 /usr/bin/mongod(_ZN5mongo22BtreeBasedAccessMethod12removeOneKeyERKNS_7BSONObjERKNS_7DiskLocE+0x85) [0xb63c55]
 /usr/bin/mongod(_ZN5mongo22BtreeBasedAccessMethod6removeERKNS_7BSONObjERKNS_7DiskLocERKNS_19InsertDeleteOptionsEPl+0x118) [0xb64788]
 /usr/bin/mongod(_ZN5mongo12IndexCatalog14_unindexRecordEPNS_17IndexCatalogEntryERKNS_7BSONObjERKNS_7DiskLocEb+0x66) [0x8d5506]
 /usr/bin/mongod(_ZN5mongo12IndexCatalog13unindexRecordERKNS_7BSONObjERKNS_7DiskLocEb+0x7b) [0x8d585b]
 /usr/bin/mongod(_ZN5mongo10Collection14updateDocumentERKNS_7DiskLocERKNS_7BSONObjEbPNS_7OpDebugE+0xc65) [0x8b6905]
 /usr/bin/mongod(_ZN5mongo6updateERKNS_13UpdateRequestEPNS_7OpDebugEPNS_12UpdateDriverEPNS_14CanonicalQueryE+0x1387) [0xc44d87]
 /usr/bin/mongod(_ZN5mongo14UpdateExecutor7executeEv+0x66) [0xc504b6]
 /usr/bin/mongod(_ZN5mongo14receivedUpdateERNS_7MessageERNS_5CurOpE+0x729) [0xb957b9]
 /usr/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0xec8) [0xb9f9d8]
 /usr/bin/mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x9f) [0x76f2bf]
 /usr/bin/mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x4fb) [0x1190afb]
2014-08-07T01:03:54.938+0000 [conn5543] warning: DR102 too much data written uncommitted 315.318MB
2014-08-07T01:03:54.938+0000 [conn5543] warning: DR102 too much data written uncommitted 315.326MB
2014-08-07T01:03:54.938+0000 [conn5543] warning: DR102 too much data written uncommitted 315.335MB
2014-08-07T01:03:54.938+0000 [conn5543] warning: DR102 too much data written uncommitted 315.343MB
2014-08-07T01:03:54.938+0000 [conn5543] warning: DR102 too much data written uncommitted 315.351MB
2014-08-07T01:03:54.938+0000 [conn5543] warning: DR102 too much data written uncommitted 315.359MB
2014-08-07T01:03:54.938+0000 [conn5543] warning: DR102 too much data written uncommitted 315.367MB
2014-08-07T01:03:54.938+0000 [conn5543] warning: DR102 too much data written uncommitted 315.372MB
2014-08-07T01:03:54.938+0000 [conn5543] warning: DR102 too much data written uncommitted 315.38MB
2014-08-07T01:03:54.959+0000 [initandlisten] connection accepted from 75.126.39.72:64145 #6570 (222 connections now open)
2014-08-07T01:03:55.797+0000 [initandlisten] connection accepted from 75.126.39.72:64170 #6571 (223 connections now open)
2014-08-07T01:03:55.819+0000 [initandlisten] connection accepted from 75.126.39.72:64173 #6572 (224 connections now open)

……………… EXCEPTION AFTER WHICH DB CRASHES ……………….

2014-08-07T01:03:56.493+0000 [conn5543] update OUT_TBL.users query: { _id: ObjectId('514130ae428263981400b84a') } update: { $pushAll: { photos: [ ObjectId('53e2d07979ada9a019000157') ] }, $set: { photos_count: 244, updated: new Date(1407373433245) }, $inc: { __v: 1 } } nscanned:1 nscannedObjects:1 nmoved:1 nMatched:1 nModified:1 keyUpdates:0 numYields:0 locks(micros) w:3237645 3237ms
2014-08-07T01:03:56.496+0000 [conn5547] query OUT_TBL.users query: { $query: { following: ObjectId('514130ae428263981400b84a') }, $readPreference: { mode: "primary" } } planSummary: IXSCAN { following: 1.0 } cursorid:52180400515 ntoreturn:0 ntoskip:0 nscanned:102 nscannedObjects:101 keyUpdates:0 numYields:1 locks(micros) r:6602 nreturned:101 reslen:3450 3242ms
2014-08-07T01:03:57.110+0000 [slaveTracking] warning assertion failure a <= 256*1024*1024 src/mongo/util/alignedbuilder.cpp 103
2014-08-07T01:03:57.121+0000 [slaveTracking] 0x11da801 0x117c9f9 0x116091f 0x115ec53 0xa6920f 0xa694c4 0xa5bee7 0xa5c58a 0xa59799 0xa59a36 0xa193e9 0xa1ccb8 0xa1ced3 0xb95751 0xb9f9d8 0xba038f 0x7b4925 0x7b0d4d 0xec725a 0x11625f2 
 /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0x11da801]
 /usr/bin/mongod(_ZN5mongo10logContextEPKc+0x159) [0x117c9f9]
 /usr/bin/mongod(_ZN5mongo9wassertedEPKcS1_j+0x17f) [0x116091f]
 /usr/bin/mongod(_ZN5mongo14AlignedBuilder14growReallocateEj+0x63) [0x115ec53]
 /usr/bin/mongod() [0xa6920f]
 /usr/bin/mongod(_ZN5mongo3dur13PREPLOGBUFFERERNS0_11JSectHeaderERNS_14AlignedBuilderE+0x214) [0xa694c4]
 /usr/bin/mongod() [0xa5bee7]
 /usr/bin/mongod(_ZN5mongo3dur11DurableImpl9commitNowEv+0x1a) [0xa5c58a]
 /usr/bin/mongod(_ZN5mongo3dur11DurableImpl16_aCommitIsNeededEv+0x79) [0xa59799]
 /usr/bin/mongod(_ZN5mongo3dur11DurableImpl14commitIfNeededEb+0x56) [0xa59a36]
 /usr/bin/mongod(_ZN5mongo4Lock7DBWrite7lockTopERNS_9LockStateE+0x59) [0xa193e9]
 /usr/bin/mongod(_ZN5mongo4Lock7DBWrite6lockDBERKSs+0x188) [0xa1ccb8]
 /usr/bin/mongod(_ZN5mongo4Lock7DBWriteC1ERKNS_10StringDataE+0x63) [0xa1ced3]
 /usr/bin/mongod(_ZN5mongo14receivedUpdateERNS_7MessageERNS_5CurOpE+0x6c1) [0xb95751]
 /usr/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0xec8) [0xb9f9d8]
 /usr/bin/mongod(_ZN5mongo14DBDirectClient3sayERNS_7MessageEbPSs+0xaf) [0xba038f]
 /usr/bin/mongod(_ZN5mongo12DBClientBase6updateERKSsNS_5QueryENS_7BSONObjEi+0x2d5) [0x7b4925]
 /usr/bin/mongod(_ZN5mongo12DBClientBase6updateERKSsNS_5QueryENS_7BSONObjEbb+0x6d) [0x7b0d4d]
 /usr/bin/mongod(_ZN5mongo13SlaveTracking3runEv+0xa9a) [0xec725a]
 /usr/bin/mongod(_ZN5mongo13BackgroundJob7jobBodyEv+0xd2) [0x11625f2]
2014-08-07T01:03:57.413+0000 [slaveTracking] rate limiting wassert
2014-08-07T01:03:58.658+0000 [slaveTracking] Assertion failure a <= 512*1024*1024 src/mongo/util/alignedbuilder.cpp 104
2014-08-07T01:03:58.669+0000 [slaveTracking] 0x11da801 0x117c9f9 0x1160c1e 0x115ec6d 0xa6920f 0xa694c4 0xa5bee7 0xa5c58a 0xa59799 0xa59a36 0xa193e9 0xa1ccb8 0xa1ced3 0xb95751 0xb9f9d8 0xba038f 0x7b4925 0x7b0d4d 0xec725a 0x11625f2 
 /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0x11da801]
 /usr/bin/mongod(_ZN5mongo10logContextEPKc+0x159) [0x117c9f9]
 /usr/bin/mongod(_ZN5mongo12verifyFailedEPKcS1_j+0x17e) [0x1160c1e]
 /usr/bin/mongod(_ZN5mongo14AlignedBuilder14growReallocateEj+0x7d) [0x115ec6d]
 /usr/bin/mongod() [0xa6920f]
 /usr/bin/mongod(_ZN5mongo3dur13PREPLOGBUFFERERNS0_11JSectHeaderERNS_14AlignedBuilderE+0x214) [0xa694c4]
 /usr/bin/mongod() [0xa5bee7]
 /usr/bin/mongod(_ZN5mongo3dur11DurableImpl9commitNowEv+0x1a) [0xa5c58a]
 /usr/bin/mongod(_ZN5mongo3dur11DurableImpl16_aCommitIsNeededEv+0x79) [0xa59799]
 /usr/bin/mongod(_ZN5mongo3dur11DurableImpl14commitIfNeededEb+0x56) [0xa59a36]
 /usr/bin/mongod(_ZN5mongo4Lock7DBWrite7lockTopERNS_9LockStateE+0x59) [0xa193e9]
 /usr/bin/mongod(_ZN5mongo4Lock7DBWrite6lockDBERKSs+0x188) [0xa1ccb8]
 /usr/bin/mongod(_ZN5mongo4Lock7DBWriteC1ERKNS_10StringDataE+0x63) [0xa1ced3]
 /usr/bin/mongod(_ZN5mongo14receivedUpdateERNS_7MessageERNS_5CurOpE+0x6c1) [0xb95751]
 /usr/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0xec8) [0xb9f9d8]
 /usr/bin/mongod(_ZN5mongo14DBDirectClient3sayERNS_7MessageEbPSs+0xaf) [0xba038f]
 /usr/bin/mongod(_ZN5mongo12DBClientBase6updateERKSsNS_5QueryENS_7BSONObjEi+0x2d5) [0x7b4925]
 /usr/bin/mongod(_ZN5mongo12DBClientBase6updateERKSsNS_5QueryENS_7BSONObjEbb+0x6d) [0x7b0d4d]
 /usr/bin/mongod(_ZN5mongo13SlaveTracking3runEv+0xa9a) [0xec725a]
 /usr/bin/mongod(_ZN5mongo13BackgroundJob7jobBodyEv+0xd2) [0x11625f2]
2014-08-07T01:03:58.669+0000 [slaveTracking] dbexception in groupCommit causing immediate shutdown: 0 assertion src/mongo/util/alignedbuilder.cpp:104
2014-08-07T01:03:58.669+0000 [slaveTracking] SEVERE: gc1
2014-08-07T01:03:58.678+0000 [slaveTracking] SEVERE: Got signal: 6 (Aborted).
Backtrace:0x11da801 0x11d9bde 0x7f4737bea4a0 0x7f4737bea425 0x7f4737bedb8b 0xb9180b 0xa5c32d 0xa5c58a 0xa59799 0xa59a36 0xa193e9 0xa1ccb8 0xa1ced3 0xb95751 0xb9f9d8 0xba038f 0x7b4925 0x7b0d4d 0xec725a 0x11625f2 
 /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0x11da801]
 /usr/bin/mongod() [0x11d9bde]
 /lib/x86_64-linux-gnu/libc.so.6(+0x364a0) [0x7f4737bea4a0]
 /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35) [0x7f4737bea425]
 /lib/x86_64-linux-gnu/libc.so.6(abort+0x17b) [0x7f4737bedb8b]
 /usr/bin/mongod(_ZN5mongo10mongoAbortEPKc+0x6b) [0xb9180b]
 /usr/bin/mongod() [0xa5c32d]
 /usr/bin/mongod(_ZN5mongo3dur11DurableImpl9commitNowEv+0x1a) [0xa5c58a]
 /usr/bin/mongod(_ZN5mongo3dur11DurableImpl16_aCommitIsNeededEv+0x79) [0xa59799]
 /usr/bin/mongod(_ZN5mongo3dur11DurableImpl14commitIfNeededEb+0x56) [0xa59a36]
 /usr/bin/mongod(_ZN5mongo4Lock7DBWrite7lockTopERNS_9LockStateE+0x59) [0xa193e9]
 /usr/bin/mongod(_ZN5mongo4Lock7DBWrite6lockDBERKSs+0x188) [0xa1ccb8]
 /usr/bin/mongod(_ZN5mongo4Lock7DBWriteC1ERKNS_10StringDataE+0x63) [0xa1ced3]
 /usr/bin/mongod(_ZN5mongo14receivedUpdateERNS_7MessageERNS_5CurOpE+0x6c1) [0xb95751]
 /usr/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0xec8) [0xb9f9d8]
 /usr/bin/mongod(_ZN5mongo14DBDirectClient3sayERNS_7MessageEbPSs+0xaf) [0xba038f]
 /usr/bin/mongod(_ZN5mongo12DBClientBase6updateERKSsNS_5QueryENS_7BSONObjEi+0x2d5) [0x7b4925]
 /usr/bin/mongod(_ZN5mongo12DBClientBase6updateERKSsNS_5QueryENS_7BSONObjEbb+0x6d) [0x7b0d4d]
 /usr/bin/mongod(_ZN5mongo13SlaveTracking3runEv+0xa9a) [0xec725a]
 /usr/bin/mongod(_ZN5mongo13BackgroundJob7jobBodyEv+0xd2) [0x11625f2]
 



 Comments   
Comment by Ramon Fernandez Marina [ 25/Sep/14 ]

Thanks for letting us know aram.hakobyan. I'm going to close this ticket then; if you see the same issue again feel free to reopen it. If you run into any other issues, please open a new ticket.

Regards,
Ramón.

Comment by Aram Hakobyan [ 25/Sep/14 ]

Hi Ramon,

we have found out that we are updating/inserting big object, more than allowed, and crushes occurred due to that. After correcting the issue, we have not seen any crushes yet.

Thanks for support.
Aram

Comment by Ramon Fernandez Marina [ 24/Sep/14 ]

Hi aram.hakobyan, we haven't heard back from you for some time. If this is still an issue for you, can you upload the logs requested above and provide more information on whether there were any index builds in progress when you experienced this issue?

Thanks,
Ramón.

Comment by Ramon Fernandez Marina [ 02/Sep/14 ]

aram.hakobyan, apologies for the late reply. These logs show how the journal commit keeps growing, and at some point it goes over the 512MB limit. In order to understand what's happening, and to determine whether there's a bug in the server, we'll need further information. Can you please provide the logs from the primary from startup until the "Got signal: 6 (Aborted)" message? Also, what's the kind of load that triggers this behavior? Can you provide details on whether there are any index builds in progress when you start seeing "DR102 too much data written uncommitted" messages?

Thanks,
Ramón.

Generated at Thu Feb 08 03:36:04 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.