[SERVER-24321] Server Crash after newly allocated datafile Created: 31/May/16  Updated: 01/Jun/16  Resolved: 01/Jun/16

Status: Closed
Project: Core Server
Component/s: GridFS, MMAPv1
Affects Version/s: 2.6.11
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: Jeff Oliver Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

CentOS 6.7
Data stored on XFS
17TB disk space allocated, only 84% full
Server has 24GB ram


Participants:

 Description   

I'm having trouble deciphering the reason for this crash.

2016-05-28T02:48:01.609-0700 [FileAllocator] allocating new datafile /srv/db/mongo/data/loadable_store.6760, filling 
with zeroes...
2016-05-28T02:48:01.633-0700 [FileAllocator] done allocating datafile /srv/db/mongo/data/loadable_store.6760, size: 2
047MB,  took 0.023 secs
2016-05-28T02:48:01.636-0700 [conn56195] insert loadable_store.fs.chunks query: { files_id: ObjectId('574968e3ca6d39a3d5125868'), n: 20, data: BinData(0, C639B837E2C6FD1E1F2C2CF7B581106CF062C22F2F031566A47725A393588F36DA2D342159A5F601E2B449494D9C39E9F487E806BA11322E55A407121DD77A00E7F01CCA9B2C...), _id: ObjectId('574968e3ca6d39a3d512587d') } ninserted:1 keyUpdates:0 numYields:0 locks(micros) w:110316411 110316ms
2016-05-28T02:48:01.636-0700 [conn56195] command loadable_store.$cmd command: insert { insert: "fs.chunks", documents: [ { files_id: ObjectId('574968e3ca6d39a3d5125868'), n: 20, data: BinData(0, C639B837E2C6FD1E1F2C2CF7B581106CF062C22F2F031566A47725A393588F36DA2D342159A5F601E2B449494D9C39E9F487E806BA11322E55A407121DD77A00E7F01CCA9B2C...), _id: ObjectId('574968e3ca6d39a3d512587d') } ], ordered: true, writeConcern: { w: 1 } } keyUpdates:0 numYields:0 locks(micros) w:67 reslen:40 110316ms
2016-05-28T02:48:09.457-0700 [journal] warning assertion failure a <= 256*1024*1024 src/mongo/util/alignedbuilder.cpp 103
2016-05-28T02:48:09.556-0700 [journal] 0x121d451 0x11bcbb9 0x119f8fe 0x119f063 0xa794bf 0xa79774 0xa694d3 0xa6bc39 0xa6bf77 0x1262219 0x7f1898b00a51 0x7f1897ea593d 
 /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0x121d451]
 /usr/bin/mongod(_ZN5mongo10logContextEPKc+0x159) [0x11bcbb9]
 /usr/bin/mongod(_ZN5mongo9wassertedEPKcS1_j+0x16e) [0x119f8fe]
 /usr/bin/mongod(_ZN5mongo14AlignedBuilder14growReallocateEj+0x63) [0x119f063]
 /usr/bin/mongod() [0xa794bf]
 /usr/bin/mongod(_ZN5mongo3dur13PREPLOGBUFFERERNS0_11JSectHeaderERNS_14AlignedBuilderE+0x214) [0xa79774]
 /usr/bin/mongod(_ZN5mongo3dur27groupCommitWithLimitedLocksEv+0xa3) [0xa694d3]
 /usr/bin/mongod() [0xa6bc39]
 /usr/bin/mongod(_ZN5mongo3dur9durThreadEv+0x297) [0xa6bf77]
 /usr/bin/mongod() [0x1262219]
 /lib64/libpthread.so.0(+0x7a51) [0x7f1898b00a51]
 /lib64/libc.so.6(clone+0x6d) [0x7f1897ea593d]
2016-05-28T02:48:10.206-0700 [journal] rate limiting wassert
2016-05-28T02:48:13.558-0700 [journal] Assertion failure a <= 512*1024*1024 src/mongo/util/alignedbuilder.cpp 104
2016-05-28T02:48:13.574-0700 [journal] 0x121d451 0x11bcbb9 0x11a1124 0x119f07d 0xa794bf 0xa79774 0xa694d3 0xa6bc39 0xa6bf77 0x1262219 0x7f1898b00a51 0x7f1897ea593d 
 /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0x121d451]
 /usr/bin/mongod(_ZN5mongo10logContextEPKc+0x159) [0x11bcbb9]
 /usr/bin/mongod(_ZN5mongo12verifyFailedEPKcS1_j+0x194) [0x11a1124]
 /usr/bin/mongod(_ZN5mongo14AlignedBuilder14growReallocateEj+0x7d) [0x119f07d]
 /usr/bin/mongod() [0xa794bf]
 /usr/bin/mongod(_ZN5mongo3dur13PREPLOGBUFFERERNS0_11JSectHeaderERNS_14AlignedBuilderE+0x214) [0xa79774]
 /usr/bin/mongod(_ZN5mongo3dur27groupCommitWithLimitedLocksEv+0xa3) [0xa694d3]
 /usr/bin/mongod() [0xa6bc39]
 /usr/bin/mongod(_ZN5mongo3dur9durThreadEv+0x297) [0xa6bf77]
 /usr/bin/mongod() [0x1262219]
 /lib64/libpthread.so.0(+0x7a51) [0x7f1898b00a51]
 /lib64/libc.so.6(clone+0x6d) [0x7f1897ea593d]
2016-05-28T02:48:13.584-0700 [journal] dbexception in groupCommitLL causing immediate shutdown: 0 assertion src/mongo/util/alignedbuilder.cpp:104
2016-05-28T02:48:13.584-0700 [journal] SEVERE: dur1
2016-05-28T02:48:13.584-0700 [conn56192] remove loadable_store.fs.chunks query: { _id: ObjectId('574968e3ca6d39a3d512587e') } ndeleted:0 keyUpdates:0 numYields:0 locks(micros) w:69 11945ms
2016-05-28T02:48:13.584-0700 [conn56192] command loadable_store.$cmd command: delete { delete: "fs.chunks", deletes: [ { q: { _id: ObjectId('574968e3ca6d39a3d512587e') }, limit: 0 } ], ordered: true, writeConcern: { w: 1 } } keyUpdates:0 numYields:0  reslen:40 11945ms
2016-05-28T02:48:13.594-0700 [conn56221] update wf_file_storage.fs.chunks query: { _id: ObjectId('57496953671615d0d649e699') } update: { files_id: ObjectId('57496953671615d0d649e698'), n: 0, data: BinData(0, 1F8B0800000000000003ECB8075453DDB72F8A20A028161050A409D2114808094D308400A1871E3AA184164A802420458A2220D24191A280F42A20557A9122BD8A3401E9BDF7...), _id: ObjectId('57496953671615d0d649e699') } nscanned:0 nscannedObjects:0 nMatched:1 nModified:0 upsert:1 keyUpdates:0 numYields:0 locks(micros) w:9152 10019ms
2016-05-28T02:48:13.594-0700 [conn56221] command wf_file_storage.$cmd command: update { update: "fs.chunks", updates: [ { q: { _id: ObjectId('57496953671615d0d649e699') }, u: { files_id: ObjectId('57496953671615d0d649e698'), n: 0, data: BinData(0, 1F8B0800000000000003ECB8075453DDB72F8A20A028161050A409D2114808094D308400A1871E3AA184164A802420458A2220D24191A280F42A20557A9122BD8A3401E9BDF7...), _id: ObjectId('57496953671615d0d649e699') }, upsert: true, multi: false } ], ordered: true, writeConcern: { w: 1 } } keyUpdates:0 numYields:0  reslen:106 10019ms
2016-05-28T02:48:13.596-0700 [journal] SEVERE: Got signal: 6 (Aborted).
Backtrace:0x121d451 0x121c82e 0x7f1897def6a0 0x7f1897def625 0x7f1897df0e05 0xba749b 0xa69a16 0xa6bc39 0xa6bf77 0x1262219 0x7f1898b00a51 0x7f1897ea593d 
 /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0x121d451]
 /usr/bin/mongod() [0x121c82e]
 /lib64/libc.so.6(+0x326a0) [0x7f1897def6a0]
 /lib64/libc.so.6(gsignal+0x35) [0x7f1897def625]
 /lib64/libc.so.6(abort+0x175) [0x7f1897df0e05]
 /usr/bin/mongod(_ZN5mongo10mongoAbortEPKc+0x6b) [0xba749b]
 /usr/bin/mongod(_ZN5mongo3dur27groupCommitWithLimitedLocksEv+0x5e6) [0xa69a16]
 /usr/bin/mongod() [0xa6bc39]
 /usr/bin/mongod(_ZN5mongo3dur9durThreadEv+0x297) [0xa6bf77]
 /usr/bin/mongod() [0x1262219]
 /lib64/libpthread.so.0(+0x7a51) [0x7f1898b00a51]
 /lib64/libc.so.6(clone+0x6d) [0x7f1897ea593d]
2016-05-28T03:04:58.207-0700 ***** SERVER RESTARTED *****

Doing a little searching, i see this error occur frequently due to the journal getting backed up, where the journal has too much data that is unflushed to the datafiles. I don't see that DR102 error in my log, so I'm curious why I got this error and what I could do to prevent it.

As far as my configuration, there is only 1 server, no replication, no sharding, that we use for storing large files using gridfs. The database size is somewhere in the 14TB range. We use node.js for the bulk of the code we write that communicates with the mongo server.

The events leading up to this crash are pretty simple. It crashed during a time where typically nothing is happening to the server, however, we have a build process that ended up sending 16 150MB files to be stored in mongo each about 30 seconds apart. I don't typically have issues with this kind of load, but on this occasion I did.

So what causes it? Memory allocation? Fragmentation? Is there something I can do to prevent it?



 Comments   
Comment by Kelsey Schubert [ 01/Jun/16 ]

Hi kaiserfro,

To resolve this issue, I would recommend that you upgrade to MongoDB 3.2.6 which has many improvements including SERVER-17501.

Please note that SERVER project is for reporting bugs or feature suggestions for the MongoDB server. For MongoDB-related support discussion please post on the mongodb-users group or Stack Overflow with the mongodb tag. A question like this involving more discussion would be best posted on the mongodb-users group. See also our Technical Support page for additional support resources.

Thank you,
Thomas

Generated at Thu Feb 08 04:06:02 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.