Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-24321

Server Crash after newly allocated datafile

    • Type: Icon: Question Question
    • Resolution: Done
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: 2.6.11
    • Component/s: GridFS, MMAPv1
    • Labels:
      None
    • Environment:
      CentOS 6.7
      Data stored on XFS
      17TB disk space allocated, only 84% full
      Server has 24GB ram

      I'm having trouble deciphering the reason for this crash.

      2016-05-28T02:48:01.609-0700 [FileAllocator] allocating new datafile /srv/db/mongo/data/loadable_store.6760, filling 
      with zeroes...
      2016-05-28T02:48:01.633-0700 [FileAllocator] done allocating datafile /srv/db/mongo/data/loadable_store.6760, size: 2
      047MB,  took 0.023 secs
      2016-05-28T02:48:01.636-0700 [conn56195] insert loadable_store.fs.chunks query: { files_id: ObjectId('574968e3ca6d39a3d5125868'), n: 20, data: BinData(0, C639B837E2C6FD1E1F2C2CF7B581106CF062C22F2F031566A47725A393588F36DA2D342159A5F601E2B449494D9C39E9F487E806BA11322E55A407121DD77A00E7F01CCA9B2C...), _id: ObjectId('574968e3ca6d39a3d512587d') } ninserted:1 keyUpdates:0 numYields:0 locks(micros) w:110316411 110316ms
      2016-05-28T02:48:01.636-0700 [conn56195] command loadable_store.$cmd command: insert { insert: "fs.chunks", documents: [ { files_id: ObjectId('574968e3ca6d39a3d5125868'), n: 20, data: BinData(0, C639B837E2C6FD1E1F2C2CF7B581106CF062C22F2F031566A47725A393588F36DA2D342159A5F601E2B449494D9C39E9F487E806BA11322E55A407121DD77A00E7F01CCA9B2C...), _id: ObjectId('574968e3ca6d39a3d512587d') } ], ordered: true, writeConcern: { w: 1 } } keyUpdates:0 numYields:0 locks(micros) w:67 reslen:40 110316ms
      2016-05-28T02:48:09.457-0700 [journal] warning assertion failure a <= 256*1024*1024 src/mongo/util/alignedbuilder.cpp 103
      2016-05-28T02:48:09.556-0700 [journal] 0x121d451 0x11bcbb9 0x119f8fe 0x119f063 0xa794bf 0xa79774 0xa694d3 0xa6bc39 0xa6bf77 0x1262219 0x7f1898b00a51 0x7f1897ea593d 
       /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0x121d451]
       /usr/bin/mongod(_ZN5mongo10logContextEPKc+0x159) [0x11bcbb9]
       /usr/bin/mongod(_ZN5mongo9wassertedEPKcS1_j+0x16e) [0x119f8fe]
       /usr/bin/mongod(_ZN5mongo14AlignedBuilder14growReallocateEj+0x63) [0x119f063]
       /usr/bin/mongod() [0xa794bf]
       /usr/bin/mongod(_ZN5mongo3dur13PREPLOGBUFFERERNS0_11JSectHeaderERNS_14AlignedBuilderE+0x214) [0xa79774]
       /usr/bin/mongod(_ZN5mongo3dur27groupCommitWithLimitedLocksEv+0xa3) [0xa694d3]
       /usr/bin/mongod() [0xa6bc39]
       /usr/bin/mongod(_ZN5mongo3dur9durThreadEv+0x297) [0xa6bf77]
       /usr/bin/mongod() [0x1262219]
       /lib64/libpthread.so.0(+0x7a51) [0x7f1898b00a51]
       /lib64/libc.so.6(clone+0x6d) [0x7f1897ea593d]
      2016-05-28T02:48:10.206-0700 [journal] rate limiting wassert
      2016-05-28T02:48:13.558-0700 [journal] Assertion failure a <= 512*1024*1024 src/mongo/util/alignedbuilder.cpp 104
      2016-05-28T02:48:13.574-0700 [journal] 0x121d451 0x11bcbb9 0x11a1124 0x119f07d 0xa794bf 0xa79774 0xa694d3 0xa6bc39 0xa6bf77 0x1262219 0x7f1898b00a51 0x7f1897ea593d 
       /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0x121d451]
       /usr/bin/mongod(_ZN5mongo10logContextEPKc+0x159) [0x11bcbb9]
       /usr/bin/mongod(_ZN5mongo12verifyFailedEPKcS1_j+0x194) [0x11a1124]
       /usr/bin/mongod(_ZN5mongo14AlignedBuilder14growReallocateEj+0x7d) [0x119f07d]
       /usr/bin/mongod() [0xa794bf]
       /usr/bin/mongod(_ZN5mongo3dur13PREPLOGBUFFERERNS0_11JSectHeaderERNS_14AlignedBuilderE+0x214) [0xa79774]
       /usr/bin/mongod(_ZN5mongo3dur27groupCommitWithLimitedLocksEv+0xa3) [0xa694d3]
       /usr/bin/mongod() [0xa6bc39]
       /usr/bin/mongod(_ZN5mongo3dur9durThreadEv+0x297) [0xa6bf77]
       /usr/bin/mongod() [0x1262219]
       /lib64/libpthread.so.0(+0x7a51) [0x7f1898b00a51]
       /lib64/libc.so.6(clone+0x6d) [0x7f1897ea593d]
      2016-05-28T02:48:13.584-0700 [journal] dbexception in groupCommitLL causing immediate shutdown: 0 assertion src/mongo/util/alignedbuilder.cpp:104
      2016-05-28T02:48:13.584-0700 [journal] SEVERE: dur1
      2016-05-28T02:48:13.584-0700 [conn56192] remove loadable_store.fs.chunks query: { _id: ObjectId('574968e3ca6d39a3d512587e') } ndeleted:0 keyUpdates:0 numYields:0 locks(micros) w:69 11945ms
      2016-05-28T02:48:13.584-0700 [conn56192] command loadable_store.$cmd command: delete { delete: "fs.chunks", deletes: [ { q: { _id: ObjectId('574968e3ca6d39a3d512587e') }, limit: 0 } ], ordered: true, writeConcern: { w: 1 } } keyUpdates:0 numYields:0  reslen:40 11945ms
      2016-05-28T02:48:13.594-0700 [conn56221] update wf_file_storage.fs.chunks query: { _id: ObjectId('57496953671615d0d649e699') } update: { files_id: ObjectId('57496953671615d0d649e698'), n: 0, data: BinData(0, 1F8B0800000000000003ECB8075453DDB72F8A20A028161050A409D2114808094D308400A1871E3AA184164A802420458A2220D24191A280F42A20557A9122BD8A3401E9BDF7...), _id: ObjectId('57496953671615d0d649e699') } nscanned:0 nscannedObjects:0 nMatched:1 nModified:0 upsert:1 keyUpdates:0 numYields:0 locks(micros) w:9152 10019ms
      2016-05-28T02:48:13.594-0700 [conn56221] command wf_file_storage.$cmd command: update { update: "fs.chunks", updates: [ { q: { _id: ObjectId('57496953671615d0d649e699') }, u: { files_id: ObjectId('57496953671615d0d649e698'), n: 0, data: BinData(0, 1F8B0800000000000003ECB8075453DDB72F8A20A028161050A409D2114808094D308400A1871E3AA184164A802420458A2220D24191A280F42A20557A9122BD8A3401E9BDF7...), _id: ObjectId('57496953671615d0d649e699') }, upsert: true, multi: false } ], ordered: true, writeConcern: { w: 1 } } keyUpdates:0 numYields:0  reslen:106 10019ms
      2016-05-28T02:48:13.596-0700 [journal] SEVERE: Got signal: 6 (Aborted).
      Backtrace:0x121d451 0x121c82e 0x7f1897def6a0 0x7f1897def625 0x7f1897df0e05 0xba749b 0xa69a16 0xa6bc39 0xa6bf77 0x1262219 0x7f1898b00a51 0x7f1897ea593d 
       /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0x121d451]
       /usr/bin/mongod() [0x121c82e]
       /lib64/libc.so.6(+0x326a0) [0x7f1897def6a0]
       /lib64/libc.so.6(gsignal+0x35) [0x7f1897def625]
       /lib64/libc.so.6(abort+0x175) [0x7f1897df0e05]
       /usr/bin/mongod(_ZN5mongo10mongoAbortEPKc+0x6b) [0xba749b]
       /usr/bin/mongod(_ZN5mongo3dur27groupCommitWithLimitedLocksEv+0x5e6) [0xa69a16]
       /usr/bin/mongod() [0xa6bc39]
       /usr/bin/mongod(_ZN5mongo3dur9durThreadEv+0x297) [0xa6bf77]
       /usr/bin/mongod() [0x1262219]
       /lib64/libpthread.so.0(+0x7a51) [0x7f1898b00a51]
       /lib64/libc.so.6(clone+0x6d) [0x7f1897ea593d]
      2016-05-28T03:04:58.207-0700 ***** SERVER RESTARTED *****
      

      Doing a little searching, i see this error occur frequently due to the journal getting backed up, where the journal has too much data that is unflushed to the datafiles. I don't see that DR102 error in my log, so I'm curious why I got this error and what I could do to prevent it.

      As far as my configuration, there is only 1 server, no replication, no sharding, that we use for storing large files using gridfs. The database size is somewhere in the 14TB range. We use node.js for the bulk of the code we write that communicates with the mongo server.

      The events leading up to this crash are pretty simple. It crashed during a time where typically nothing is happening to the server, however, we have a build process that ended up sending 16 150MB files to be stored in mongo each about 30 seconds apart. I don't typically have issues with this kind of load, but on this occasion I did.

      So what causes it? Memory allocation? Fragmentation? Is there something I can do to prevent it?

            Assignee:
            Unassigned Unassigned
            Reporter:
            kaiserfro Jeff Oliver
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

              Created:
              Updated:
              Resolved: