[SERVER-6151] mongos crashed : corrupted unsorted chunks Created: 21/Jun/12  Updated: 16/Nov/21  Resolved: 01/Oct/12

Status: Closed
Project: Core Server
Component/s: Stability
Affects Version/s: 2.0.6
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Klébert Hodin
Assignee: Greg Studer
Resolution: Incomplete
Votes: 1
Labels: mongos
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

linux x86_64 2.6.18-274.3.1.el5.centos.plus


Attachments: mongodb.filtered.log, mongos_error_extract.log
Operating System: Linux

 Description   

Mongos crashed.
Log extract:

      *** glibc detected *** /usr/bin/mongos: free(): corrupted unsorted chunks: 0x0000000018ad2220 ***
      Received signal 11
      Backtrace: *** glibc detected *** /usr/bin/mongos: free(): corrupted unsorted chunks: 0x0000000018ad4520 ***
      Logstream::get called in uninitialized state
      Thu Jun 21 10:32:17 [conn264410] Assertion failure ! inShutdown() client/connpool.cpp 136
      *** glibc detected *** /usr/bin/mongos: free(): corrupted unsorted chunks: 0x0000000018ab03f0 ***
      *** glibc detected *** /usr/bin/mongos: double free or corruption (fasttop): 0x0000000018aaf7d0 ***
      *** glibc detected *** /usr/bin/mongos: double free or corruption (fasttop): 0x0000000018aafab0 ***
      *** glibc detected *** /usr/bin/mongos: double free or corruption (fasttop): 0x0000000018aafab0 ***
      *** glibc detected *** /usr/bin/mongos: double free or corruption (fasttop): 0x0000000018aafab0 ***
      *** glibc detected *** /usr/bin/mongos: double free or corruption (fasttop): 0x0000000018aafab0 ***
      *** glibc detected *** /usr/bin/mongos: free(): corrupted unsorted chunks: 0x0000000018e6f540 ***
      *** glibc detected *** /usr/bin/mongos: double free or corruption (fasttop): 0x0000000018aafab0 ***
      *** glibc detected *** /usr/bin/mongos: double free or corruption (fasttop): 0x0000000018aafab0 ***
      *** glibc detected *** /usr/bin/mongos: double free or corruption (fasttop): 0x0000000018aafab0 ***
      *** glibc detected *** /usr/bin/mongos: double free or corruption (fasttop): 0x0000000018aafab0 ***
      *** glibc detected *** /usr/bin/mongos: double free or corruption (fasttop): 0x0000000018aafab0 ***
      *** glibc detected *** /usr/bin/mongos: double free or corruption (fasttop): 0x0000000018aafab0 ***

As seen in MMS, the mongos process used a lot of virtual memory (18 GB) before the failure.



 Comments   
Comment by Greg Studer [ 01/Oct/12 ]

The original submitter stopped responding; the second issue reported here is unrelated.

Comment by Greg Studer [ 28/Aug/12 ]

The previous issue was related to mongos; it seems like you're experiencing a problem with mongod (which could be related, but I'm guessing that is unlikely). This problem seems more closely related to https://jira.mongodb.org/browse/SERVER-2652.

Comment by David Gubler [ 27/Aug/12 ]

I'm having similar issues with 2.0.7 (although I cannot tell whether they're actually the same). The error occurred when I tried to shut down mongodb (/etc/init.d/mongodb stop). See the attached log file (mongodb.filtered.log); I have removed connect/disconnect/auth log statements.

Environment: Debian squeeze with the 2.6.39 kernel from backports.

Comment by Greg Studer [ 10/Jul/12 ]

Note: the tools above would require a debug build of mongos for the trace to be usable.

Comment by Greg Studer [ 06/Jul/12 ]

Unfortunately no; we're working on recording log info but aren't there yet. The logs could have helped us pick out anything that seemed abnormal.

Would it be possible for you to run memory usage tools on the mongos while it is running? Memory profiling information would be extremely useful; one option is to use tcmalloc, as documented here:

http://gperftools.googlecode.com/svn/trunk/doc/heapprofile.html

The resulting output would point to the portion of the mongos codebase that is allocating the memory. Profiling will have a performance impact, though it may be manageable if you're running background Hadoop processes.

Comment by Klébert Hodin [ 06/Jul/12 ]

No, we are not. This period is 7 days long.

What kind of information do you need? Startup logs?
Is there a way to get it from our MMS dashboard?

Comment by Greg Studer [ 05/Jul/12 ]

Thanks for the additional information; we're discussing this with the mongo-hadoop maintainers now. From the previous crash, it appears you ran out of file descriptors on your system. That shouldn't have caused a crash, but it indicates that you (or mongos) were using an unexpected number of connections.

Are you able to post the mongos log for a full period from startup to crash (feel free to post a SUPPORT ticket if you need to keep the log private)?

Comment by Grégoire Seux [ 04/Jul/12 ]

Some details:

We run mongo-hadoop to dump a collection of around 70 GB to HDFS. The job runs around 100 map tasks at the same time. Since mongo-hadoop does not support connecting to multiple mongoS instances, a single one gets all the load (100 chunks retrieved concurrently). Chunk size is fixed at 64 MB.
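The load described above can be estimated with simple arithmetic. A sketch, assuming binary units, chunks at the 64 MB maximum, and one chunk per map task as the comment implies; because chunks are split before they reach the maximum size, the resulting chunk count is a lower bound:

```python
import math

MB = 1024 ** 2
GB = 1024 ** 3

collection_bytes = 70 * GB   # "around 70 GB" collection (approximate)
max_chunk_bytes = 64 * MB    # configured maximum chunk size
concurrent_maps = 100        # map tasks running at the same time

# Dividing by the maximum chunk size gives a *lower* bound on the chunk count.
min_chunks = math.ceil(collection_bytes / max_chunk_bytes)

# Assuming one chunk per map task, the job needs at least this many "waves"
# of 100 concurrent chunk reads, all routed through a single mongos.
min_waves = math.ceil(min_chunks / concurrent_maps)

print(f"at least {min_chunks} chunks, at least {min_waves} waves of "
      f"{concurrent_maps} concurrent cursors on one mongos")
```

With these inputs the script reports at least 1120 chunks, read in at least 12 waves of 100 concurrent cursors on the one mongos.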

Memory used by the mongoS increases steadily until it crashes.
You can see on the following MMS graph, https://mms.10gen.com/chart/bookmark/4ff40250e4b02c1fc1aeaa6c, that memory use has increased greatly since we started using mongo-hadoop (~June 13th).
The graph from last night, https://mms.10gen.com/chart/bookmark/4ff40b58e4b0f1cff8285fed, also shows a slow decrease at some point, which led to an unresponsive mongoS instance.

Comment by Klébert Hodin [ 04/Jul/12 ]

Any updates on this issue?

We found that this crash occurs when running the mongo-hadoop adapter.
https://github.com/mongodb/mongo-hadoop

Comment by Klébert Hodin [ 21/Jun/12 ]

We're using version 2.5.

Comment by Scott Hernandez (Inactive) [ 21/Jun/12 ]

What glibc version are you using?

Generated at Thu Feb 08 03:10:53 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.