[SERVER-8005] mongod crashed when ending connection Created: 21/Dec/12  Updated: 10/Dec/14  Resolved: 09/Apr/14

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: 2.2.1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: stronglee Assignee: Randolph Tan
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

os: Debian 6.0 kernel: Linux 2.6.38-2-amd64
cluster: a sharded cluster with 4 replica sets. Each rs contains a primary & a second & a arbiter (mongod v2.2.1)
A) 90 mongos v2.2.2 connected to the cluster with 5 long connections each.
B) 1 mongos v2.2.2 connected to the cluster with hundreds of short connections concurrently.(my app connected to mongos by pymongo2.4)


Operating System: Linux
Participants:

 Description   

A) 90 mongos v2.2.2 connected to the cluster with 5 long connections each.
B) 1 mongos v2.2.2 connected to the cluster with hundreds of short connections concurrently.

Two different mongod crashed with the same symptom in a week, the mongod crashed
when closing many connections from B.

log1:
Thu Dec 20 16:48:14 [conn8446] end connection 123.58.167.178:51416 (345 connections now open)
Thu Dec 20 16:48:14 [conn8444] end connection 123.58.167.178:51395 (344 connections now open)
Thu Dec 20 16:48:14 [conn8450] end connection 123.58.167.178:51536 (343 connections now open)
Thu Dec 20 16:48:14 [conn8445] end connection 123.58.167.178:51403 (342 connections now open)
Thu Dec 20 16:48:14 [conn8435] end connection 123.58.167.178:51189 (341 connections now open)
Thu Dec 20 16:48:14 [conn8442] end connection 123.58.167.178:51351 (340 connections now open)
Thu Dec 20 16:48:14 [conn8447] end connection 123.58.167.178:51473 (340 connections now open)
Thu Dec 20 16:48:14 [conn8448] end connection 123.58.167.178:51498 (338 connections now open)
Thu Dec 20 16:48:14 [conn8451] end connection 123.58.167.178:51566 (337 connections now open)
Thu Dec 20 16:48:14 [conn8453] end connection 123.58.167.178:51644 (336 connections now open)
Thu Dec 20 16:48:15 [snapshotthread] cpu: elapsed:4000 writelock: 0%
Thu Dec 20 16:48:16 [conn8369] end connection 123.58.167.178:59503 (335 connections now open)
Thu Dec 20 16:48:16 [conn8412] end connection 123.58.167.178:50814 (334 connections now open)
Thu Dec 20 16:48:16 [conn8375] end connection 123.58.167.178:59531 (333 connections now open)
Thu Dec 20 16:48:16 [conn8376] end connection 123.58.167.178:59541 (332 connections now open)
Thu Dec 20 16:48:16 [conn8413] end connection 123.58.167.178:50831 (331 connections now open)
Thu Dec 20 16:48:16 [conn8378] end connection 123.58.167.178:59553 (330 connections now open)
Thu Dec 20 16:48:16 [conn8374] end connection 123.58.167.178:59530 (329 connections now open)
Thu Dec 20 16:48:16 [conn8379] end connection 123.58.167.178:59565 (328 connections now open)
Thu Dec 20 16:48:16 [conn8362] end connection 123.58.167.178:59192 (327 connections now open)
Thu Dec 20 16:48:16 [conn8363] end connection 123.58.167.178:59203 (326 connections now open)
Thu Dec 20 16:48:16 [conn8431] end connection 123.58.167.178:51125 (325 connections now open)
Thu Dec 20 16:48:16 [conn8433] end connection 123.58.167.178:51171 (324 connections now open)
Thu Dec 20 16:48:16 [conn8380] end connection 123.58.167.178:59569 (323 connections now open)
Thu Dec 20 16:48:16 [conn8416] end connection 123.58.167.178:50859 (322 connections now open)
Thu Dec 20 16:48:16 [conn8383] end connection 123.58.167.178:59577 (321 connections now open)
Thu Dec 20 16:48:16 [conn8439] end connection 123.58.167.178:51278 (320 connections now open)
Thu Dec 20 16:48:16 [conn8371] end connection 123.58.167.178:59506 (319 connections now open)
Thu Dec 20 16:48:16 [conn8430] end connection 123.58.167.178:51100 (318 connections now open)
Thu Dec 20 16:48:16 [conn8452] end connection 123.58.167.178:51637 (317 connections now open)
Thu Dec 20 16:48:16 [conn8441] end connection 123.58.167.178:51325 (316 connections now open)
Thu Dec 20 16:48:16 [conn8364] end connection 123.58.167.178:59205 (315 connections now open)
Thu Dec 20 16:48:16 [conn8373] end connection 123.58.167.178:59529 (314 connections now open)
Thu Dec 20 16:48:16 [conn8382] end connection 123.58.167.178:59575 (313 connections now open)
Thu Dec 20 16:48:16 [conn8424] end connection 123.58.167.178:50968 (312 connections now open)
Thu Dec 20 16:48:16 [conn8366] end connection 123.58.167.178:59217 (311 connections now open)
Thu Dec 20 16:48:16 [conn8428] end connection 123.58.167.178:51076 (310 connections now open)
Thu Dec 20 16:48:16 [conn8443] end connection 123.58.167.178:51364 (309 connections now open)
Thu Dec 20 16:48:16 [conn8370] end connection 123.58.167.178:59504 (308 connections now open)
Thu Dec 20 16:48:16 [conn8360] end connection 123.58.167.178:59172 (307 connections now open)
Thu Dec 20 16:48:16 [conn8377] end connection 123.58.167.178:59551 (306 connections now open)
Thu Dec 20 16:48:16 [conn8414] end connection 123.58.167.178:50833 (305 connections now open)
Thu Dec 20 16:48:16 [conn8365] end connection 123.58.167.178:59207 (304 connections now open)
Thu Dec 20 16:48:16 Invalid access at address: 0 from thread: conn8365

Thu Dec 20 16:48:16 Got signal: 11 (Segmentation fault).

Thu Dec 20 16:48:16 [conn8438] end connection 123.58.167.178:51260 (303 connections now open)
Thu Dec 20 16:48:16 [conn8415] end connection 123.58.167.178:50837 (302 connections now open)
Thu Dec 20 16:48:16 [conn8352] end connection 123.58.167.178:58905 (301 connections now open)
Thu Dec 20 16:48:16 [conn8417] end connection 123.58.167.178:50861 (300 connections now open)
Thu Dec 20 16:48:16 Backtrace:
0xaf8c41 0x5586c9 0x558c52 0x7fa0fd6d3ff0 0xb56fa8 0xb57265 0xb5727e 0x7fa0fd6caee9 0x7fa0fd6cb8d8 0x7fa0fca7eb6d
/usr/local/mongodb/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xaf8c41]
/usr/local/mongodb/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x5586c9]
/usr/local/mongodb/bin/mongod(_ZN5mongo24abruptQuitWithAddrSignalEiP7siginfoPv+0x262) [0x558c52]
/lib/libpthread.so.0(+0xeff0) [0x7fa0fd6d3ff0]
/usr/local/mongodb/bin/mongod(_ZN8tcmalloc11ThreadCache21ReleaseToCentralCacheEPNS0_8FreeListEmi+0x2d8) [0xb56fa8]
/usr/local/mongodb/bin/mongod(_ZN8tcmalloc11ThreadCache7CleanupEv+0x45) [0xb57265]
/usr/local/mongodb/bin/mongod(ZN8tcmalloc11ThreadCache11DeleteCacheEPS0+0xe) [0xb5727e]
/lib/libpthread.so.0(+0x5ee9) [0x7fa0fd6caee9]
/lib/libpthread.so.0(+0x68d8) [0x7fa0fd6cb8d8]
/lib/libc.so.6(clone+0x6d) [0x7fa0fca7eb6d]

log2:
Fri Dec 14 20:23:02 [conn26573] end connection 123.58.167.178:33444 (313 connections now open)
Fri Dec 14 20:23:02 [conn26567] end connection 123.58.167.178:55047 (312 connections now open)
Fri Dec 14 20:23:02 [conn26570] end connection 123.58.167.178:55056 (311 connections now open)
Fri Dec 14 20:23:02 [conn26558] end connection 123.58.167.178:55018 (309 connections now open)
Fri Dec 14 20:23:02 [conn26559] end connection 123.58.167.178:55027 (307 connections now open)
Fri Dec 14 20:23:02 [conn26553] end connection 123.58.167.178:54997 (306 connections now open)
Fri Dec 14 20:23:02 [conn26549] end connection 123.58.167.178:54988 (305 connections now open)
Fri Dec 14 20:23:02 [conn26548] end connection 123.58.167.178:54979 (304 connections now open)
Fri Dec 14 20:23:02 [conn26557] end connection 123.58.167.178:55013 (307 connections now open)
Fri Dec 14 20:23:02 [conn26601] end connection 123.58.167.178:33547 (304 connections now open)
Fri Dec 14 20:23:02 [conn26538] end connection 123.58.167.178:54943 (304 connections now open)
Fri Dec 14 20:23:02 [conn26541] end connection 123.58.167.178:54956 (303 connections now open)
Fri Dec 14 20:23:02 [conn26540] end connection 123.58.167.178:54952 (302 connections now open)
Fri Dec 14 20:23:02 [conn26560] end connection 123.58.167.178:55036 (304 connections now open)
Fri Dec 14 20:23:02 [conn26537] end connection 123.58.167.178:54941 (298 connections now open)
Fri Dec 14 20:23:02 [conn26536] end connection 123.58.167.178:54935 (298 connections now open)
Fri Dec 14 20:23:02 [conn26532] end connection 123.58.167.178:54885 (297 connections now open)
Fri Dec 14 20:23:02 [conn26534] end connection 123.58.167.178:54920 (296 connections now open)
Fri Dec 14 20:23:02 [conn26543] end connection 123.58.167.178:54970 (295 connections now open)
Fri Dec 14 20:23:02 [conn26533] end connection 123.58.167.178:54912 (297 connections now open)
Fri Dec 14 20:23:02 [conn26619] end connection 123.58.167.178:33835 (293 connections now open)
Fri Dec 14 20:23:02 [conn26527] end connection 123.58.167.178:54839 (292 connections now open)
Fri Dec 14 20:23:02 [conn26531] end connection 123.58.167.178:54881 (291 connections now open)
Fri Dec 14 20:23:02 [conn26530] end connection 123.58.167.178:54874 (293 connections now open)
Fri Dec 14 20:23:02 [conn26526] end connection 123.58.167.178:54818 (288 connections now open)
Fri Dec 14 20:23:02 [conn26528] end connection 123.58.167.178:54842 (287 connections now open)
src/third_party/gperftools-2.0/src/central_freelist.cc:106] span->refcount > 0
Fri Dec 14 20:23:02 [conn26535] end connection 123.58.167.178:54933 (297 connections now open)
Fri Dec 14 20:23:02 [conn26259] end connection 123.58.167.178:58989 (286 connections now open)
Fri Dec 14 20:23:02 [conn26525] end connection 123.58.167.178:54806 (283 connections now open)
Fri Dec 14 20:23:02 [conn26529] end connection 123.58.167.178:54849 (283 connections now open)
Fri Dec 14 20:23:02 Got signal: 6 (Aborted).

Fri Dec 14 20:23:02 Backtrace:
0xaf8c41 0x5586c9 0x7faff0b48230 0x7faff0b481b5 0x7faff0b4afc0 0xb45dbb 0xb4414c 0xb443bb 0xb449cb 0xb56f6a 0xb57265 0xb5727e 0x7faff1831ee9 0x7faff18328d8 0x7faff0be5b6d
/usr/local/mongodb/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xaf8c41]
/usr/local/mongodb/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x5586c9]
/lib/libc.so.6(+0x32230) [0x7faff0b48230]
/lib/libc.so.6(gsignal+0x35) [0x7faff0b481b5]
/lib/libc.so.6(abort+0x180) [0x7faff0b4afc0]
/usr/local/mongodb/bin/mongod(ZN8tcmalloc3LogENS_7LogModeEPKciNS_7LogItemES3_S3_S3+0x20b) [0xb45dbb]
/usr/local/mongodb/bin/mongod(_ZN8tcmalloc15CentralFreeList14ReleaseToSpansEPv+0x12c) [0xb4414c]
/usr/local/mongodb/bin/mongod(_ZN8tcmalloc15CentralFreeList18ReleaseListToSpansEPv+0x1b) [0xb443bb]
/usr/local/mongodb/bin/mongod(_ZN8tcmalloc15CentralFreeList11InsertRangeEPvS1_i+0x6b) [0xb449cb]
/usr/local/mongodb/bin/mongod(_ZN8tcmalloc11ThreadCache21ReleaseToCentralCacheEPNS0_8FreeListEmi+0x29a) [0xb56f6a]
/usr/local/mongodb/bin/mongod(_ZN8tcmalloc11ThreadCache7CleanupEv+0x45) [0xb57265]
/usr/local/mongodb/bin/mongod(ZN8tcmalloc11ThreadCache11DeleteCacheEPS0+0xe) [0xb5727e]
/lib/libpthread.so.0(+0x5ee9) [0x7faff1831ee9]
/lib/libpthread.so.0(+0x68d8) [0x7faff18328d8]
/lib/libc.so.6(clone+0x6d) [0x7faff0be5b6d]



 Comments   
Comment by Thomas Rueckstiess [ 09/Apr/14 ]

Hi,

Apologies for not getting back to you, it looks like this ticket fell through the cracks.

If you are still using v2.2.2 and this is still an issue for you, we recommend that you upgrade to the latest version of the 2.2 cycle, which is currently 2.2.7, or a newer version of MongoDB, like 2.4.10. There have been many bug fixes and improvements since then.

Please let us know if this is still an issue after upgrading.

Regards,
Thomas

Comment by stronglee [ 17/Feb/13 ]

Hi,

We have about 90 databases on a sharding cluster with 4 replica sets, 90 game servers keep writing logs into each own database.
There is a web application used by game masters. When they need to query all game servers' logs, the application connects to all the dbs via mongos concurrently, queriy, and close. This kind of queries are taken very frequently. So each mongod accept/query/close a lot of connections concurrenty.

When this bug occured, we were using mongos(v2.2.2) to connect to mongod(v2.2.1). Is it possible that this bug is caused by version not compatible?
After the crash, we upgraded all mongodb components to v2.2.2 and used connection pool in pymongo to avoid mass short connections.
And we didin't meet this kind of crash again after these two changes.

Startup parameters:
/usr/local/mongodb/bin/mongod --config /home/xx/conf/mongodb/mongodb.conf --pidfilepath /var/run/mongodb/mongodb.pid

mongodb.conf:
dbpath=/home/xx/mongodb/db
logpath=/home/xx/log/mongodb/mongodb.log
logappend=true
port = 27017
cpu = true
auth = true
keyFile = /home/xx/conf/mongodb/keyfile
verbose = false
nohttpinterface = true
replSet = mds1

Comment by Randolph Tan [ 13/Feb/13 ]

Hi,

We're still investigating on the root cause of the problem, would you be able to give us more details on what kind of operations are being performed on this databases and what is the complete start up parameters passed to the mongod?

Thanks!

Comment by stronglee [ 05/Jan/13 ]

Hi, happy new year!

1. No, we are not running mongod with ssl.
2. Yes, we installed mongodb by apt-get command.

We used mongos(v2.2.2) to connect to mongod(v2.2.1) when it cashed.

Comment by Randolph Tan [ 04/Jan/13 ]

Hi,

I have a couple of questions:

1. Are you running the mongod instances with ssl (http://docs.mongodb.org/manual/administration/ssl/)?
2. Are you using the official binaries from the mongodb.org downloads page? If yes, can you tell me the exact version you are using for the crashed mongod?

Generated at Thu Feb 08 03:16:16 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.