[SERVER-1825] Server crashes while executing group with several incoming connections Created: 21/Sep/10  Updated: 17/Mar/11  Resolved: 16/Jan/11

Status: Closed
Project: Core Server
Component/s: Stability
Affects Version/s: 1.6.2
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: John Demme Assignee: Unassigned
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu 10.04, Intel Xeon 5550, Dual socket, quad core machine (16 logical cores).. 24GB RAM. (At what point does this become bragging?)


Operating System: Linux
Participants:

 Description   

jdd@sim25:/local/jdd/mdb$ mongod --dbpath=.
Tue Sep 21 19:13:41 MongoDB starting : pid=31713 port=27017 dbpath=. 64-bit
Tue Sep 21 19:13:41 db version v1.6.2, pdfile version 4.5
Tue Sep 21 19:13:41 git version: nogitversion
Tue Sep 21 19:13:41 sys info: Linux sim25 2.6.32-limit #24 SMP Mon Aug 23 17:04:56 EDT 2010 x86_64 BOOST_LIB_VERSION=1_40
Tue Sep 21 19:13:41 [initandlisten] waiting for connections on port 27017
Tue Sep 21 19:13:41 [websvr] web admin interface listening on port 28017
Tue Sep 21 19:13:44 [initandlisten] connection accepted from 172.18.2.26:52406 #1
Tue Sep 21 19:13:45 [conn1] timeoutMs not support for v8 yet
Tue Sep 21 19:17:46 [initandlisten] connection accepted from 172.18.3.5:33710 #2
Tue Sep 21 19:17:46 [initandlisten] connection accepted from 172.18.3.9:35546 #3
Tue Sep 21 19:17:46 [initandlisten] connection accepted from 172.18.3.14:36965 #4
Tue Sep 21 19:17:46 [initandlisten] connection accepted from 172.18.3.11:51524 #5
Tue Sep 21 19:17:46 [initandlisten] connection accepted from 172.18.3.15:41822 #6
Tue Sep 21 19:17:46 [initandlisten] connection accepted from 172.18.2.26:34430 #7
Tue Sep 21 19:17:46 [initandlisten] connection accepted from 172.18.3.16:38408 #8
Tue Sep 21 19:17:46 [initandlisten] connection accepted from 172.18.3.8:40154 #9
Tue Sep 21 19:17:46 [initandlisten] connection accepted from 172.18.3.7:46304 #10
Tue Sep 21 19:17:46 [initandlisten] connection accepted from 172.18.3.3:60494 #11
Tue Sep 21 19:17:46 [initandlisten] connection accepted from 172.18.3.10:38066 #12
Tue Sep 21 19:17:46 [initandlisten] connection accepted from 172.18.3.6:53665 #13
Tue Sep 21 19:17:46 [conn7] query cgo11.functions reslen:203 nscanned:21178

{ program: "403.gcc", symbol: "set_mem_alias_set" }

nreturned:1 105ms
Tue Sep 21 19:17:46 [conn7] query cgo11.profile_times reslen:5516 nscanned:16

{ program: "gcc", symbol: "set_mem_alias_set" }

nreturned:16 133ms
Tue Sep 21 19:17:46 [initandlisten] connection accepted from 172.18.3.12:39345 #14
Tue Sep 21 19:17:46 [initandlisten] connection accepted from 172.18.3.1:40319 #15
Tue Sep 21 19:17:46 [initandlisten] connection accepted from 172.18.3.2:54049 #16
Tue Sep 21 19:17:47 [conn7] query cgo11.profile_times reslen:5420 nscanned:16

{ program: "gcc", symbol: "timevar_pop" }

nreturned:16 127ms
Tue Sep 21 19:17:47 [initandlisten] connection accepted from 172.18.3.13:60805 #17
Tue Sep 21 19:17:47 [initandlisten] connection accepted from 172.18.2.26:34432 #18
Tue Sep 21 19:17:47 [conn7] query cgo11.profile_times reslen:5580 nscanned:16

{ program: "gcc", symbol: "skip_escaped_newlines" }

nreturned:16 103ms
Tue Sep 21 19:17:47 [initandlisten] connection accepted from 172.18.2.19:36263 #19
Tue Sep 21 19:17:47 [initandlisten] connection accepted from 172.18.2.24:58759 #20
Tue Sep 21 19:17:47 [initandlisten] connection accepted from 172.18.2.23:54979 #21
Tue Sep 21 19:17:47 [initandlisten] connection accepted from 172.18.2.20:42233 #22
Tue Sep 21 19:17:47 [initandlisten] connection accepted from 172.18.2.18:41035 #23
Tue Sep 21 19:17:47 [initandlisten] connection accepted from 172.18.2.21:50425 #24
Tue Sep 21 19:17:47 [initandlisten] connection accepted from 172.18.2.22:33530 #25
Tue Sep 21 19:17:47 [initandlisten] connection accepted from 172.18.2.17:36754 #26
Tue Sep 21 19:17:47 [initandlisten] connection accepted from 172.18.2.16:47877 #27
Tue Sep 21 19:17:47 [initandlisten] connection accepted from 172.18.2.15:50879 #28
Tue Sep 21 19:17:47 [initandlisten] connection accepted from 172.18.2.14:33919 #29
Tue Sep 21 19:17:47 [conn7] query cgo11.profile_times reslen:5516 nscanned:16

{ program: "gcc", symbol: "global_bindings_p" }

nreturned:16 127ms
Tue Sep 21 19:17:47 [initandlisten] connection accepted from 172.18.3.4:48799 #30
Tue Sep 21 19:17:47 [conn7] query cgo11.profile_times reslen:5484 nscanned:16

{ program: "gcc", symbol: "c_size_in_bytes" }

nreturned:16 116ms
Tue Sep 21 19:17:48 [conn7] query cgo11.profile_times reslen:5676 nscanned:16

{ program: "namd", symbol: "_Z11parse_paramPKcS0_S0_Ri" }

nreturned:16 137ms
Tue Sep 21 19:17:48 [conn7] query cgo11.profile_times reslen:5415 nscanned:16

{ program: "wrf", symbol: "multi_files_" }

nreturned:16 132ms
Tue Sep 21 19:17:49 [conn7] query cgo11.profile_times reslen:5772 nscanned:16

{ program: "soplex", symbol: "_ZNK6soplex9SLUFactor6memoryEv" }

nreturned:16 129ms
Tue Sep 21 19:17:49 [conn7] query cgo11.profile_times reslen:5644 nscanned:16

{ program: "cactusADM", symbol: "PUGH_ArrayGroupSize" }

nreturned:16 112ms
Tue Sep 21 19:17:49 [conn7] query cgo11.profile_times reslen:5884 nscanned:16

{ program: "cactusADM", symbol: "CCTKi_ScheduleTraverseGHExtensions" }

nreturned:16 103ms
Tue Sep 21 19:17:50 [conn7] query cgo11.profile_times reslen:5420 nscanned:16

{ program: "calculix", symbol: "DVfill" }

nreturned:16 110ms
Tue Sep 21 19:17:50 [conn7] query cgo11.profile_times reslen:5788 nscanned:16

{ program: "cactusADM", symbol: "CCTKi_TimerGetTimeOfDayStart" }

nreturned:16 120ms
Tue Sep 21 19:17:52 [conn7] query cgo11.profile_times reslen:5484 nscanned:16

{ program: "h264ref", symbol: "start_slice" }

nreturned:16 102ms

#

  1. Fatal error in CALL_AND_RETRY_2
  2. Allocation failed - process out of memory
    #

Tue Sep 21 19:17:53 Got signal: 6 (Aborted).

Tue Sep 21 19:17:53 Backtrace:
0x7624e9 0x7f41274bfaf0 0x7f41274bfa75 0x7f41274c35c0 0x8065d9 0x78b56d 0x77369d 0x7a0df6 0x77de4e 0x77e188 0x5b0b65 0x5a0cd7 0x6fe225 0x6ff892 0x6f2d14 0x6f4060 0x5b3f5b 0x5b7489 0x682df0 0x7642af
mongod(_ZN5mongo10abruptQuitEi+0x3d9) [0x7624e9]
/lib/libc.so.6(+0x33af0) [0x7f41274bfaf0]
/lib/libc.so.6(gsignal+0x35) [0x7f41274bfa75]
/lib/libc.so.6(abort+0x180) [0x7f41274c35c0]
mongod() [0x8065d9]
mongod() [0x78b56d]
mongod() [0x77369d]
mongod() [0x7a0df6]
mongod() [0x77de4e]
mongod() [0x77e188]
mongod(_ZN5mongo9mongoToV8ERKNS_7BSONObjEbb+0x1b5) [0x5b0b65]
mongod(_ZN5mongo7V8Scope9setObjectEPKcRKNS_7BSONObjEb+0x67) [0x5a0cd7]
mongod(_ZN5mongo12GroupCommand5groupESsRKSsRKNS_7BSONObjES3_SsSsPKcS3_SsRSsRNS_14BSONObjBuilderE+0x5e5) [0x6fe225]
mongod(_ZN5mongo12GroupCommand3runERKSsRNS_7BSONObjERSsRNS_14BSONObjBuilderEb+0xc62) [0x6ff892]
mongod(_ZN5mongo11execCommandEPNS_7CommandERNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0x7d4) [0x6f2d14]
mongod(_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_10BufBuilderERNS_14BSONObjBuilderEbi+0x620) [0x6f4060]
mongod(_ZN5mongo11runCommandsEPKcRNS_7BSONObjERNS_5CurOpERNS_10BufBuilderERNS_14BSONObjBuilderEbi+0x3b) [0x5b3f5b]
mongod(ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1+0x1099) [0x5b7489]
mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_8SockAddrE+0x10c0) [0x682df0]
mongod(_ZN5mongo10connThreadEPNS_13MessagingPortE+0x2ff) [0x7642af]

Tue Sep 21 19:17:53 dbexit:

Tue Sep 21 19:17:53 [conn1] shutdown: going to close listening sockets...
Tue Sep 21 19:17:53 [conn1] going to close listening socket: 5
Tue Sep 21 19:17:53 [conn1] going to close listening socket: 6
Tue Sep 21 19:17:53 [conn1] going to close listening socket: 7
Tue Sep 21 19:17:53 [conn1] going to close listening socket: 8
Tue Sep 21 19:17:53 [conn1] shutdown: going to flush oplog...
Tue Sep 21 19:17:53 [conn1] shutdown: going to close sockets...
Tue Sep 21 19:17:53 [conn1] shutdown: waiting for fs preallocator...
Tue Sep 21 19:17:53 [conn1] shutdown: closing all files...
Tue Sep 21 19:17:53 closeAllFiles() finished

Tue Sep 21 19:17:53 [conn1] shutdown: removing fs lock...
Tue Sep 21 19:17:53 dbexit: really exiting now

ERROR: Client::~Client _context should be NULL: conn

This occurs when I am running the following from a shell and bunch of processes attempt to connect. Running Mongo 1.6.2.
db.scores.group({key:

{type: 1}

, reduce: function(obj, prev)

{ prev.csum += obj.score; prev.count += 1; }

, initial: { csum: 0.0, count: 0 }})

> db.scores.stats()
{
"ns" : "cgo11.scores",
"count" : 2682533,
"size" : 1233354604,
"avgObjSize" : 459.77238826139325,
"storageSize" : 1463372544,
"numExtents" : 24,
"nindexes" : 2,
"lastExtentSize" : 251234560,
"paddingFactor" : 1,
"flags" : 1,
"totalIndexSize" : 580608960,
"indexSizes" :

{ "_id_" : 161161216, "a_1_b_1" : 419447744 }

,
"ok" : 1
}

It's a rather long group operation.



 Comments   
Comment by Eliot Horowitz (Inactive) [ 16/Jan/11 ]

There is a case to finish v8 support, which is the root problem here.

Comment by John Demme [ 22/Sep/10 ]

Yes, this is V8. I didn't realize it wasn't fully supported yet.

Comment by Eliot Horowitz (Inactive) [ 22/Sep/10 ]

That's with V8?
v8 isn't really supported yet - has some issues (including a memory leak which I think is what you found)

Comment by John Demme [ 21/Sep/10 ]

Occurred again without anybody else connecting.

Tue Sep 21 19:22:33 MongoDB starting : pid=31777 port=27017 dbpath=. 64-bit
Tue Sep 21 19:22:33 db version v1.6.2, pdfile version 4.5
Tue Sep 21 19:22:33 git version: nogitversion
Tue Sep 21 19:22:33 sys info: Linux sim25 2.6.32-limit #24 SMP Mon Aug 23 17:04:56 EDT 2010 x86_64 BOOST_LIB_VERSION=1_40
Tue Sep 21 19:22:34 [initandlisten] waiting for connections on port 27017
Tue Sep 21 19:22:34 [websvr] web admin interface listening on port 28017
Tue Sep 21 19:22:51 [initandlisten] connection accepted from 172.18.2.26:53687 #1
Tue Sep 21 19:23:35 [conn1] timeoutMs not support for v8 yet

#

  1. Fatal error in CALL_AND_RETRY_2
  2. Allocation failed - process out of memory
    #

Tue Sep 21 19:27:42 Got signal: 6 (Aborted).

Tue Sep 21 19:27:42 Backtrace:
0x7624e9 0x7f416931baf0 0x7f416931ba75 0x7f416931f5c0 0x8065d9 0x78b56d 0x77369d 0x7a0df6 0x77de4e 0x77e188 0x5b0b65 0x5a0cd7 0x6fe225 0x6ff892 0x6f2d14 0x6f4060 0x5b3f5b 0x5b7489 0x682df0 0x7642af
mongod(_ZN5mongo10abruptQuitEi+0x3d9) [0x7624e9]
/lib/libc.so.6(+0x33af0) [0x7f416931baf0]
/lib/libc.so.6(gsignal+0x35) [0x7f416931ba75]
/lib/libc.so.6(abort+0x180) [0x7f416931f5c0]
mongod() [0x8065d9]
mongod() [0x78b56d]
mongod() [0x77369d]
mongod() [0x7a0df6]
mongod() [0x77de4e]
mongod() [0x77e188]
mongod(_ZN5mongo9mongoToV8ERKNS_7BSONObjEbb+0x1b5) [0x5b0b65]
mongod(_ZN5mongo7V8Scope9setObjectEPKcRKNS_7BSONObjEb+0x67) [0x5a0cd7]
mongod(_ZN5mongo12GroupCommand5groupESsRKSsRKNS_7BSONObjES3_SsSsPKcS3_SsRSsRNS_14BSONObjBuilderE+0x5e5) [0x6fe225]
mongod(_ZN5mongo12GroupCommand3runERKSsRNS_7BSONObjERSsRNS_14BSONObjBuilderEb+0xc62) [0x6ff892]
mongod(_ZN5mongo11execCommandEPNS_7CommandERNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0x7d4) [0x6f2d14]
mongod(_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_10BufBuilderERNS_14BSONObjBuilderEbi+0x620) [0x6f4060]
mongod(_ZN5mongo11runCommandsEPKcRNS_7BSONObjERNS_5CurOpERNS_10BufBuilderERNS_14BSONObjBuilderEbi+0x3b) [0x5b3f5b]
mongod(ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1+0x1099) [0x5b7489]
mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_8SockAddrE+0x10c0) [0x682df0]
mongod(_ZN5mongo10connThreadEPNS_13MessagingPortE+0x2ff) [0x7642af]

Tue Sep 21 19:27:42 dbexit:

Tue Sep 21 19:27:42 [conn1] shutdown: going to close listening sockets...
Tue Sep 21 19:27:42 [conn1] going to close listening socket: 5
Tue Sep 21 19:27:42 [conn1] going to close listening socket: 6
Tue Sep 21 19:27:42 [conn1] going to close listening socket: 7
Tue Sep 21 19:27:42 [conn1] going to close listening socket: 8
Tue Sep 21 19:27:42 [conn1] shutdown: going to flush oplog...
Tue Sep 21 19:27:42 [conn1] shutdown: going to close sockets...
Tue Sep 21 19:27:42 [conn1] shutdown: waiting for fs preallocator...
Tue Sep 21 19:27:42 [conn1] shutdown: closing all files...
Tue Sep 21 19:27:42 closeAllFiles() finished

Tue Sep 21 19:27:42 [conn1] shutdown: removing fs lock...
Tue Sep 21 19:27:42 dbexit: really exiting now

ERROR: Client::~Client _context should be NULL: conn

Generated at Thu Feb 08 02:58:10 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.