[SERVER-2485] "Uncaught std::exception: St9bad_alloc, terminating" caused clean crash Created: 05/Feb/11  Updated: 12/Jul/16  Resolved: 02/Sep/11

Status: Closed
Project: Core Server
Component/s: Stability
Affects Version/s: 1.7.5
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: David Mytton Assignee: Eliot Horowitz (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File physical-mem.png     PNG File resident-memory.png     PNG File swap-memory-zoomed.png     PNG File swap-memory.png    
Issue Links:
Related
Operating System: Linux
Participants:

 Description   

mongod (primary) crashed with "Uncaught std::exception: St9bad_alloc, terminating". Looked like a clean shutdown although the lock file was not removed. Upon restarting the mongod it complained of unclean shutdown.

Sat Feb 5 07:34:58 [conn54697] remove sd.metrics query:

{ _id: ObjectId('4d4455b4150ba0457700056b'), accId: 2367, sId: 1 }

281ms
Sat Feb 5 07:34:58 [conn54697] remove sd.metrics query:

{ _id: ObjectId('4d4455b4150ba08475000aad'), accId: 2424, sId: 1 }

228ms
Sat Feb 5 07:34:58 [conn54697] remove sd.metrics query:

{ _id: ObjectId('4d4455b5150ba08b78000315'), accId: 3096, sId: 2 }

119ms
Sat Feb 5 07:34:59 [conn54697] remove sd.metrics query:

{ _id: ObjectId('4d4455b5160ba087240004dc'), accId: 3199, sId: 8 }

137ms
Sat Feb 5 07:35:07 [conn54697] query sd.metrics ntoreturn:1000 reslen:153521 nscanned:3100738 { $query: { $nor: [ { accId: 2461, tA:

{ $gt: new Date(1294299300226) }

}, { accId: 2805, tA:

{ $gt: new Date(1296286500226) }

}, { tA:

{ $gt: new Date(1296632100228) }

} ] }, $orderby:

{ _id: 1 }

} nreturned:1000 6786ms
Sat Feb 5 07:35:08 [conn54697] remove sd.metrics query:

{ _id: ObjectId('4d4455b8140ba02e680015ed'), accId: 3282, sId: 2 }

150ms
Sat Feb 5 07:35:09 [conn54697] remove sd.metrics query:

{ _id: ObjectId('4d4455b8160ba051240005a6'), accId: 3441, sId: 4 }

155ms
Sat Feb 5 07:35:18 [conn38727] query sd.$cmd ntoreturn:1 command: { count: "sessions", query: {} } reslen:64 155ms
Sat Feb 5 07:35:18 [conn27442] query sd.$cmd ntoreturn:1 command: { count: "sessions", query: {} } reslen:64 191ms
Sat Feb 5 07:35:18 [conn54337] query sd.metrics ntoreturn:864 reslen:39938 nscanned:368 { $query: { accId: 3564, metric: "diskUsage", sId: 8, tA:

{ $gte: new Date(1296802215000), $lte: new Date(1296946799000) }

, tAG:

{ $in: [ 5, 60 ] }

}, $orderby:

{ tA: 1 }

} nreturned:281 2009ms
Sat Feb 5 07:35:20 [conn54697] query sd.metrics ntoreturn:1000 reslen:151508 nscanned:3100678 { $query: { $nor: [ { accId: 2461, tA:

{ $gt: new Date(1294299310928) }

}, { accId: 2805, tA:

{ $gt: new Date(1296286510929) }

}, { tA:

{ $gt: new Date(1296632110930) }

} ] }, $orderby:

{ _id: 1 }

} nreturned:1000 9124ms
Sat Feb 5 07:35:20 [conn54541] insert sd.metrics 113ms
Sat Feb 5 07:35:21 [conn54697] remove sd.metrics query:

{ _id: ObjectId('4d4455b9150ba04f77000589'), accId: 3963, sId: 2 }

353ms
Sat Feb 5 07:35:22 [conn5864] Uncaught std::exception: St9bad_alloc, terminating
Sat Feb 5 07:35:22 dbexit:
Sat Feb 5 07:35:22 [conn54749] got request after shutdown()
Sat Feb 5 07:35:22 [conn54495] got request after shutdown()
Sat Feb 5 07:35:22 [conn54768] got request after shutdown()
Sat Feb 5 07:35:22 [conn54697] got request after shutdown()
Sat Feb 5 07:35:22 [conn54313] got request after shutdown()
Sat Feb 5 07:35:22 [conn54536] got request after shutdown()
Sat Feb 5 07:35:22 [conn54478] got request after shutdown()
Sat Feb 5 07:35:22 [conn54488] got request after shutdown()
Sat Feb 5 07:35:22 [conn54489] got request after shutdown()
Sat Feb 5 07:35:22 [conn89] got request after shutdown()
Sat Feb 5 07:35:22 [conn54298] got request after shutdown()
Sat Feb 5 07:35:22 [conn54268] got request after shutdown()
Sat Feb 5 07:35:22 [conn54521] got request after shutdown()
Sat Feb 5 07:35:22 [conn5864] shutdown: going to close listening sockets...
Sat Feb 5 07:35:22 [conn5864] closing listening socket: 5
Sat Feb 5 07:35:22 [conn5864] closing listening socket: 6
Sat Feb 5 07:35:22 [conn5864] closing listening socket: 7
Sat Feb 5 07:35:22 [conn5864] closing listening socket: 8
Sat Feb 5 07:35:22 [conn5864] removing socket file: /tmp/mongodb-27018.sock
Sat Feb 5 07:35:22 [conn54319] got request after shutdown()
Sat Feb 5 07:35:22 [conn50493] got request after shutdown()
Sat Feb 5 07:35:22 [conn54338] got request after shutdown()
Sat Feb 5 07:35:22 [conn54501] got request after shutdown()
Sat Feb 5 07:35:22 [conn54765] got request after shutdown()
Sat Feb 5 07:35:22 [conn51892] got request after shutdown()
Sat Feb 5 07:35:22 [conn52318] got request after shutdown()
Sat Feb 5 07:35:22 [conn54490] got request after shutdown()
Sat Feb 5 07:35:22 [conn52304] got request after shutdown()
Sat Feb 5 07:35:22 [conn54763] got request after shutdown()
Sat Feb 5 07:35:22 [conn54753] got request after shutdown()
Sat Feb 5 07:35:22 [conn5864] removing socket file: /tmp/mongodb-28018.sock
Sat Feb 5 07:35:22 [conn5864] shutdown: going to flush diaglog...
Sat Feb 5 07:35:22 [conn5864] shutdown: going to close sockets...
Sat Feb 5 07:35:22 [conn54296] got request after shutdown()
Sat Feb 5 07:35:22 [conn52040] got request after shutdown()
Sat Feb 5 07:35:22 [conn54781] got request after shutdown()
Sat Feb 5 07:35:22 [conn53689] got request after shutdown()
Sat Feb 5 07:35:22 [conn54532] got request after shutdown()
Sat Feb 5 07:35:22 [conn53832] got request after shutdown()
Sat Feb 5 07:35:22 [conn54255] got request after shutdown()
Sat Feb 5 07:35:22 [conn54502] got request after shutdown()
Sat Feb 5 07:35:22 [conn54238] got request after shutdown()
Sat Feb 5 07:35:22 [conn54775] got request after shutdown()
Sat Feb 5 07:35:22 [conn54538] got request after shutdown()
Sat Feb 5 07:35:22 [conn54747] got request after shutdown()
Sat Feb 5 07:35:22 [conn2] got request after shutdown()
Sat Feb 5 07:35:22 [initandlisten] now exiting
Sat Feb 5 07:35:22 dbexit: ; exiting immediately
Sat Feb 5 07:35:22 [conn54778] got request after shutdown()
Sat Feb 5 07:35:22 [conn54277] got request after shutdown()
Sat Feb 5 07:35:22 [conn54540] got request after shutdown()
Sat Feb 5 07:35:22 [conn54510] got request after shutdown()
Sat Feb 5 07:35:22 [conn54317] got request after shutdown()
Sat Feb 5 07:35:22 [conn54242] got request after shutdown()
Sat Feb 5 07:35:22 [conn54230] got request after shutdown()
Sat Feb 5 07:35:22 [conn54303] got request after shutdown()
Sat Feb 5 07:35:22 [conn54771] got request after shutdown()
Sat Feb 5 07:35:22 [conn54507] got request after shutdown()
Sat Feb 5 07:35:22 [conn54757] got request after shutdown()
Sat Feb 5 07:35:22 [conn53017] got request after shutdown()
Sat Feb 5 07:35:22 [conn53741] got request after shutdown()
Sat Feb 5 07:35:22 [conn54759] got request after shutdown()
Sat Feb 5 07:35:22 [conn54267] got request after shutdown()
Sat Feb 5 07:35:22 [conn54539] got request after shutdown()
Sat Feb 5 07:35:22 [conn53209] got request after shutdown()
Sat Feb 5 07:35:22 [conn54297] got request after shutdown()
Sat Feb 5 07:35:22 [conn53896] got request after shutdown()
Sat Feb 5 07:35:22 [conn52641] got request after shutdown()
Sat Feb 5 07:35:22 [conn54492] got request after shutdown()
Sat Feb 5 07:35:22 [conn54783] got request after shutdown()
Sat Feb 5 07:35:22 [conn54527] got request after shutdown()
Sat Feb 5 07:35:22 [conn53834] got request after shutdown()
Sat Feb 5 07:35:22 [conn54746] got request after shutdown()
Sat Feb 5 07:35:22 [conn52130] got request after shutdown()
Sat Feb 5 07:35:22 [conn54548] got request after shutdown()
Sat Feb 5 07:35:22 [conn54236] got request after shutdown()
Sat Feb 5 07:35:22 [conn54274] got request after shutdown()
Sat Feb 5 07:35:22 [conn53749] got request after shutdown()
Sat Feb 5 07:35:22 [conn54292] got request after shutdown()
Sat Feb 5 07:35:22 [conn54557] got request after shutdown()
Sat Feb 5 07:35:22 [conn51187] got request after shutdown()
Sat Feb 5 07:35:22 [conn54755] got request after shutdown()
Sat Feb 5 07:35:22 [conn54786] got request after shutdown()
Sat Feb 5 07:35:22 [conn54246] got request after shutdown()
Sat Feb 5 07:35:22 [conn53657] got request after shutdown()
Sat Feb 5 07:35:22 [conn54272] got request after shutdown()
Sat Feb 5 07:35:22 [conn54517] got request after shutdown()
Sat Feb 5 07:35:22 [conn83] got request after shutdown()
Sat Feb 5 07:35:22 [conn54782] got request after shutdown()
Sat Feb 5 07:35:22 [conn53849] got request after shutdown()
Sat Feb 5 07:35:22 [conn53701] got request after shutdown()
Sat Feb 5 07:35:22 [conn54258] got request after shutdown()
Sat Feb 5 07:35:22 [conn54266] got request after shutdown()
Sat Feb 5 07:35:22 [conn54252] got request after shutdown()
Sat Feb 5 07:35:22 [conn54256] got request after shutdown()
Sat Feb 5 07:35:22 [conn5864] shutdown: waiting for fs preallocator...
Sat Feb 5 07:35:22 [conn5864] shutdown: closing all files...
Sat Feb 5 07:35:22 [conn54509] got request after shutdown()
Sat Feb 5 07:35:22 [conn53025] got request after shutdown()
Sat Feb 5 07:35:22 Got signal: 6 (Aborted).

Sat Feb 5 07:35:22 [conn54487] got request after shutdown()
Sat Feb 5 07:35:22 [conn54522] got request after shutdown()
Sat Feb 5 07:35:22 [conn54262] got request after shutdown()
Sat Feb 5 07:35:22 [conn54550] got request after shutdown()
Sat Feb 5 07:35:22 [conn53029] got request after shutdown()
Sat Feb 5 07:35:22 [conn38808] got request after shutdown()
Sat Feb 5 07:35:22 [conn54549] got request after shutdown()
Sat Feb 5 07:35:22 [conn27413] got request after shutdown()
Sat Feb 5 07:35:23 [conn54772] got request after shutdown()
Sat Feb 5 07:35:23 [conn54493] got request after shutdown()
Sat Feb 5 07:35:23 [conn50595] got request after shutdown()
Sat Feb 5 07:35:23 [conn54281] got request after shutdown()
Sat Feb 5 07:35:23 [conn54769] got request after shutdown()
Sat Feb 5 07:35:23 [conn423] got request after shutdown()
Sat Feb 5 07:35:23 [conn54744] got request after shutdown()
Sat Feb 5 07:35:23 [conn54485] got request after shutdown()
Sat Feb 5 07:35:23 [conn54551] got request after shutdown()
Sat Feb 5 07:35:23 [conn54343] got request after shutdown()
Sat Feb 5 07:35:23 [conn53721] got request after shutdown()
Sat Feb 5 07:35:23 [conn53703] got request after shutdown()
Sat Feb 5 07:35:23 [conn54756] got request after shutdown()
Sat Feb 5 07:35:23 [conn38854] got request after shutdown()
Sat Feb 5 07:35:23 [conn54544] got request after shutdown()
Sat Feb 5 07:35:23 [conn54750] got request after shutdown()
Sat Feb 5 07:35:23 [conn53823] got request after shutdown()
Sat Feb 5 07:35:23 [conn54767] got request after shutdown()
Sat Feb 5 07:35:23 [conn53623] got request after shutdown()
Sat Feb 5 07:35:23 [conn54558] got request after shutdown()
Sat Feb 5 07:35:23 [conn54751] got request after shutdown()
Sat Feb 5 07:35:23 [conn54761] got request after shutdown()
Sat Feb 5 07:35:23 [conn54300] got request after shutdown()
Sat Feb 5 07:35:23 [conn54259] got request after shutdown()
Sat Feb 5 07:35:23 [conn50608] got request after shutdown()
Sat Feb 5 07:35:23 [conn54339] got request after shutdown()
Sat Feb 5 07:35:23 [conn54335] got request after shutdown()
Sat Feb 5 07:35:23 [conn54231] got request after shutdown()
Sat Feb 5 07:35:23 [conn53098] got request after shutdown()
Sat Feb 5 07:35:23 [conn53920] got request after shutdown()
Sat Feb 5 07:35:23 [conn38768] got request after shutdown()
Sat Feb 5 07:35:23 Backtrace:
0x899789 0x3f7aa302d0 0x3f7aa30265 0x3f7aa31d10 0x3f7aa296e6 0x5491c4 0x5473c7 0x74fd22 0x7507a6 0x89ac76 0x8ad550 0x3f7b60673d 0x3f7aad3d1d
/home/david/mongodb/latest/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x899789]
/lib64/libc.so.6 [0x3f7aa302d0]
/lib64/libc.so.6(gsignal+0x35) [0x3f7aa30265]
/lib64/libc.so.6(abort+0x110) [0x3f7aa31d10]
/lib64/libc.so.6(__assert_fail+0xf6) [0x3f7aa296e6]
/home/david/mongodb/latest/bin/mongod(_ZN5mongo6rwlockC1ERKNS_6RWLockEbb+0x124) [0x5491c4]
/home/david/mongodb/latest/bin/mongod(_ZN5mongo9MongoFile13closeAllFilesERSt18basic_stringstreamIcSt11char_traitsIcESaIcEE+0x77) [0x5473c7]
/home/david/mongodb/latest/bin/mongod(_ZN5mongo14shutdownServerEv+0x592) [0x74fd22]
/home/david/mongodb/latest/bin/mongod(_ZN5mongo6dbexitENS_8ExitCodeEPKcb+0x436) [0x7507a6]
/home/david/mongodb/latest/bin/mongod(_ZN5mongo10connThreadEPNS_13MessagingPortE+0xd36) [0x89ac76]
/home/david/mongodb/latest/bin/mongod(thread_proxy+0x80) [0x8ad550]
/lib64/libpthread.so.0 [0x3f7b60673d]
/lib64/libc.so.6(clone+0x6d) [0x3f7aad3d1d]

Sat Feb 5 07:35:23 [conn38693] got request after shutdown()
Sat Feb 5 07:35:24 [conn27509] got request after shutdown()



 Comments   
Comment by David Mytton [ 01/May/11 ]

See CS-501.

Comment by Bobby Joe [ 21/Apr/11 ]

Sorry for being unclear – pasted out of splunk, so read from the bottom up. Lemme see what else I can dig up, this happened a couple of times. Also, was running --quiet.

Comment by Eliot Horowitz (Inactive) [ 21/Apr/11 ]

@david, can you send more of the log? Looks like that happened after it was already shutdown.

Comment by Bobby Joe [ 21/Apr/11 ]

I am seeing this as well with 1.8.1. No stack backtrace in my logs, unfortunately.

mongod: Thu Apr 21 13:59:07 [conn66609] got request after shutdown()
mongod: Thu Apr 21 13:59:07 [conn31472] got request after shutdown()
mongod: Thu Apr 21 13:59:07 [conn48076] got request after shutdown()
mongod: Thu Apr 21 13:59:07 [conn31976] got request after shutdown()
mongod: Thu Apr 21 13:59:07 [conn67145] got request after shutdown()
mongod: Thu Apr 21 13:59:07 [conn64154] got request after shutdown()
mongod: Thu Apr 21 13:59:07 [conn31801] got request after shutdown()
mongod: Thu Apr 21 13:59:07 [conn31804] got request after shutdown()
mongod: Thu Apr 21 13:59:07 [conn31548] shutdown: closing all files...
mongod: Thu Apr 21 13:59:07 [conn31548] shutdown: waiting for fs preallocator...
mongod: Thu Apr 21 13:59:07 [conn31548] shutdown: going to close sockets...
mongod: Thu Apr 21 13:59:07 [conn32252] got request after shutdown()
mongod: Thu Apr 21 13:59:07 [conn31548] shutdown: going to flush diaglog...
mongod: Thu Apr 21 13:59:07 [conn37871] got request after shutdown()
mongod: Thu Apr 21 13:59:07 [conn67183] got request after shutdown()
mongod: Thu Apr 21 13:59:07 [conn46655] got request after shutdown()
mongod: Thu Apr 21 13:59:07 [conn65690] got request after shutdown()
mongod: Thu Apr 21 13:59:07 [conn31548] removing socket file: /tmp/mongodb-28027.sock
mongod: Thu Apr 21 13:59:07 [conn31548] removing socket file: /tmp/mongodb-27027.sock
mongod: Thu Apr 21 13:59:07 [conn31548] closing listening socket: 9
mongod: Thu Apr 21 13:59:07 [conn31548] closing listening socket: 8
mongod: Thu Apr 21 13:59:07 [conn31548] closing listening socket: 7
mongod: Thu Apr 21 13:59:07 [conn31548] closing listening socket: 5
mongod: Thu Apr 21 13:59:07 [conn31548] shutdown: going to close listening sockets...
mongod: Thu Apr 21 13:59:07 [conn31548] Uncaught std::exception: St9bad_alloc, terminating

Comment by David Mytton [ 03/Mar/11 ]

Experienced this again in 1.8.0-rc0:

/var/log/messages/

mongod:

Thu Mar 3 02:21:05 [conn60499] query sd.metrics ntoreturn:8 scanAndOrder reslen:36 nscanned:13412 { $query: { accId: 909, metric: "memSwap", sId: 933, tA:

{ $gte: new Date(1299116413000), $lte: new Date(1299118813000) }

, tAG:

{ $in: [ 5, 60 ] }

}, $orderby:

{ tA: 1 }

} nreturned:0 38157ms
Thu Mar 3 02:21:05 [conn37663] getmore local.oplog.rs cid:212890236044665853 getMore: { ts:

{ $gte: new Date(5578885522178179525) }

} bytes:12450 nreturned:1 3188ms
Thu Mar 3 02:21:09 [conn3610] query admin.$cmd ntoreturn:1 command:

{ writebacklisten: ObjectId('4d53762d6e1188e165e4ebfe') }

reslen:60 300033ms
Thu Mar 3 02:21:10 [conn36186] query admin.$cmd ntoreturn:1 command:

{ writebacklisten: ObjectId('4d6be173f9824b3eeb5a29b5') }

reslen:60 300034ms
Thu Mar 3 02:21:10 [conn37663] getmore local.oplog.rs cid:212890236044665853 getMore: { ts:

{ $gte: new Date(5578885522178179525) }

} bytes:20 nreturned:0 3768ms
Thu Mar 3 02:21:14 [conn3615] query admin.$cmd ntoreturn:1 command:

{ writebacklisten: ObjectId('4d62812512e6b9acb44504c6') }

reslen:60 300034ms
Thu Mar 3 02:21:28 [conn6304] Uncaught std::exception: St9bad_alloc, terminating
Thu Mar 3 02:21:28 dbexit:
Thu Mar 3 02:21:28 [conn6304] shutdown: going to close listening sockets...
Thu Mar 3 02:21:28 [conn6304] closing listening socket: 5
Thu Mar 3 02:21:28 [conn6304] closing listening socket: 6
Thu Mar 3 02:21:28 [conn6304] closing listening socket: 8
Thu Mar 3 02:21:28 [conn6304] closing listening socket: 9
Thu Mar 3 02:21:28 [conn6304] removing socket file: /tmp/mongodb-27018.sock
Thu Mar 3 02:21:28 [conn37663] Assertion: 13073:shutting down
Thu Mar 3 02:21:28 [initandlisten] now exiting
Thu Mar 3 02:21:28 dbexit: ; exiting immediately
0x561fd9 0x751e36 0x75ae07 0x8a65ee 0x8b6a30 0x3db720673d 0x3db66d3f6d
Thu Mar 3 02:21:28 [conn6304] removing socket file: /tmp/mongodb-28018.sock
Thu Mar 3 02:21:28 [conn6304] shutdown: going to flush diaglog...
Thu Mar 3 02:21:28 [conn6304] shutdown: going to close sockets...
Thu Mar 3 02:21:28 [conn6304] shutdown: waiting for fs preallocator...
Thu Mar 3 02:21:28 [conn6304] shutdown: closing all files...
/home/david/mongodb/latest/bin/mongod(_ZN5mongo11msgassertedEiPKc+0x129) [0x561fd9]
/home/david/mongodb/latest/bin/mongod(_ZN5mongo15receivedGetMoreERNS_10DbResponseERNS_7MessageERNS_5CurOpE+0x7f6) [0x751e36]
/home/david/mongodb/latest/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_8SockAddrE+0x14f7) [0x75ae07]
/home/david/mongodb/latest/bin/mongod(_ZN5mongo10connThreadEPNS_13MessagingPortE+0x21e) [0x8a65ee]
/home/david/mongodb/latest/bin/mongod(thread_proxy+0x80) [0x8b6a30]
/lib64/libpthread.so.0 [0x3db720673d]
/lib64/libc.so.6(clone+0x6d) [0x3db66d3f6d]
Thu Mar 3 02:21:28 [conn37663] AssertionException in connThread, closing client connection
Thu Mar 3 02:21:28 [conn37663] shutting down
Thu Mar 3 02:21:28 Got signal: 11 (Segmentation fault).

Thu Mar 3 02:21:28 Backtrace:
0x8a3189 0x3db66302d0 0x140f0e00
/home/david/mongodb/latest/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x8a3189]
/lib64/libc.so.6 [0x3db66302d0]
[0x140f0e00]

Comment by David Mytton [ 16/Feb/11 ]

We don't do $inc or $rename but SERVER-2558 could well be related since this crash happened when a new slave was syncing. Same as in SERVER-2566.

Comment by Aaron Staple [ 16/Feb/11 ]

Another possibility is SERVER-2558 which leaks some when a new slave starts replicating. Is that something that happened often in your setup when you saw the memory issue described here?

Comment by Aaron Staple [ 14/Feb/11 ]

Sorry if the above is somewhat repetitive with earlier comments here - just want to be thorough in ruling out known causes.

Comment by Aaron Staple [ 14/Feb/11 ]

A couple of other modifiers that could generate the small memory leak in certain error cases are $inc and $rename. Any chance you are using one of those? Also, when you do writes are you checking for errors? If so, and you are running into these problem cases, you would see user assertions coming back with certain operations.

Comment by David Mytton [ 14/Feb/11 ]

david@rs1a ~: ./console
MongoDB shell version: 1.7.5
connecting to: localhost:27018/test
set1:PRIMARY> use sd
switched to db sd
set1:PRIMARY> db.runCommand(

{cursorInfo:1}

)

{ "totalOpen" : 6, "clientCursors_size" : 6, "timedOut" : 56, "ok" : 1 }

david@rs2a ~: ./console
MongoDB shell version: 1.7.5
connecting to: localhost:27018/test
set2:PRIMARY> use sd
switched to db sd
set2:PRIMARY> db.runCommand(

{cursorInfo:1}

)

{ "totalOpen" : 6, "clientCursors_size" : 6, "timedOut" : 176, "ok" : 1 }

set2:PRIMARY>

david@rs3a ~: ./console
MongoDB shell version: 1.7.5
connecting to: localhost:27018/test
set3:PRIMARY> use sd
switched to db sd
set3:PRIMARY> db.runCommand(

{cursorInfo:1}

)

{ "totalOpen" : 28, "clientCursors_size" : 28, "timedOut" : 9, "ok" : 1 }
Comment by Aaron Staple [ 14/Feb/11 ]

Hi David,

Just as another data point, could you please run cursorInfo on those machines again? (So we can see what the change has been since last week.) Thanks.

Comment by David Mytton [ 10/Feb/11 ]

We don't do any map reduce. It's possible a modifier could be the cause but we don't really use any except for $set and $unset.

Across our 3 replica sets:

david@rs1a ~: ./console
MongoDB shell version: 1.7.5
connecting to: localhost:27018/test
set1:PRIMARY> use sd
switched to db sd
set1:PRIMARY> db.runCommand(

{cursorInfo:1}

)

{ "totalOpen" : 9, "clientCursors_size" : 9, "timedOut" : 13, "ok" : 1 }

set1:PRIMARY> db.runCommand(

{cursorInfo:1}

)

{ "totalOpen" : 9, "clientCursors_size" : 9, "timedOut" : 13, "ok" : 1 }

set1:PRIMARY> db.runCommand(

{cursorInfo:1}

)

{ "totalOpen" : 9, "clientCursors_size" : 9, "timedOut" : 13, "ok" : 1 }

set1:PRIMARY>

david@rs2a ~: ./console
MongoDB shell version: 1.7.5
connecting to: localhost:27018/test
set2:PRIMARY> use sd
switched to db sd
set2:PRIMARY> db.runCommand(

{cursorInfo:1}

)

{ "totalOpen" : 3, "clientCursors_size" : 3, "timedOut" : 87, "ok" : 1 }

set2:PRIMARY> db.runCommand(

{cursorInfo:1}

)

{ "totalOpen" : 4, "clientCursors_size" : 4, "timedOut" : 88, "ok" : 1 }

set2:PRIMARY> db.runCommand(

{cursorInfo:1}

)

{ "totalOpen" : 3, "clientCursors_size" : 3, "timedOut" : 88, "ok" : 1 }

david@rs3a ~: ./console
MongoDB shell version: 1.7.5
connecting to: localhost:27018/test
set3:PRIMARY> use sd
switched to db sd
set3:PRIMARY> db.runCommand(

{cursorInfo:1}

)

{ "totalOpen" : 31, "clientCursors_size" : 31, "timedOut" : 8, "ok" : 1 }

set3:PRIMARY> db.runCommand(

{cursorInfo:1}

)

{ "totalOpen" : 31, "clientCursors_size" : 31, "timedOut" : 8, "ok" : 1 }

set3:PRIMARY> db.runCommand(

{cursorInfo:1}

)

{ "totalOpen" : 30, "clientCursors_size" : 30, "timedOut" : 8, "ok" : 1 }
Comment by Aaron Staple [ 10/Feb/11 ]

Hi David,

Could you please send the output of db.runCommand(

{cursorInfo:1}

) at a few time intervals so we can check cursor usage?

Also, we have found a couple small memory leaks recently in 1.7.5 which could contribute to something like this depending on what operations you are running:

SERVER-2497 can occur if you attempt to apply a modifier to an object that is not valid for that modifier - for example, $push to a non array or $rename on a nonexistent field

SERVER-2522 if you have a map reduce job where the reduce function throws an exception

Both of these leaks are pretty small so you would need to be running a large number of these problem operations to see serious issues.

Comment by David Mytton [ 06/Feb/11 ]

This was 1.7.5 on the master of replica set 2 (rs2a).

Physical memory usage remained almost the same although swap usage was steadily increasing to the point where it crashed.

Comment by Eliot Horowitz (Inactive) [ 06/Feb/11 ]

Was this on 1.7.5?
What kind of monitoring can we look at?
That error usually means the system was out of ram.

Comment by David Mytton [ 05/Feb/11 ]

Uploaded mongod log privately to CS-302

Generated at Thu Feb 08 03:00:07 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.