[SERVER-2485] "Uncaught std::exception: St9bad_alloc, terminating" caused clean crash Created: 05/Feb/11 Updated: 12/Jul/16 Resolved: 02/Sep/11 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Stability |
| Affects Version/s: | 1.7.5 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | David Mytton | Assignee: | Eliot Horowitz (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: | |
| Issue Links: | |
| Operating System: | Linux |
| Participants: | |
| Description |
|
mongod (primary) crashed with "Uncaught std::exception: St9bad_alloc, terminating". It looked like a clean shutdown, although the lock file was not removed; upon restarting, mongod complained of an unclean shutdown. Log excerpt (several query lines are truncated in the paste):

Sat Feb 5 07:34:58 [conn54697] remove sd.metrics query: { _id: ObjectId('4d4455b4150ba0457700056b'), accId: 2367, sId: 1 } 281ms
228ms 119ms 137ms
}, { accId: 2805, tA: { $gt: new Date(1296286500226) }}, { tA: { $gt: new Date(1296632100228) }} ] }, $orderby: { _id: 1 } } nreturned:1000 6786ms
150ms 155ms
, tAG: { $in: [ 5, 60 ] }}, $orderby: { tA: 1 } } nreturned:281 2009ms
}, { accId: 2805, tA: { $gt: new Date(1296286510929) }}, { tA: { $gt: new Date(1296632110930) }} ] }, $orderby: { _id: 1 } } nreturned:1000 9124ms
353ms
Sat Feb 5 07:35:22 [conn54487] got request after shutdown()
Sat Feb 5 07:35:23 [conn38693] got request after shutdown() |
| Comments |
| Comment by David Mytton [ 01/May/11 ] |
|
See CS-501. |
| Comment by Bobby Joe [ 21/Apr/11 ] |
|
Sorry for being unclear – this was pasted out of Splunk, so read it from the bottom up. Let me see what else I can dig up; this happened a couple of times. Also, I was running with --quiet. |
| Comment by Eliot Horowitz (Inactive) [ 21/Apr/11 ] |
|
@david, can you send more of the log? It looks like that happened after it was already shut down. |
| Comment by Bobby Joe [ 21/Apr/11 ] |
|
I am seeing this as well with 1.8.1. No stack backtrace in my logs, unfortunately.

mongod: Thu Apr 21 13:59:07 [conn66609] got request after shutdown() |
| Comment by David Mytton [ 03/Mar/11 ] |
|
Experienced this again in 1.8.0-rc0. From /var/log/messages (several lines are truncated in the paste):

mongod: Thu Mar 3 02:21:05 [conn60499] query sd.metrics ntoreturn:8 scanAndOrder reslen:36 nscanned:13412 { $query: { accId: 909, metric: "memSwap", sId: 933, tA: { $gte: new Date(1299116413000), $lte: new Date(1299118813000) }, tAG: { $in: [ 5, 60 ] }}, $orderby: { tA: 1 } } nreturned:0 38157ms
} bytes:12450 nreturned:1 3188ms
reslen:60 300033ms
reslen:60 300034ms
} bytes:20 nreturned:0 3768ms
reslen:60 300034ms
Thu Mar 3 02:21:28 Backtrace: |
| Comment by David Mytton [ 16/Feb/11 ] |
|
We don't do $inc or $rename but |
| Comment by Aaron Staple [ 16/Feb/11 ] |
|
Another possibility is |
| Comment by Aaron Staple [ 14/Feb/11 ] |
|
Sorry if the above is somewhat repetitive with earlier comments here - just want to be thorough in ruling out known causes. |
| Comment by Aaron Staple [ 14/Feb/11 ] |
|
A couple of other modifiers that could generate the small memory leak in certain error cases are $inc and $rename. Any chance you are using one of those? Also, when you do writes are you checking for errors? If so, and you are running into these problem cases, you would see user assertions coming back with certain operations. |
| Comment by David Mytton [ 14/Feb/11 ] |
|
david@rs1a ~: ./console ) { "totalOpen" : 6, "clientCursors_size" : 6, "timedOut" : 56, "ok" : 1 }
david@rs2a ~: ./console ) { "totalOpen" : 6, "clientCursors_size" : 6, "timedOut" : 176, "ok" : 1 }
set2:PRIMARY>
david@rs3a ~: ./console ) { "totalOpen" : 28, "clientCursors_size" : 28, "timedOut" : 9, "ok" : 1 } |
| Comment by Aaron Staple [ 14/Feb/11 ] |
|
Hi David, Just as another data point, could you please run cursorInfo on those machines again? (So we can see what the change has been since last week.) Thanks. |
| Comment by David Mytton [ 10/Feb/11 ] |
|
We don't do any map reduce. It's possible a modifier could be the cause, but we don't really use any except for $set and $unset. Across our 3 replica sets:

david@rs1a ~: ./console ) { "totalOpen" : 9, "clientCursors_size" : 9, "timedOut" : 13, "ok" : 1 }
set1:PRIMARY> db.runCommand( {cursorInfo:1}) { "totalOpen" : 9, "clientCursors_size" : 9, "timedOut" : 13, "ok" : 1 }
set1:PRIMARY> db.runCommand( {cursorInfo:1}) { "totalOpen" : 9, "clientCursors_size" : 9, "timedOut" : 13, "ok" : 1 }
set1:PRIMARY>
david@rs2a ~: ./console ) { "totalOpen" : 3, "clientCursors_size" : 3, "timedOut" : 87, "ok" : 1 }
set2:PRIMARY> db.runCommand( {cursorInfo:1}) { "totalOpen" : 4, "clientCursors_size" : 4, "timedOut" : 88, "ok" : 1 }
set2:PRIMARY> db.runCommand( {cursorInfo:1}) { "totalOpen" : 3, "clientCursors_size" : 3, "timedOut" : 88, "ok" : 1 }
david@rs3a ~: ./console ) { "totalOpen" : 31, "clientCursors_size" : 31, "timedOut" : 8, "ok" : 1 }
set3:PRIMARY> db.runCommand( {cursorInfo:1}) { "totalOpen" : 31, "clientCursors_size" : 31, "timedOut" : 8, "ok" : 1 }
set3:PRIMARY> db.runCommand( {cursorInfo:1}) { "totalOpen" : 30, "clientCursors_size" : 30, "timedOut" : 8, "ok" : 1 } |
| Comment by Aaron Staple [ 10/Feb/11 ] |
|
Hi David, Could you please send the output of db.runCommand( {cursorInfo:1}) at a few time intervals so we can check cursor usage? Also, we have found a couple small memory leaks recently in 1.7.5 which could contribute to something like this depending on what operations you are running:
Both of these leaks are pretty small so you would need to be running a large number of these problem operations to see serious issues. |
| Comment by David Mytton [ 06/Feb/11 ] |
|
This was 1.7.5 on the master of replica set 2 (rs2a). Physical memory usage remained almost the same although swap usage was steadily increasing to the point where it crashed. |
| Comment by Eliot Horowitz (Inactive) [ 06/Feb/11 ] |
|
Was this on 1.7.5? |
| Comment by David Mytton [ 05/Feb/11 ] |
|
Uploaded mongod log privately to CS-302 |