[SERVER-3796] mongodb fails db.currentOp(); how to debug? Created: 09/Sep/11  Updated: 11/Jul/16  Resolved: 10/Sep/11

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 1.8.3
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: ttt Assignee: Antoine Girbal
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

centos 5.6 x64 xen
mongo-10gen-server-1.8.3-mongodb_1.rpm


Operating System: Linux
Participants:

 Description   

{{
> db.currentOp();
Fri Sep 9 18:21:15 Assertion failure r scripting/engine_spidermonkey.cpp 634
0x4d539e 0x4d810d 0x530e45 0x523827 0x5ed545 0x5ed122 0x5dfafc 0x5dfe73 0x5c2dae 0x5bead6 0x57bccf 0x57bc18 0x57bb01 0x52ed8d 0x47be59 0x47d036 0x3406e1d994 0x4760e9
mongo(_ZN5mongo12sayDbContextEPKc+0xae) [0x4d539e]
mongo(_ZN5mongo8assertedEPKcS1_j+0xfd) [0x4d810d]
mongo(_ZN5mongo9Convertor5tovalERKNS_11BSONElementE+0x1445) [0x530e45]
mongo(ZN5mongo16resolveBSONFieldEP9JSContextP8JSObjectljPS3+0x1f7) [0x523827]
mongo(js_LookupPropertyWithFlags+0x421) [0x5ed545]
mongo(js_LookupProperty+0x49) [0x5ed122]
mongo [0x5dfafc]
mongo(js_CallIteratorNext+0xd8) [0x5dfe73]
mongo(js_Interpret+0x32c8) [0x5c2dae]
mongo(js_Execute+0x418) [0x5bead6]
mongo(JS_EvaluateUCScriptForPrincipals+0xb5) [0x57bccf]
mongo(JS_EvaluateUCScript+0x5a) [0x57bc18]
mongo(JS_EvaluateScript+0x80) [0x57bb01]
mongo(_ZN5mongo7SMScope4execERKNS_10StringDataERKSsbbbi+0x10d) [0x52ed8d]
mongo(_Z5_mainiPPc+0x21b9) [0x47be59]
mongo(main+0x26) [0x47d036]
/lib64/libc.so.6(__libc_start_main+0xf4) [0x3406e1d994]
mongo(_ZNSt15basic_streambufIcSt11char_traitsIcEE6xsputnEPKcl+0x49) [0x4760e9]
error:assertion scripting/engine_spidermonkey.cpp:634
> db.currentOp();
SyntaxError: invalid quantifier +21tahun\.xxxxx\.com
error2:(shellhelp1) exec failed: SyntaxError: invalid quantifier +21tahun\.xxxxx\.com
> db.stats();
SyntaxError: invalid quantifier +21tahun\.xxxxx\.com
error2:(shellhelp1) exec failed: SyntaxError: invalid quantifier +21tahun\.xxxxx\.com
>
}}



 Comments   
Comment by Eliot Horowitz (Inactive) [ 10/Sep/11 ]

I think the shell from 1.8.4 will work better as the heap size is larger.
I thikn you had a lot of ops queues, and so it blew the heap.

Comment by ttt [ 10/Sep/11 ]

now - running fine, not a peak traffic:
insert query update delete getmore command flushes mapped vsize res
faults locked % idx miss % qr|qw ar|aw netIn netOut conn time
21 4027 23 108 0 80 0 34g 51.8g 5.58g
55 4.4 0 0|0 1|0 659k 8m 4149 13:46:38
17 3150 15 99 0 84 0 34g 51.8g 5.58g
23 2.1 0 0|0 0|0 463k 6m 4144 13:46:39
4 3471 7 85 2 47 0 34g 51.8g 5.58g
26 1.6 0 0|0 1|0 436k 6m 4144 13:46:40
6 2960 8 115 0 3 0 34g 51.8g 5.58g
18 2.1 0 1|0 1|0 361k 5m 4144 13:46:41
19 3397 12 127 0 9 0 34g 51.8g 5.58g
27 2.6 0 0|0 0|0 476k 6m 4139 13:46:42
8 3206 10 85 0 219 0 34g 51.8g 5.58g
37 2 0 0|0 2|0 428k 7m 4139 13:46:43
14 3285 15 90 0 17 0 34g 51.7g 5.58g
45 7.8 0 0|0 1|0 451k 6m 4134 13:46:44
0 3233 4 104 2 29 0 34g 51.7g 5.58g
36 1.9 0 0|0 1|0 407k 6m 4134 13:46:45
0 2972 1 124 0 18 0 34g 51.7g 5.58g
26 2 0 0|0 1|0 364k 5m 4130 13:46:46
8 2874 9 78 0 9 0 34g 51.7g 5.58g
50 6.5 100 0|0 2|0 368k 6m 4130 13:46:47
insert query update delete getmore command flushes mapped vsize res
faults locked % idx miss % qr|qw ar|aw netIn netOut conn time
4 3282 9 113 0 17 0 34g 50.1g 5.75g
28 2.1 0 0|0 1|0 447k 5m 3706 13:58:25
7 2801 10 105 0 11 0 34g 50g 5.75g
21 3 0 0|0 0|0 448k 5m 3696 13:58:26
9 3078 4 104 0 22 0 34g 50g 5.75g
36 1.7 0 10|2 11|2 382k 5m 3691 13:58:27
15 2782 18 57 2 6 0 34g 50g 5.75g
35 1.5 0 0|0 1|1 417k 5m 3686 13:58:28
14 2972 14 149 0 9 0 34g 50g 5.75g
16 3 0 35|4 38|4 433k 5m 3686 13:58:29
23 3177 18 88 0 17 0 34g 50g 5.75g
25 4.8 0 0|0 0|0 492k 6m 3686 13:58:30
13 2654 11 76 2 14 0 34g 50g 5.75g
26 1.8 0 0|0 1|0 447k 5m 3687 13:58:31
14 3142 18 81 0 221 0 34g 50g 5.75g
40 23.3 0 27|10 34|10 422k 8m 3682 13:58:32
16 2995 13 63 2 12 0 34g 50g 5.75g
34 4.7 0 0|0 0|0 445k 6m 3677 13:58:33
8 3271 8 96 0 18 0 34g 49.9g 5.75g
34 1.9 0 48|2 50|3 545k 6m 3672 13:58:34

  1. free
    total used free shared buffers cached
    Mem: 7077888 7050196 27692 0 12876 6098780
    -/+ buffers/cache: 938540 6139348
    Swap: 2097144 72492 2024652

{
"db" : "xxxxx",
"collections" : 38,
"objects" : 15235640,
"avgObjSize" : 1659.7376002583417,
"dataSize" : 25287164572,
"storageSize" : 27740390048,
"numExtents" : 228,
"indexes" : 118,
"indexSize" : 7194130048,
"fileSize" : 38569771008,
"ok" : 1
}

problems last day:
insert query update delete getmore command flushes mapped vsize res
faults locked % idx miss % qr|qw ar|aw netIn netOut conn time
0 2624 1 60 0 7 0 34g 64.3g 5.05g
83 0.4 0 54|1010 181|1010 296k 4m 6950 16:18:33
1 2985 0 78 0 9 0 34g 64.4g 5.05g
109 4.8 0 33|1048 100|1048 376k 6m 6964 16:18:34
6 3034 3 85 0 5 0 34g 64.4g 5.05g
70 6.3 0 137|1060 190|1060 365k 5m 6971 16:18:35
3 3197 0 101 0 18 0 34g 64.5g 5.05g
88 1.7 0 47|1109 98|1109 385k 6m 6982 16:18:36
2 2780 1 83 0 7 0 34g 64.5g 5.05g
91 9.6 0 32|1120 84|1123 325k 5m 6991 16:18:37
2 2556 0 86 0 10 0 34g 64.5g 5.05g
74 1.4 0 55|1127 109|1128 307k 5m 6977 16:18:38
1 2511 3 75 0 20 0 34g 64.6g 5.06g
93 11.1 0 61|1141 111|1141 304k 5m 7008 16:18:39
9 2981 4 106 0 8 0 34g 64.6g 5.06g
92 4 0 32|1107 93|1107 361k 7m 7026 16:18:40
2 2521 5 74 0 6 1 34g 64.7g 5.06g
77 1.6 0 99|1100 157|1100 295k 6m 7033 16:18:41
8 3536 3 131 0 4 0 34g 64.6g 5.06g
94 3.6 0 15|1079 78|1080 432k 6m 7017 16:18:42

From my position it looks like there was a bit of starving of mongod from
disk io by other process, and then it hit 100% cpu wall (normally running
<60%) and couldn't recover as it sometimes does (when hitting cpu wall
mongod actually starts using less disk io). So it had long running queries
lagging in queue and then it couldn't show currentOp() of them. I did
restart mongod at some point and then it got into same trouble again pretty
soon.
After the traffic slowed down and the queue cleared, mongod started to run
currentOp() correctly without restart...

On Sat, Sep 10, 2011 at 7:38 AM, Eliot Horowitz (JIRA) <jira@mongodb.org>wrote:

Comment by Eliot Horowitz (Inactive) [ 10/Sep/11 ]

That probably means the vm ran out of memory.
Can you run mongostat?

Comment by Eliot Horowitz (Inactive) [ 10/Sep/11 ]

That probably means the vm ran out of memory.
Can you run mongostat?

Generated at Thu Feb 08 03:04:05 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.