[SERVER-9049] Mongod 2.4.0 takes upto 99 % cpu and query result is undef Created: 21/Mar/13  Updated: 10/Dec/14  Resolved: 01/Apr/13

Status: Closed
Project: Core Server
Component/s: Diagnostics
Affects Version/s: 2.4.0
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: Arati Sethy Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

linux


Issue Links:
Related
Operating System: Linux
Steps To Reproduce:

1. Run stand alone mongod
2. Create 2 collections with cap of 1GB each
3. have 1.3 and 900k docs in those collections
4. run a query with timestamp range for list of documents with count function

Participants:

 Description   

Hi All,
I upgraded mongodb to 240. I am running standalone mongo server with 2 collections. one having 1.3 million docs other with 900k docs. Running query with a filter to have count of records makes CPU usage to go up till 99%. And query returns no data. I am using perl driver to query. Even after query execution is over, mongod CPU usages remains at 80-90%.
Strace on mongod gave me below out put

Process 3284 attached - interrupt to quit
setuid(4294967295) = 0
setuid(4294967295) = 0
setuid(4294967295) = 0
setuid(4294967295) = 0
setuid(4294967295) = 0
setuid(4294967295) = 0
setuid(4294967295) = 0

My collections are indexed on _id.

Can any one help if i am missing something.
I am using capped collection of 1GB.

Thanks



 Comments   
Comment by Eliot Horowitz (Inactive) [ 01/Apr/13 ]

Using javascript to do a count is going to be slower than the built in count.
Nothing is wrong per se, just counting everything with javascript takes time.
The aggregation framework might be faster as well.

If you have another operation that is slow, please open a ticket for that specific one.

Comment by Arati Sethy [ 01/Apr/13 ]

I have tried this . It gives me back result in 3-5 secs. This is like a work around just for count. I have more requirements to aggregate values of a particular filed.(Not part of this collection/query). Question is to reduce CPU cycles and memory usage.

Thanks for your immediate response.

Comment by Eliot Horowitz (Inactive) [ 01/Apr/13 ]

Can you try something like this instead

db.apslog.find( { action_taken : { $in : [0,1,2,4] }, timestamp : { $gte : $frm_month, $lte : $totime } } ).count()

Comment by Arati Sethy [ 01/Apr/13 ]

I am executing below group by command using perl driver run_command . It takes 25-30 secs for each query. I have a requirement of getting whole days aggregated data per hour. I am unable to achieve that functionality with below data size , as the application does 24 queries and each query takes 30 secs to complete.
-----------------------------

my $cmd_str_month = '{ \'group\' => { \'ns\' => \'apslog\', \'$reduce\' => \'function(doc,out){if (( doc.action_taken == 0 || doc.action_taken == 1 || doc.action_taken == 2 || doc.action_taken == 4 ) &&(doc.timestamp >= $frm_month && doc.timestamp <= $totime) ) { out.count++ ; }}\', \'initial\' =>

{ \'count\' => 0 }

, \'key\' =>

{ \'attack_category\' => 1 }

} };';

-------------------

> db.apslog.count()
1737233
> db.apslog.find().limit(1).pretty()
{
"_id" : ObjectId("5146b8d3e395e59fd9ea5820"),
"timestamp" : NumberLong("1363589330910"),
"sev_level" : 1,
"action_taken" : 1,
"followup_action" : 0,
"attack_description" : 14,
"vip_ip" : "10.11.30.177",
"client_ip" : "10.11.18.153",
"vip_port" : 80,
"client_port" : 56022,
"rule_type" : 0,
"wf_log_protocol" : 0,
"wf_proxyip" : "10.11.18.153",
"wf_proxyport" : 56022,
"method" : "GET",
"rule_id" : "security-policy",
"attack_detail" : "",
"session_id" : "",
"wf_useragent" : "Wget/1.11.4",
"wf_authenticateduser" : "",
"referer" : "",
"url" : "10.11.30.177/.index.html"
}
>
> db.apslog.count()
1737233

> db.apslog.getIndexKeys()
[

{ "_id" : 1 }

,

{ "timestamp" : 1 }

,

{ "timestamp" : 1, "_id" : 0 }

]
> > db.apslog.validate()
{
"ns" : "barracudalog.apslog",
"firstExtent" : "0:1081000 ns:barracudalog.apslog",
"lastExtent" : "4:2000 ns:barracudalog.apslog",
"extentCount" : 17,
"datasize" : 922085004,
"nrecords" : 1737233,
"lastExtentSize" : 307515392,
"padding" : 1,
"firstExtentDetails" :

{ "loc" : "0:1081000", "xnext" : "2:60ae4000", "xprev" : "null", "nsdiag" : "barracudalog.apslog", "size" : 32768, "firstRecord" : "0:10810b0", "lastRecord" : "0:1088d10" }

,
"lastExtentDetails" :

{ "loc" : "4:2000", "xnext" : "null", "xprev" : "3:60e3e000", "nsdiag" : "barracudalog.apslog", "size" : 307515392, "firstRecord" : "4:20b0", "lastRecord" : "4:5838054" }

,
"deletedCount" : 15,
"deletedSize" : 215022772,
"nIndexes" : 3,
"keysPerIndex" :

{ "barracudalog.apslog.$_id_" : 1737233, "barracudalog.apslog.$timestamp_1" : 1737233, "barracudalog.apslog.$timestamp_1__id_0" : 1737233 }

,
"valid" : true,
"errors" : [ ],
"warning" : "Some checks omitted for speed. use

{full:true}

option to do more thorough scan.",
"ok" : 1
}

Comment by Eliot Horowitz (Inactive) [ 01/Apr/13 ]

So the result is ok/correct?
Can you send the exact query you are running?
How long does it take?

Comment by Arati Sethy [ 01/Apr/13 ]

Hi Eliot,
I have added index for timestamp. I am running aggregation command to get count of rows in timestamp range. My problem is not query result. Problem is mongod server uses 117% CPU if i run on 4 core machine. And it us using virtual memory of 20G and actual of 13.7 % memory.

Comment by Eliot Horowitz (Inactive) [ 01/Apr/13 ]

What do you mean query result is undefined?
Have you tried adding an index on that field?

Comment by Arati Sethy [ 28/Mar/13 ]

Running this query every 10 mins for 1 day resulted in 20G virtual memory usage.
Top output
----------------

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4166 root 20 0 20.2g 1.1g 1.0g S 43 13.7 117:31.92 mongod

Comment by Arati Sethy [ 26/Mar/13 ]

Hi Thomas,
I got this prob even with mongod 2.2.
I have a collection indexed with timestamp with below details
> db.weblog.validate()
{
"ns" : "barracudalog.weblog",
"capped" : true,
"max" : 2147483647,
"firstExtent" : "2:2000 ns:barracudalog.weblog",
"lastExtent" : "2:2000 ns:barracudalog.weblog",
"extentCount" : 1,
"datasize" : 1052793780,
"nrecords" : 1309445,
"lastExtentSize" : 1073745920,
"padding" : 1,
"firstExtentDetails" :

{ "loc" : "2:2000", "xnext" : "null", "xprev" : "null", "nsdiag" : "barracudalog.weblog", "size" : 1073745920, "firstRecord" : "2:b80b01c", "lastRecord" : "2:b80a9b4" }

,
"deletedCount" : 2,
"deletedSize" : 844,
"nIndexes" : 2,
"keysPerIndex" :

{ "barracudalog.weblog.$_id_" : 1309445, "barracudalog.weblog.$_timestampclient_ipclient_portapplication_ipapplication_port" : 1309445 }

,
"valid" : true,
"errors" : [ ],
"warning" : "Some checks omitted for speed. use

{full:true}

option to do more thorough scan.",
"ok" : 1
}
>
> db.weblog.getIndexKeys()
[

{ "_id" : 1 }

,

{ "timestamp" : 1, "client_ip" : 1, "client_port" : 1, "application_ip" : 1, "application_port" : 1 }

]
>

when i do a query to get count of logs in last week/month CPU spikes to 99.3% and mem usage is 25.9% from top command

Below is currentop data and mongostats data
> db.currentOp()
{
"inprog" : [
{
"opid" : 284,
"active" : true,
"secs_running" : 0,
"op" : "query",
"ns" : "barracudalog.weblog",
"query" : {
"count" : "weblog",
"query" : {
"timestamp" :

{ "$lte" : 1364296175000, "$gte" : 1364209775000 }

}
},
"client" : "127.0.0.1:54321",
"desc" : "conn114",
"threadId" : "0x41808950",
"connectionId" : 114,
"locks" :

{ "^" : "r", "^barracudalog" : "R" }

,
"waitingForLock" : false,
"numYields" : 1,
"lockStats" : {
"timeLockedMicros" :

{ "r" : NumberLong(536127), "w" : NumberLong(0) }

,
"timeAcquiringMicros" :

{ "r" : NumberLong(268098), "w" : NumberLong(0) }

}
}
]
}
>

-------------------------------
./mongostat
connected to: 127.0.0.1
insert query update delete getmore command flushes mapped vsize res faults locked db idx miss % qr|qw ar|aw netIn netOut conn time
0 0 0 0 0 1 0 6.01g 12.2g 260m 0 barracudalog:0.0% 0 0|0 1|0 62b 2k 3 04:10:46
0 0 0 0 0 1 0 6.01g 12.2g 260m 0 barracudalog:0.0% 0 0|0 1|0 62b 2k 3 04:10:47
0 0 0 0 0 3 0 6.01g 12.2g 260m 0 barracudalog:0.0% 0 0|0 1|0 256b 2k 3 04:10:48
0 0 0 0 0 1 0 6.01g 12.2g 260m 0 barracudalog:0.0% 0 0|0 1|0 62b 2k 3 04:10:49
0 0 0 0 0 5 0 6.01g 12.2g 260m 1 barracudalog:0.0% 0 0|0 1|0 450b 2k 3 04:10:50
0 0 0 0 0 1 0 6.01g 12.2g 260m 0 barracudalog:0.0% 0 0|0 1|0 62b 2k 3 04:10:51
0 0 0 0 0 1 1 6.01g 12.2g 260m 0 barracudalog:0.0% 0 0|0 1|0 62b 2k 3 04:10:52
0 2 0 0 0 3 0 6.01g 12.2g 260m 0 barracudalog:0.0% 0 0|0 1|0 256b 2k 3 04:10:53
0 0 0 0 0 1 0 6.01g 12.2g 260m 0 local:0.0% 0 0|0 1|0 62b 2k 3 04:10:54
0 0 0 0 0 1 0 6.01g 12.2g 260m 0 barracudalog:0.0% 0 0|0 1|0 62b 2k 3 04:10:55
insert query update delete getmore command flushes mapped vsize res faults locked db idx miss % qr|qw ar|aw netIn netOut conn time
0 0 0 0 0 17 0 6.01g 12.2g 260m 0 barracudalog:0.0% 0 0|0 0|0 1k 5k 2 04:10:56
0 0 0 0 0 1 0 6.01g 12.2g 260m 0 local:0.0% 0 0|0 0|0 62b 2k 2 04:10:57
0 0 0 0 0 1 0 6.01g 12.2g 260m 0 local:0.0% 0 0|0 0|0 62b 2k 2 04:10:58
0 0 0 0 0 1 0 6.01g 12.2g 260m 0 local:0.0% 0 0|0 0|0 62b 2k 2 04:10:59
0 0 0 0 0 1 0 6.01g 12.2g 260m 0 local:0.0% 0 0|0 0|0 62b 2k 2 04:11:00
0 0 0 0 0 3 0 6.01g 12.2g 260m 0 .:0.0% 0 0|0 0|0 300b 2k 2 04:11:01
0 0 0 0 0 1 0 6.01g 12.2g 260m 0 local:0.0% 0 0|0 0|0 62b 2k 2 04:11:02
0 0 0 0 0 1 0 6.01g 12.2g 260m 0 local:0.0% 0 0|0 0|0 62b 2k 2 04:11:03
0 0 0 0 0 1 0 6.01g 12.2g 260m 0 local:0.0% 0 0|0 0|0 62b 2k 2 04:11:04
0 0 0 0 0 1 0 6.01g 12.2g 260m 0 local:0.0% 0 0|0 0|0 62b 2k 2 04:11:05
insert query update delete getmore command flushes mapped vsize res faults locked db idx miss % qr|qw ar|aw netIn netOut conn time
0 0 0 0 0 1 0 6.01g 12.2g 260m 0 local:0.0% 0 0|0 0|0 62b 2k 2 04:11:06
0 0 0 0 0 1 0 6.01g 12.2g 260m 0 local:0.0% 0 0|0 0|0 62b 2k 2 04:11:07
0 0 0 0 0 1 0 6.01g 12.2g 260m 0 local:0.0% 0 0|0 0|0 62b 2k 2 04:11:08
0 0 0 0 0 1 0 6.01g 12.2g 260m 0 local:0.0% 0 0|0 0|0 62b 2k 2 04:11:09
0 0 0 0 0 1 0 6.01g 12.2g 260m 0 local:0.0% 0 0|0 0|0 62b 2k 2 04:11:10

-------------------------------------

Please help to resolve this problem. I use query every 5 mins to get count of logs for 1 hr, daily/weekly and monthly basis

Thanks,
Arati

Comment by Thomas Rueckstiess [ 25/Mar/13 ]

Hi Arati,

Do you still know the exact syntax of the query you used? We don't have enough information to diagnose the problem at this point. Are you saying the same queries are now running without the increase in CPU%?

Is your mongod instance in MMS?

Regards,
Thomas

Comment by Arati Sethy [ 22/Mar/13 ]

I saw this problem only once. I have mongotop output when it was taking 97% cpu even if i was running no query.

connected to: 127.0.0.1
 
                            ns       total        read       write              2013-03-22T02:48:35
            local.system.users         0ms         0ms         0ms
          local.system.replset         0ms         0ms         0ms
          local.system.indexes         0ms         0ms         0ms
             local.startup_log         0ms         0ms         0ms
           barracudalog.weblog         0ms         0ms         0ms
        barracudalog.systemlog         0ms         0ms         0ms
     barracudalog.system.users         0ms         0ms         0ms
        barracudalog.system.js         0ms         0ms         0ms
 
                            ns       total        read       write              2013-03-22T02:48:36
            local.system.users         0ms         0ms         0ms
          local.system.replset         0ms         0ms         0ms
          local.system.indexes         0ms         0ms         0ms
             local.startup_log         0ms         0ms         0ms
           barracudalog.weblog         0ms         0ms         0ms
        barracudalog.systemlog         0ms         0ms         0ms
     barracudalog.system.users         0ms         0ms         0ms
        barracudalog.system.js         0ms         0ms         0ms
 
                            ns       total        read       write              2013-03-22T02:48:37
            local.system.users         0ms         0ms         0ms
          local.system.replset         0ms         0ms         0ms
          local.system.indexes         0ms         0ms         0ms
             local.startup_log         0ms         0ms         0ms
           barracudalog.weblog         0ms         0ms         0ms
        barracudalog.systemlog         0ms         0ms         0ms
     barracudalog.system.users         0ms         0ms         0ms
        barracudalog.system.js         0ms         0ms         0ms
 
                            ns       total        read       write              2013-03-22T02:48:38
              barracudalog.^Bns    161043ms    161043ms         0ms
            local.system.users         0ms         0ms         0ms
          local.system.replset         0ms         0ms         0ms
          local.system.indexes         0ms         0ms         0ms
             local.startup_log         0ms         0ms         0ms
           barracudalog.weblog         0ms         0ms         0ms
        barracudalog.systemlog         0ms         0ms         0ms
     barracudalog.system.users         0ms         0ms         0ms
 
                            ns       total        read       write              2013-03-22T02:48:39
              barracudalog.^Bns    224493ms    224493ms         0ms
            local.system.users         0ms         0ms         0ms
          local.system.replset         0ms         0ms         0ms
          local.system.indexes         0ms         0ms         0ms
             local.startup_log         0ms         0ms         0ms
           barracudalog.weblog         0ms         0ms         0ms
        barracudalog.systemlog         0ms         0ms         0ms
     barracudalog.system.users         0ms         0ms         0ms
 
                            ns       total        read       write              2013-03-22T02:48:40
            local.system.users         0ms         0ms         0ms
          local.system.replset         0ms         0ms         0ms
          local.system.indexes         0ms         0ms         0ms
             local.startup_log         0ms         0ms         0ms
           barracudalog.weblog         0ms         0ms         0ms
        barracudalog.systemlog         0ms         0ms         0ms
     barracudalog.system.users         0ms         0ms         0ms
        barracudalog.system.js         0ms         0ms         0ms
 
                            ns       total        read       write              2013-03-22T02:48:41
            local.system.users         0ms         0ms         0ms
          local.system.replset         0ms         0ms         0ms
          local.system.indexes         0ms         0ms         0ms
             local.startup_log         0ms         0ms         0ms
           barracudalog.weblog         0ms         0ms         0ms
        barracudalog.systemlog         0ms         0ms         0ms
     barracudalog.system.users         0ms         0ms         0ms
        barracudalog.system.js         0ms         0ms         0ms
 
                            ns       total        read       write              2013-03-22T02:48:42
            local.system.users         0ms         0ms         0ms
          local.system.replset         0ms         0ms         0ms
          local.system.indexes         0ms         0ms         0ms
             local.startup_log         0ms         0ms         0ms
           barracudalog.weblog         0ms         0ms         0ms
        barracudalog.systemlog         0ms         0ms         0ms
     barracudalog.system.users         0ms         0ms         0ms
        barracudalog.system.js         0ms         0ms         0ms
 
                            ns       total        read       write              2013-03-22T02:48:43
            local.system.users         0ms         0ms         0ms
          local.system.replset         0ms         0ms         0ms
          local.system.indexes         0ms         0ms         0ms
             local.startup_log         0ms         0ms         0ms
           barracudalog.weblog         0ms         0ms         0ms
        barracudalog.systemlog         0ms         0ms         0ms
     barracudalog.system.users         0ms         0ms         0ms
        barracudalog.system.js         0ms         0ms         0ms
 
                            ns       total        read       write              2013-03-22T02:48:44
            local.system.users         0ms         0ms         0ms
          local.system.replset         0ms         0ms         0ms
          local.system.indexes         0ms         0ms         0ms
             local.startup_log         0ms         0ms         0ms
           barracudalog.weblog         0ms         0ms         0ms
        barracudalog.systemlog         0ms         0ms         0ms
     barracudalog.system.users         0ms         0ms         0ms
        barracudalog.system.js         0ms         0ms         0ms

Comment by Tad Marshall [ 21/Mar/13 ]

Can you post the output of running mongotop and a minute or so of mongostat, along with db.currentOp()?

Please post the query you are using and the explain output as well: http://docs.mongodb.org/manual/reference/method/cursor.explain/#cursor.explain – this may be easier if done in the javascript mongo shell.

Generated at Thu Feb 08 03:19:12 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.