[SERVER-19080] High CPU (user usage in MMS) after bulk inserts Created: 23/Jun/15  Updated: 14/Apr/16  Resolved: 31/Aug/15

Status: Closed
Project: Core Server
Component/s: Admin
Affects Version/s: 3.0.4
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Maziyar Panahi Assignee: Sam Kleinman (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Screen Shot 2015-06-30 at 13.44.16.png     PNG File Screen Shot 2015-07-01 at 10.33.46.png     PNG File Screen Shot 2015-08-31 at 22.28.04.png     PNG File screenshot-1.png     PNG File screenshot-2.png     PNG File screenshot-3.png     PNG File screenshot-4.png     PNG File screenshot-5.png     PNG File screenshot-6.png     PNG File screenshot-7.png    
Operating System: ALL
Steps To Reproduce:

Start a fresh replica set,
Start a few bulk inserts,
The CPU usage goes over 100% and remains there until the next restart.
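
A minimal mongo shell sketch of the second step (collection name, document shape, and batch size are illustrative placeholders, not the exact workload):

var bulk = db.repro.initializeUnorderedBulkOp();
for (var i = 0; i < 20000; i++) {
    bulk.insert({ k: i, payload: new Array(513).join("x") });  // ~512-byte filler document
}
bulk.execute({ w: 0 });  // unacknowledged write concern, mirroring the reported workload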

Participants:

 Description   

Hello,

I have a MongoDB 3.0.4 replica set with a stable load (60-100 inserts, 20-30 updates, 10 queries). Sometimes I do bulk inserts of around 20K-30K documents, and afterwards the CPU stays high even though the operation is finished and nothing else is running.

In MMS, when I look at the CPU of a freshly started MongoDB, everything is normal. But when I use a bulk write to insert 20k or 40k documents, or run any job that uses the cache, the CPU usage (especially user usage) goes over 100%. The servers are isolated from any other applications and only run MongoDB; I have checked with other monitoring tools and it is all MongoDB. When I restart MongoDB, it all goes away.

How I start my replica set:

mongod --storageEngine wiredTiger --wiredTigerCacheSizeGB=2 --dbpath /data/ --replSet rs0 --fork --logpath /data/fork.log

I also tried different cache sizes like 5 or 10 GB and it was the same. The only thing that looks different before and after the bulk inserts is the cache usage in MMS.

Stats:

rs0:PRIMARY> db.stats()
{
        "db" : "test",
        "collections" : 66,
        "objects" : 1943772773,
        "avgObjSize" : 741.1523292167237,
        "dataSize" : 1440631718177,
        "storageSize" : 736019369984,
        "numExtents" : 0,
        "indexes" : 193,
        "indexSize" : 27145035776,
        "ok" : 1
}

                       insert query update delete getmore command % dirty % used flushes vsize  res qr|qw ar|aw netIn netOut conn set repl     time
       localhost:27017    *33     3    *10     *0       0     9|0     6.1   33.8       0  5.8G 5.2G   0|3   1|0    1k   407k  112 rs0  SEC 10:38:14
mongodb-replica1:27017     31    *0     18     *0      34     8|0    22.0   50.5       0  2.2G 1.5G   0|1   1|0   50k    62k  107 rs0  PRI 10:38:14
mongodb-replica2:27017    *33     3    *10     *0       0    10|0     6.1   33.8       0  5.8G 5.2G   0|3   1|0    1k   407k  112 rs0  SEC 10:38:14
 
       localhost:27017    *41    10    *14     *0       0    10|0     6.1   33.8       0  5.8G 5.2G   0|3   1|0    3k   386k  112 rs0  SEC 10:38:15
mongodb-replica1:27017     39    *0     14     *0      42    11|0    22.0   50.5       0  2.2G 1.5G   0|1   1|0   54k    68k  107 rs0  PRI 10:38:15
mongodb-replica2:27017    *41    10    *14     *0       0     9|0     6.1   33.8       0  5.8G 5.2G   0|3   2|0    3k   386k  112 rs0  SEC 10:38:15
 
       localhost:27017    *45     6    *19     *0       0    14|0     6.1   33.8       0  5.8G 5.2G   0|3   2|0    2k   223k  112 rs0  SEC 10:38:16
mongodb-replica1:27017     34    *0     19     *0      36    18|0    22.0   50.5       0  2.2G 1.5G   0|0   1|1   50k    64k  107 rs0  PRI 10:38:16
mongodb-replica2:27017    *45     6    *19     *0       0    14|0     6.1   33.8       0  5.8G 5.2G   0|3   2|0    2k   223k  112 rs0  SEC 10:38:16

Please let me know if you need more information.

Thanks



 Comments   
Comment by Maziyar Panahi [ 31/Aug/15 ]

MongoDB CPU usage in MMS before and after the update to 3.0.6.
https://jira.mongodb.org/secure/attachment/88236/Screen%20Shot%202015-08-31%20at%2022.28.04.png

Comment by Maziyar Panahi [ 31/Aug/15 ]

Hi Sam,

I checked this a few times. Even without bulk inserts, after a while the CPU (user) usage would go up to 140-150% on both replica members. But now, after the last update, it has been 24 hours and the usage is moderate, around 30% on both. I will attach screenshots from before and after the latest update. (Don't mind the time when the usage went down to 0%; we had a power outage and it has nothing to do with MongoDB.)

I will let you know if it comes back, but for now everything is OK and I think we can close this issue.
Thanks again, Sam.

Comment by Sam Kleinman (Inactive) [ 03/Aug/15 ]

Sorry for taking so long to get back to you. While we haven't yet been able to reproduce the behavior you see here, after reviewing this again, it seems plausible that some of the improvements in 3.0.5 may have addressed this issue, particularly the improvements to the performance of capped collections and the oplog. You can see the complete list of changes in 3.0.5 here and the WiredTiger specific changes here.

Could you see if this resolves your issues?

Regards,
sam

Comment by Maziyar Panahi [ 01/Jul/15 ]

Hi Sam,

grep -c "processor" /proc/cpuinfo = 10
The instances are VMs on top of KVM, with 10 cores each, on a host machine with 46 cores.

I did stop all the operations and this is the result.

                 insert query update delete getmore command % dirty % used flushes   vsize    res qr|qw ar|aw netIn netOut conn set repl     time
       localhost:27017     52    *0     13     *0      53    59|0     1.2   79.6       0    4.2G   3.4G   0|0   1|0   82k   114k   31 rs0  PRI 10:26:30
mongodb-replica1:27017     51    *0     13     *0      52    58|0     1.2   79.6       0    4.2G   3.4G   0|0   1|0   81k   114k   31 rs0  PRI 10:26:30
mongodb-replica2:27017    *61     3    *20     *0       0     6|0     0.1    0.1       0    1.1G 539.0M   0|0   1|0    1k   140k   83 rs0  SEC 10:26:30
 
       localhost:27017     *0    *0     *0     *0       0     7|0     1.3   79.6       0    4.2G   3.4G   0|0   1|0  639b   374k   31 rs0  PRI 10:26:31
mongodb-replica1:27017     *0    *0     *0     *0       0     7|0     1.3   79.6       0    4.2G   3.4G   0|0   1|0  639b   374k   31 rs0  PRI 10:26:31
mongodb-replica2:27017     *3    *0     *0     *0       0     6|0     0.1    0.1       0    1.0G 541.0M   0|0   1|0  497b   129k    7 rs0  SEC 10:26:31
 
       localhost:27017     *0    *0     *0     *0       0     4|0     1.3   79.6       1    4.2G   3.4G   0|0   1|0  291b    47k   31 rs0  PRI 10:26:32
mongodb-replica1:27017     *0    *0     *0     *0       0     4|0     1.3   79.6       1    4.2G   3.4G   0|0   1|0  291b    47k   31 rs0  PRI 10:26:32
mongodb-replica2:27017     *0    *0     *0     *0       0     6|0     0.1    0.1       0    1.0G 541.0M   0|0   1|0  487b    47k    7 rs0  SEC 10:26:32
 
       localhost:27017     *0    *0     *0     *0       0     7|0     1.3   79.6       0    4.2G   3.4G   0|0   2|0  580b    47k   31 rs0  PRI 10:26:33
mongodb-replica1:27017     *0    *0     *0     *0       0     6|0     1.3   79.6       0    4.2G   3.4G   0|0   2|0  501b    32k   31 rs0  PRI 10:26:33
mongodb-replica2:27017     *0    *0     *0     *0       0     5|0     0.1    0.1       0    1.0G 541.0M   0|0   1|0  434b    46k    7 rs0  SEC 10:26:33
 
       localhost:27017     *0    *0     *0     *0       0     5|0     1.3   79.6       0    4.2G   3.4G   0|0   1|0  403b    47k   31 rs0  PRI 10:26:34
mongodb-replica1:27017     *0    *0     *0     *0       0     6|0     1.3   79.6       0    4.2G   3.4G   0|0   2|0  403b    47k   31 rs0  PRI 10:26:34
mongodb-replica2:27017     *0    *0     *0     *0       0     5|0     0.1    0.1       0    1.0G 541.0M   0|0   1|0  433b    46k    7 rs0  SEC 10:26:34

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0   3972 503712 319648 75442824    0    0     0    54  740 1108  6  1 94  0  0
 0  0   3972 502260 319648 75442920    0    0     0   919  836 1345  2  1 96  1  0
 0  0   3972 501820 319648 75443056    0    0     0     6  678  978  1  1 98  0  0
 0  0   3972 502108 319648 75443216    0    0     0     3  736 1016  1  1 98  0  0
 0  0   3972 503100 319648 75443400    0    0     0    83  708  990  2  1 98  0  0
 0  0   3972 502632 319648 75443568    0    0     0    16  757 1185  1  1 98  0  0
 0  0   3972 502696 319648 75443728    0    0     0   170  659  931  1  1 98  0  0
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0   3972 502696 319648 75443936    0    0     0     2  720 1326  1  1 98  0  0
 0  0   3972 502976 319648 75444176    0    0     0     2  874 1267  2  1 97  0  0
 0  0   3972 502912 319648 75444512    0    0     0     3 1050 1552  2  1 97  0  0
 2  0   3972 501620 319648 75444880    0    0     0     2 1310 2157  2  1 96  0  0
 0  0   3972 501936 319648 75445216    0    0     0   284 1177 1792  2  1 97  0  0
 1  0   3972 502124 319648 75445528    0    0     0    82  956 1383  2  1 97  0  0
 1  0   3972 502980 319648 75445624    0    0     0     3  685  860  6  1 93  0  0
 1  0   3972 505268 319648 75446712    0    0     0  1480  788  933 12  0 88  0  0
 0  0   3972 504280 319648 75446592    0    0     0   987  545  879  7  0 92  1  0
 0  0   3972 503996 319648 75446616    0    0     0     2  270  487  0  0 100  0  0

I also attached one screenshot from the time I stopped all the operations.

My question is: how come, when you shut down and restart an instance, the high CPU usage goes away even though the operation load is the same as it was a few seconds earlier?

Update: After this, I started the replica set without specifying the cache size. The primary is already using a 32G cache. I will let you know if there is any change to this problem.

Best,
Maziyar

Comment by Maziyar Panahi [ 01/Jul/15 ]

The point where the CPU usage drops is when I stopped all the operations.

Comment by Sam Kleinman (Inactive) [ 30/Jun/15 ]

Thanks for the additional information. WiredTiger is able to utilize more of the available CPU resources than MongoDB instances using the MMAPv1 engine, so higher CPU usage is expected. I want to make sure that we're not seeing a case where the system is doing no work (i.e. no operations) and CPU usage stays consistently high. From the data you've provided it seems that the database is still responsive and is continuing to do work while using CPU, which might be expected to some degree.

I want to see if we still see the effect with two additional modifications:

  1. After seeing the elevated CPU use, stop all operations (if possible) and see whether there's any change in CPU use.
  2. Perform the test operation again, but change the w value in the bulk operation to w=1 (a sketch of this change is below).
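
For the second change, a minimal sketch against the bulk snippet from your earlier comment (same identifiers, only the write concern changed) would be something like:

bulksij.execute({ w: 1 }, function(err, result) {
    if (err) {
        console.log("err", err);
    } else {
        console.log("bulk result", result);  // now acknowledged by the primary
    }
});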

Could you also confirm the number of logical CPU cores (e.g. grep -c "processor" /proc/cpuinfo)?

Cheers,
sam

Comment by Maziyar Panahi [ 30/Jun/15 ]

Hi Sam,

Sorry for the delay; I can't find a way to get notified by email when I am watching an issue.

The MMS account is mine and is linked to the same email address as this one. That email address is also the MMS username and group owner: maziyar@live.co.uk, and the group name is zeus-iscpif.

I previously had a much larger cache, which I was advised to bring down in this issue: https://jira.mongodb.org/browse/SERVER-18231

Here is how I do the bulk inserts. It's not that often; once in a while I do this, and immediately the CPU user usage in MMS goes over 100%:

// Node.js driver: build an unordered bulk op and queue one document per key pair
var colsij = db.collection(key_sij);
var bulksij = colsij.initializeUnorderedBulkOp();
bulksij.insert({
    key1: keywordsi,
    key2: keywordsj,
    sij: formularScoreSIJ,
    oc: occurrenceScoreOfI,
    co: cooccurrenceScore
});

I do this for about 40K documents and then execute it like this:

// w: 0 means unacknowledged writes; the callback only reports client-side errors
bulksij.execute({ w: 0, wtimeout: 0 }, function(err, result) {
    if (err) {
        console.log("err", err);
    }
});

Right now the CPU (user) usage is over 100%, so I will put some stats here:

                  insert query update delete getmore command % dirty % used flushes vsize     res qr|qw ar|aw netIn netOut conn set repl     time
       localhost:27017    *39     6     *9     *0       0    13|0    23.6   46.8       0 10.0G    9.5G   0|3   1|0    2k   217k   89 rs0  SEC 11:14:55
mongodb-replica1:27017     42    *0     10     *0      41    15|0     8.4   58.3       0  2.9G    2.3G   0|1   1|0   51k    66k   87 rs0  PRI 11:14:55
mongodb-replica2:27017    *39     6     *9     *0       0    13|0    23.6   46.8       0 10.0G    9.5G   0|3   1|0    2k   201k   89 rs0  SEC 11:14:55
 
       localhost:27017    *61     6    *16     *0       0    10|0    19.3   46.8       0 10.0G    9.5G   0|1   2|0    2k   425k   89 rs0  SEC 11:14:56
mongodb-replica1:27017     62    *0     23     *0      60     5|0     7.9   58.3       0  2.9G    2.3G   0|1   1|0   79k    92k   87 rs0  PRI 11:14:56
mongodb-replica2:27017    *61     6    *16     *0       0     9|0    19.3   46.8       0 10.0G    9.5G   0|1   2|0    2k   425k   89 rs0  SEC 11:14:56
 
       localhost:27017    *42     3    *14     *0       0     8|0    14.2   46.8       0 10.0G    9.5G   0|3   2|0    1k   123k   89 rs0  SEC 11:14:57
mongodb-replica1:27017     54    *0     17     *0      51    14|0    12.4   58.3       0  2.9G    2.3G   0|1   1|0   77k   519k   87 rs0  PRI 11:14:57
mongodb-replica2:27017    *42     3    *14     *0       0     8|0    14.2   46.8       0 10.0G    9.5G   0|3   2|0    1k   123k   89 rs0  SEC 11:14:57
 
       localhost:27017    *67     6    *10     *0       0     8|0    12.8   46.8       0 10.0G    9.5G   0|3   1|0    2k   216k   89 rs0  SEC 11:14:58
mongodb-replica1:27017     43    *0     13     *0      47     6|0    12.5   58.3       0  2.9G    2.3G   0|1   1|0   57k    71k   87 rs0  PRI 11:14:58
mongodb-replica2:27017    *67     6    *10     *0       0     8|0    12.8   46.8       0 10.0G    9.5G   0|3   2|0    2k   216k   89 rs0  SEC 11:14:58
 
       localhost:27017    *52     6    *14     *0       0     5|0     9.4   46.8       0 10.0G    9.5G   0|3   2|0    2k   214k   89 rs0  SEC 11:14:59
mongodb-replica1:27017     50    *0     21     *0      53     8|0    12.8   58.3       0  2.9G    2.3G   0|1   1|0   77k    90k   87 rs0  PRI 11:14:59
mongodb-replica2:27017    *52     6    *14     *0       0     5|0     9.4   46.8       0 10.0G    9.5G   0|3   2|0    2k   214k   89 rs0  SEC 11:14:59
 
       localhost:27017    *41     8    *16     *0       0    10|0    10.7   46.3       0 10.0G    9.5G   0|1   2|0    1k   125k   89 rs0  SEC 11:15:00
mongodb-replica1:27017     71    *0     16     *0      70    11|0    12.8   58.3       0  2.9G    2.3G   0|1   1|0   85k   101k   87 rs0  PRI 11:15:00
mongodb-replica2:27017    *41     8    *16     *0       0    10|0    10.7   46.3       0 10.0G    9.5G   0|1   2|0    1k   125k   89 rs0  SEC 11:15:00
 
       localhost:27017    *94     5    *18     *0       0     7|0    11.4   46.3       0 10.0G    9.5G   0|3   1|0    3k   371k   89 rs0  SEC 11:15:01
mongodb-replica1:27017     74    *0     12     *0      66     7|0    13.1   58.3       0  2.9G    2.3G   0|1   1|0   79k    92k   87 rs0  PRI 11:15:01
mongodb-replica2:27017    *94     5    *18     *0       0     7|0    11.4   46.3       0 10.0G    9.5G   0|3   1|0    3k   371k   89 rs0  SEC 11:15:01
 
       localhost:27017    *35     6     *8     *0       0     7|0    11.7   46.3       0 10.0G    9.5G   0|3   1|0    2k   214k   89 rs0  SEC 11:15:02
mongodb-replica1:27017     67    *0     *0     *0      61    10|0    13.1   58.3       0  2.9G    2.3G   0|1   1|0   62k    76k   87 rs0  PRI 11:15:02
mongodb-replica2:27017    *35     6     *8     *0       0     7|0    11.7   46.3       0 10.0G    9.5G   0|3   1|0    2k   214k   89 rs0  SEC 11:15:02
 
       localhost:27017    *86     6     *6     *0       0    10|0    11.7   46.3       0 10.0G    9.5G   0|1   1|0    1k   124k   89 rs0  SEC 11:15:03
mongodb-replica1:27017     65    *0     15     *0      60     8|0    13.1   58.3       0  2.9G    2.3G   0|1   1|0   74k    88k   87 rs0  PRI 11:15:03
mongodb-replica2:27017    *86     6     *6     *0       0    10|0    11.7   46.3       0 10.0G    9.5G   0|1   2|0    1k   124k   89 rs0  SEC 11:15:03
 
       localhost:27017    *66     3     *9     *0       0    13|0    12.1   46.4       0 10.0G    9.5G   0|3   2|0    2k   426k   89 rs0  SEC 11:15:04
mongodb-replica1:27017     67    *0      4     *0      57    10|0    13.1   58.3       0  2.9G    2.3G   0|1   1|0   64k    81k   87 rs0  PRI 11:15:04
mongodb-replica2:27017    *66     3     *9     *0       0    13|0    12.1   46.4       0 10.0G    9.5G   0|3   2|0    2k   426k   89 rs0  SEC 11:15:04

vmstat 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0   3532 417732 317256 76759992    0    0     2    27    0    1  8  1 90  0  0
 1  1   3532 415548 317256 76761848    0    0     0  1488 1370 1605 18  2 78  2  0
 1  0   3532 415932 317256 76762192    0    0     0  1467 1093 1276 12  1 87  0  0
 1  0   3532 415380 317256 76762560    0    0     0    70 1129 1080 12  1 87  0  0
 1  0   3532 415224 317256 76762944    0    0     0    92 1111 1505 12  1 87  0  0
 1  0   3532 415208 317256 76763312    0    0     0    12 1075 1004 12  1 87  0  0
 2  0   3532 414472 317256 76763664    0    0     0   247 1077 1239 12  1 87  0  0
 1  0   3532 414388 317256 76764048    0    0     0     6 1105 1348 12  1 87  0  0
 2  0   3532 414460 317256 76764440    0    0     0   187 1317 2735 13  2 85  0  0
 1  0   3532 413916 317256 76764808    0    0     0   216 1235 1328 12  1 87  0  0
 1  0   3532 413592 317256 76765056    0    0     0    16  953 1024 11  1 88  0  0
 1  0   3532 413004 317256 76765328    0    0     0     6 1034  993 12  1 87  0  0
 2  0   3532 412708 317256 76765632    0    0     0     3  997 1262 11  1 88  0  0
 1  0   3532 413016 317256 76765888    0    0     0     3  971  941 11  1 87  0  0
 3  0   3532 411824 317256 76766272    0    0     0     4 1351 2176 20  2 78  0  0
 3  0   3532 412016 317256 76766760    0    0     0  1222 1529 1484 21  2 77  0  0
 2  0   3532 410808 317256 76768672    0    0     0  2281 1362 1451 18  2 79  1  0
 1  0   3532 438076 317256 76740800    0    0     0     4 1184 1073 12  2 87  0  0
 1  0   3532 438420 317256 76741184    0    0     0   120 1045  993 12  1 87  0  0
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 3  0   3532 436916 317256 76741512    0    0     0    89 1302 1642 13  2 85  0  0
 1  0   3532 437244 317256 76741872    0    0     0     6 1040 1298 12  1 87  0  0
 1  0   3532 436856 317256 76742216    0    0     0   176 1074 1643 12  1 87  0  0
 1  0   3532 436616 317256 76742496    0    0     0     6 1024 1180 11  1 88  0  0
 1  0   3532 436712 317256 76742840    0    0     0     2  994 1086 12  1 87  0  0
 1  0   3532 436604 317256 76743104    0    0     0     3  938  897 11  1 88  0  0
 1  0   3532 435532 317256 76743408    0    0     0     4  973 1066 11  1 87  0  0
 1  0   3532 434944 317256 76744200    0    0     0    24  911  868 11  1 88  0  0

I also added a few screenshots from MMS.
Please let me know if you need more information.

Many thanks,
Maziyar

Comment by Sam Kleinman (Inactive) [ 24/Jun/15 ]

The user account from which you opened this ticket is not linked to an MMS account: could you either link this account to the relevant MMS group, or provide the group name in this ticket, so we can track this issue down?

The WiredTiger cache size controls the full amount of cache that the entire MongoDB instance has access to. It looks like your data and index sizes are much larger than 2 gigabytes. Are you able to reproduce this issue with much smaller data sizes or a much larger cache size? Could you also provide more information about the bulk operations that trigger this case? An example operation, including the write concern and the replication configuration, would be very useful.
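
As a quick way to compare the configured cache with what is actually cached, a minimal mongo shell sketch (statistic names assumed from the wiredTiger.cache section of serverStatus) would be:

var wtCache = db.serverStatus().wiredTiger.cache;
print("configured bytes :", wtCache["maximum bytes configured"]);
print("bytes in cache   :", wtCache["bytes currently in the cache"]);
print("dirty bytes      :", wtCache["tracked dirty bytes in the cache"]);
// Compare these against db.stats().dataSize and db.stats().indexSize above.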

I'm also interested in actual CPU utilization: When you say over 100%, do you mean that MongoDB is using more than 100% of a single logical CPU core? Is the database responsive during this time?

Thanks for the additional background.

Regards,
sam
