[SERVER-47201] High CPU usage during insert Created: 31/Mar/20  Updated: 27/Oct/23  Resolved: 28/Apr/20

Status: Closed
Project: Core Server
Component/s: Write Ops
Affects Version/s: 4.2.2
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: Eddy Li Assignee: Dmitry Agranat
Resolution: Community Answered Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Participants:

 Description   

We have a 3-shard cluster, but as soon as we start loading data, CPU usage goes up to 100%. No other operations are running at the time, I/O is very low, and we have plenty of memory (60 GB). We're unable to go live because of this issue. Can anyone help?

Here is our conf file:

    cat /data/mongodb/shard1/mongod.conf
    systemLog:
      destination: file
      logAppend: true
      logRotate: reopen
      path: /data/mongodb/shard1/log/mongod.log
    operationProfiling:
      slowOpThresholdMs: "10000"
      mode: "off"
    storage:
      dbPath: /data/mongodb/shard1/data
      journal:
        enabled: true
        commitIntervalMs: 500
      wiredTiger:
        engineConfig:
          cacheSizeGB: 60
    processManagement:
      fork: true
      pidFilePath: /data/mongodb/shard1/run/mongod.pid
    net:
      port: 27001
      bindIp: 0.0.0.0
    replication:
      replSetName: shard1
    sharding:
      clusterRole: shardsvr
    storage.wiredTiger.engineConfig.configString: "eviction_dirty_target=60,eviction_dirty_trigger=80,eviction=(threads_min=1,threads_max=2)"
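
Editor's note: for comparison, a minimal illustrative sketch of the storage section with the non-default tuning removed, which is what the comments later ask to retest. Omitting cacheSizeGB and the eviction configString lets WiredTiger fall back to its defaults (cache of roughly 50% of (RAM − 1 GB)); setting cacheSizeGB to the full 60 GB of host memory leaves nothing for the OS, connections, and non-cache server memory. This fragment is not from the ticket:

```yaml
# Illustrative only: defaults-based storage section for the retest.
# cacheSizeGB and configString are deliberately omitted.
storage:
  dbPath: /data/mongodb/shard1/data
  journal:
    enabled: true
```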

Here is some mongostat info:

insert query update delete getmore command dirty used flushes vsize res qrw arw net_in net_out conn set repl time
*0 *0 *0 *0 0 2|0 0.0% 53.7% 1 35.8G 33.8G 0|0 1|0 859b 9.18k 43 shard1 PRI Mar 28 00:57:34.858
*0 *0 *0 *0 0 2|0 0.0% 53.7% 0 35.8G 33.8G 0|0 1|0 778b 12.3k 43 shard1 PRI Mar 28 00:57:49.859
*0 *0 *0 *0 0 2|0 0.0% 53.7% 0 35.8G 33.8G 0|0 1|0 912b 16.3k 43 shard1 PRI Mar 28 00:58:04.858
*0 *0 *0 *0 0 2|0 0.0% 53.7% 0 35.8G 33.8G 0|0 1|0 1.03k 12.7k 43 shard1 PRI Mar 28 00:58:19.858
*0 *0 *0 *0 0 2|0 0.0% 53.7% 1 35.8G 33.8G 0|0 1|0 625b 9.09k 43 shard1 PRI Mar 28 00:58:34.859
*0 *0 *0 *0 0 2|0 0.0% 53.7% 0 35.8G 33.8G 0|0 1|0 932b 17.3k 43 shard1 PRI Mar 28 00:58:49.858
*0 *0 *0 *0 0 1|0 0.0% 53.7% 0 35.8G 33.8G 0|0 1|0 530b 8.68k 43 shard1 PRI Mar 28 00:59:04.858
*0 *0 *0 *0 0 2|0 0.0% 53.7% 0 35.8G 33.8G 0|0 1|0 1.08k 12.7k 43 shard1 PRI Mar 28 00:59:19.859
*0 *0 *0 *0 0 2|0 0.0% 53.7% 1 35.8G 33.8G 0|0 1|0 593b 13.9k 43 shard1 PRI Mar 28 00:59:34.858
*0 *0 *0 *0 0 2|0 0.0% 53.7% 0 35.8G 33.8G 0|0 1|0 702b 12.2k 43 shard1 PRI Mar 28 00:59:49.858
insert query update delete getmore command dirty used flushes vsize res qrw arw net_in net_out conn set repl time
*0 *0 *0 *0 0 1|0 0.0% 53.7% 0 35.8G 33.8G 0|0 1|0 636b 11.2k 43 shard1 PRI Mar 28 01:00:04.858
273 *0 *0 *0 8 9|0 0.2% 53.7% 0 35.8G 33.8G 0|0 1|0 168k 217k 46 shard1 PRI Mar 28 01:00:19.854
9696 *0 *0 *0 337 346|0 4.9% 54.1% 1 35.9G 34.0G 0|0 1|0 5.90m 7.17m 47 shard1 PRI Mar 28 01:00:34.860
10007 *0 *0 *0 322 314|0 10.1% 54.5% 0 36.2G 34.3G 0|0 1|0 6.04m 7.37m 49 shard1 PRI Mar 28 01:00:49.858
10489 *0 *0 *0 350 362|0 15.1% 55.0% 0 36.4G 34.6G 0|0 1|1 6.37m 7.75m 49 shard1 PRI Mar 28 01:01:04.856
10776 *0 *0 *0 348 359|0 20.3% 55.4% 0 36.7G 34.8G 0|0 1|0 6.52m 7.94m 49 shard1 PRI Mar 28 01:01:19.857
7538 *0 *0 *0 205 196|0 23.9% 55.7% 0 36.9G 35.0G 0|0 1|0 4.51m 5.49m 49 shard1 PRI Mar 28 01:01:34.868
4015 *0 *0 *0 36 2|0 25.7% 55.9% 0 37.0G 35.1G 0|0 1|0 2.30m 2.83m 48 shard1 PRI Mar 28 01:01:49.866
3500 *0 *0 *0 34 2|0 27.4% 56.0% 0 37.1G 35.2G 0|0 1|1 2.00m 2.46m 48 shard1 PRI Mar 28 01:02:04.854
3846 *0 *0 *0 35 2|0 29.1% 56.2% 0 37.2G 35.3G 0|0 1|0 2.20m 2.71m 49 shard1 PRI Mar 28 01:02:19.875
insert query update delete getmore command dirty used flushes vsize res qrw arw net_in net_out conn set repl time
3445 *0 *0 *0 54 2|0 30.6% 56.3% 0 37.3G 35.4G 0|0 1|0 1.98m 2.45m 48 shard1 PRI Mar 28 01:02:34.858
3867 *0 *0 *0 37 2|0 32.4% 56.5% 0 37.4G 35.5G 0|0 1|0 2.21m 2.71m 48 shard1 PRI Mar 28 01:02:49.857
3795 *0 *0 *0 35 2|0 34.2% 56.6% 0 37.5G 35.6G 0|0 1|0 2.17m 2.67m 48 shard1 PRI Mar 28 01:03:04.880
3882 *0 *0 *0 33 2|0 36.1% 56.8% 0 37.6G 35.7G 0|0 1|0 2.22m 2.74m 47 shard1 PRI Mar 28 01:03:19.868
3784 *0 *0 *0 35 2|0 37.9% 56.9% 0 37.7G 35.8G 0|0 1|1 2.17m 2.65m 47 shard1 PRI Mar 28 01:03:34.864
3791 *0 *0 *0 35 2|0 39.6% 57.1% 0 37.8G 35.9G 0|0 1|0 2.16m 2.66m 47 shard1 PRI Mar 28 01:03:49.875
4070 *0 *0 *0 28 1|0 41.5% 57.3% 0 37.9G 36.0G 0|0 1|0 2.33m 2.86m 47 shard1 PRI Mar 28 01:04:04.869
3992 *0 *0 *0 1 2|0 43.3% 57.4% 0 38.0G 36.1G 0|0 1|0 2.28m 1.22m 47 shard1 PRI Mar 28 01:04:19.866
3679 *0 *0 *0 0 2|0 45.0% 57.6% 0 38.1G 36.2G 0|0 1|0 2.09m 1.15m 48 shard1 PRI Mar 28 01:04:34.854
3857 *0 *0 *0 0 2|0 46.9% 57.7% 0 38.2G 36.3G 0|0 1|0 2.19m 35.6k 48 shard1 PRI Mar 28 01:04:49.857
insert query update delete getmore command dirty used flushes vsize res qrw arw net_in net_out conn set repl time
2088 *0 *0 *0 0 2|0 47.8% 57.8% 0 38.3G 36.4G 0|0 1|0 1.19m 1.14m 48 shard1 PRI Mar 28 01:05:04.860
*0 *0 *0 *0 0 2|0 47.7% 57.8% 0 38.3G 36.4G 0|0 1|0 737b 12.1k 48 shard1 PRI Mar 28 01:05:19.876
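
Editor's note: in the trace above, the dirty column climbs steadily from 0.2% toward ~48% while insert throughput falls from ~10k/s to under 4k/s. mongostat reports dirty as a percentage of the WiredTiger cache, so the custom configString's thresholds translate into absolute sizes for the configured 60 GB cache as follows (a back-of-envelope sketch, not a diagnosis from the ticket):

```shell
# Back-of-envelope arithmetic: absolute dirty-cache thresholds implied by
# eviction_dirty_target=60 / eviction_dirty_trigger=80 on a 60 GB cache.
awk 'BEGIN {
  cache_gb = 60
  printf "dirty target:  %d GB (60%% of cache)\n", cache_gb * 60 / 100
  printf "dirty trigger: %d GB (80%% of cache)\n", cache_gb * 80 / 100
}'
# → dirty target:  36 GB (60% of cache)
#   dirty trigger: 48 GB (80% of cache)
```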
Here is some iostat info:

avg-cpu: %user %nice %system %iowait %steal %idle
99.32 0.00 0.66 0.00 0.00 0.02

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 13.73 499.00 275.53 22978.40 6031.20 74.91 2.68 3.46 1.00 7.92 0.10 7.67
dm-0 0.00 0.00 0.00 0.13 0.00 0.80 12.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 0.00 0.00 39.20 0.00 198.93 10.15 0.14 3.67 0.00 3.67 0.07 0.26
dm-3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-4 0.00 0.00 499.00 249.87 22978.40 5830.67 76.94 2.58 3.44 1.00 8.32 0.10 7.53
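
Editor's note: the numbers above already point away from storage: ~99% user CPU, essentially zero iowait, and the busiest device under 8% utilization. A quick way to pull the CPU fields out of raw `iostat` output (shown here against the sample avg-cpu line above; in practice pipe live `iostat -x 15` through it):

```shell
# Extract the avg-cpu fields from an iostat line: user, iowait, idle.
# Fed the sample line from above for a self-contained demonstration.
printf '99.32 0.00 0.66 0.00 0.00 0.02\n' |
  awk '{printf "user=%.1f%% iowait=%.1f%% idle=%.1f%%\n", $1, $4, $6}'
# → user=99.3% iowait=0.0% idle=0.0%
```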
 Comments   
Comment by Dmitry Agranat [ 28/Apr/20 ]

Hi lsh2273@hotmail.com,

We haven’t heard back from you for some time, so I’m going to mark this ticket as resolved. If this is still an issue for you, please provide additional information and we will reopen the ticket.

Regards,
Dima

Comment by Dmitry Agranat [ 06/Apr/20 ]

Yes lsh2273@hotmail.com, there were several known issues in the 4.2.0 version. Please rerun this test with MongoDB 4.2.5 after removing any non-default settings.

Comment by Eddy Li [ 02/Apr/20 ]

Is there any known issue with the 4.2.0 version? I changed the default WiredTiger settings because performance was even worse with the defaults.

Thanks,
Eddy

Comment by Dmitry Agranat [ 01/Apr/20 ]

Hi lsh2273@hotmail.com, thank you for uploading the requested information.

A couple of observations:

  • Under "Affects Version/s" you've marked 4.2.2 as the version you are using, but the provided data shows you are actually running 4.2.0.
  • I also see you are using a non-default WiredTiger setting, specifically: configString: "eviction_dirty_target=60,eviction_dirty_trigger=80,eviction=(threads_min=1,threads_max=2)".

Could you please rerun this test with the MongoDB 4.2.5 version after removing any non-default settings? This will help us to create a reliable baseline.

Thanks,
Dima

Comment by Eddy Li [ 31/Mar/20 ]

Hi Dima,

Thanks for looking into this issue. I've uploaded the log.

Thanks,
Eddy

Comment by Dmitry Agranat [ 31/Mar/20 ]

Hi lsh2273@hotmail.com,

Would you please archive (tar or zip) the mongod.log files and the $dbpath/diagnostic.data directory (the contents are described here) and upload them to this support uploader location?

Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time.

Thanks,
Dima

Generated at Thu Feb 08 05:13:33 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.