[SERVER-44834] mongod blocks for some time, dirty is very high and data is not evicted to disk, I/O is 0%; my disk is an NVMe SSD (its I/O performance is very good) Created: 26/Nov/19 Updated: 27/Oct/23 Resolved: 02/Dec/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Performance, Replication |
| Affects Version/s: | 3.6.14, 3.6.15 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | y yz | Assignee: | Dmitry Agranat |
| Resolution: | Community Answered | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Steps To Reproduce: | dirty is very high, but data is not evicted to disk; disk I/O is 0%.
iostat log (one-second samples from 11/22/2019 12:17:21 AM through 12:17:28 AM): only the timestamped column headers (Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util) are preserved in this export; the per-device rows, which the reporter says showed 0% utilization, did not survive. |
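A minimal sketch of how the missing per-device iostat data could be re-captured for the NVMe device during a stall (the device name nvme0n1 comes from the fdisk output later in this ticket; the one-second interval and 60-sample count are assumptions):

    # Extended per-device statistics with timestamps: -x adds %util, -m reports MB/s, -t prints a timestamp per sample.
    iostat -xmt nvme0n1 1 60 > iostat_nvme.log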
| Participants: |
| Description |
|
mongod blocks for some time; dirty is very high, data is not evicted to disk, and disk I/O is 0%. mongostat output is as follows:
|
| Comments |
| Comment by y yz [ 03/Dec/19 ] |
|
@Dmitry Agranat |
| Comment by Dmitry Agranat [ 02/Dec/19 ] |
There are 2 reasons for this:
There are a few things which might help:
I am going to close this ticket now but if you still experience issues after implementing all the above recommendations, please open a new one and we'll be happy to have a look. Regards, |
| Comment by y yz [ 02/Dec/19 ] |
|
@Dmitry Agranat Another question: how can I solve this problem, and what should I do? Thanks again. |
| Comment by Dmitry Agranat [ 02/Dec/19 ] |
|
As of the 3.6 release, MongoDB enables readConcern majority support by default, which requires WiredTiger to retain more historical versions of data (history). That history needs to be kept either in the WiredTiger cache or in the cache overflow table, which might be slow. Having reviewed different periods of time (based on the uploaded data), the events where cache pressure is present were related to times when one of the members was unavailable.

Your current configuration is PSSA, making the majority 3. When even 1 of the 2 secondary members is unavailable, you lose the majority and we start to accumulate history (creating cache pressure). With a PSS configuration the majority would be 2, and losing (or having connection issues with) one of the secondaries would not create the same situation, because the two remaining members would still form a majority.

Regarding the spikes of I/O activity under the write-heavy workload, this is expected. We persist data to disk via checkpoints, which by default run every 60 seconds. Given that we've seen cases of writing ~2.7GB of data to disk, this can certainly be impactful. There are different tuning approaches to make this spiky behavior perform more smoothly (including both OS and MongoDB tuning), but this is out of scope for the SERVER project. If you need further assistance with performance tuning, I encourage you to ask our community by posting on the mongodb-user group or on Stack Overflow with the mongodb tag.

Lastly, even though we do not have data for either of the secondary members, being configured differently from the primary (different I/O characteristics) makes them more vulnerable in terms of replication. You said the secondaries are limited to 5k IOPS, but your primary spikes at up to 185k IOPS. This makes the secondaries, and specifically batch replication, a potential bottleneck. Thanks, |
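For illustration, one way to check the two conditions described above from a shell on the primary (a sketch only; the port is a placeholder, and the WiredTiger statistic names are those exposed by serverStatus on 3.6-era builds):

    # Is this a PSSA set? Any member with arbiterOnly:true is the arbiter.
    mongo --port 27017 --quiet --eval 'rs.conf().members.forEach(function (m) { print(m.host, "arbiterOnly:", m.arbiterOnly === true); })'

    # Rough view of cache pressure: dirty bytes approaching the configured maximum
    # indicates the history/eviction pressure described above.
    mongo --port 27017 --quiet --eval 'var c = db.serverStatus().wiredTiger.cache; print("dirty:", c["tracked dirty bytes in the cache"], "max:", c["maximum bytes configured"]);'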
| Comment by y yz [ 01/Dec/19 ] |
|
@Dmitry Agranat I have a question: when write traffic is very high, why is write I/O at 0% for a long time and then suddenly at 100% for a long time, and so on? When write I/O is at 100%, client connections block. Could the write I/O be spread evenly across points in time? That might improve performance. Looking forward to your reply, thanks |
| Comment by y yz [ 01/Dec/19 ] |
|
The reason is as follows: In addition, thanks. |
| Comment by Dmitry Agranat [ 01/Dec/19 ] |
|
Thanks 1147952115@qq.com, could you also provide the output of the replSetGetConfig command from shard_F0A9938E (an example invocation is sketched after this comment)? I think I know what's going on, but first I'll need to understand your replica set configuration. A few more clarifying questions about your current configuration:
Thanks, |
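For reference, one way to capture the requested replica set configuration (a sketch; the port is a placeholder for the shard_F0A9938E primary, and redirecting to a file is just for convenience):

    # Dump the replica set configuration of the problem shard to a file for upload.
    mongo --port 27017 --quiet --eval 'printjson(db.adminCommand({replSetGetConfig: 1}))' > replSetGetConfig_shard_F0A9938E.json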
| Comment by y yz [ 29/Nov/19 ] |
|
@Dmitry Agranat If you need any other secondary (slave) info (mongod.log or diagnostic.data), please tell me. I have only sent the primary's (master's) diagnostic.data and mongod.log to you. |
| Comment by y yz [ 29/Nov/19 ] |
|
RAID:cache:NRWTD|access:RW|size:223.0GB|state:Optl|type:RAID1| |
| Comment by y yz [ 29/Nov/19 ] |
|
@Dmitry Agranat All shard masters use the same storage, and there is just one mongod instance on each physical machine. However, the secondary mongod instances use another type of machine: the master and slave nodes use different types of machines, and the primary node's disk I/O performs better than the slave node's (the primary does about 30000 IOPS, the secondary about 5000 IOPS). There's only so much diagnostic.data.
[root@bjht12275 ~]# ps -ef | grep mongod
[root@bjht12275 ~]# |
| Comment by Dmitry Agranat [ 28/Nov/19 ] |
|
The latest data you've uploaded does not show the NVMe disk anymore; specifically, we can only see the root device under sda. In addition, the latest data does not cover the reported issue on Nov 22nd, around 12:17 AM UTC. The latest diagnostic.data only starts at Nov 23rd, 12:58 AM UTC. In order to be able to help, please provide/clarify (a sketch of commands that capture the storage layout follows this comment):
Once we understand the storage layout, we might need to recollect the diagnostic.data. Thanks, |
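One possible way to capture the storage layout being asked about (a sketch; the dbPath shown is a placeholder, not taken from this ticket):

    # Block devices, sizes and mount points: shows whether the dbPath sits on sda or nvme0n1.
    lsblk -o NAME,SIZE,TYPE,MOUNTPOINT

    # Which filesystem/device actually backs the mongod dbPath (path is a placeholder).
    df -h /data/mongodb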
| Comment by y yz [ 28/Nov/19 ] |
|
@Dmitry Agranat Sorry, I may have sent the wrong diagnostic data. I have sent it again, this time with the diagnostic data from the physical machine belonging to the problem shard. Please confirm whether the diagnostic data is correct this time. Sorry again. |
| Comment by Dmitry Agranat [ 27/Nov/19 ] |
|
Just to reiterate my last comment: in order to be able to diagnose the reported observation, we'll need the diagnostic.data from a server which had this issue. Having data from other nodes, which do not experience these symptoms, would not help to progress this case. It was not clear from your comments whether all shards share the same storage; please elaborate. |
| Comment by y yz [ 27/Nov/19 ] |
|
When mongod goes wrong, there are a lot of slow-operation log entries, and the reported slow-operation times are very large. |
| Comment by y yz [ 27/Nov/19 ] |
|
This cluster has 11 shards; sometimes there is a lot of traffic. |
| Comment by y yz [ 27/Nov/19 ] |
|
@Dmitry Agranat This problem appears randomly on different shards. To deal with it, I changed the physical machine, but the problem was not resolved; it has repeated in different places. |
| Comment by Dmitry Agranat [ 26/Nov/19 ] |
|
The mongostat output you've provided in your initial comment belongs to the shard_4FC5EC6E server, while the diagnostic.data you've uploaded is from the shard_110AFE67 server. Please upload the diagnostic.data from the shard_4FC5EC6E server. Do all shards share the same storage? If yes, how many mongod processes in total share that storage? Thanks, |
| Comment by y yz [ 26/Nov/19 ] |
|
@Dmitry Agranat I have sent the diagnostic.data to you, but the mongod.log is too large, so I have not sent it. Thanks |
| Comment by Dmitry Agranat [ 26/Nov/19 ] |
|
Would you please archive (tar or zip) the mongod.log files and the $dbpath/diagnostic.data directory (the contents are described here) and upload them to this support uploader location? Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time. Thanks, |
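A minimal sketch of the requested archiving step (the paths are placeholders; the actual dbPath and log location come from the mongod configuration):

    # Bundle the diagnostic.data directory and the mongod log for upload
    # (replace /data/mongodb and /var/log/mongodb/mongod.log with the real paths).
    tar -czf shard_diagnostics.tar.gz /data/mongodb/diagnostic.data /var/log/mongodb/mongod.log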
| Comment by y yz [ 26/Nov/19 ] |
|
fdisk info:
Disk /dev/sda: 240.1 GB, 240057409536 bytes
Device Boot Start End Blocks Id System
WARNING: GPT (GUID Partition Table) detected on '/dev/nvme0n1'! The util fdisk doesn't support GPT. Use GNU Parted.
Disk /dev/nvme0n1: 6401.3 GB, 6401252745216 bytes
Device Boot Start End Blocks Id System |
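Since fdisk cannot parse the GPT partition table on /dev/nvme0n1 (as the warning above says), the partition layout could be listed with GNU Parted instead, for example:

    # Print the GPT partition table of the NVMe device (fdisk cannot read GPT).
    parted /dev/nvme0n1 print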
| Comment by y yz [ 26/Nov/19 ] |
|
@Mirko Bonadei |
| Comment by y yz [ 26/Nov/19 ] |
|
Linux bjht12438 3.10.0-957.27.2.el7.x86_64 #1 SMP Mon Jul 29 17:46:05 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
@carl.champain @redbeard0531 |