[SERVER-44834] mongod blocks at times; dirty is very high but data is not evicted to disk and IO is 0%, even though the disk is an NVMe SSD with very good IO performance Created: 26/Nov/19  Updated: 27/Oct/23  Resolved: 02/Dec/19

Status: Closed
Project: Core Server
Component/s: Performance, Replication
Affects Version/s: 3.6.14, 3.6.15
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: y yz Assignee: Dmitry Agranat
Resolution: Community Answered Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File 11111.png     Text File db.serverstats2.txt     Text File db.serverstatus.wiredTiger.txt     PNG File dirty-high-disk-block.png     PNG File disk-performance.png     Text File iostat.log     Text File replSetGetConfig.txt    
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

Dirty cache is very high, but data is not being evicted to disk; disk IO is 0%.

 

iostat log is as follows:

11/22/2019 12:17:21 AM
avg-cpu: %user %nice %system %iowait %steal %idle
3.43 0.00 1.01 0.02 0.00 95.55

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dfa 0.00 0.00 15.00 546.00 208.00 2588.00 9.97 0.00 0.01 0.00 0.01 0.01 0.30

11/22/2019 12:17:22 AM
avg-cpu: %user %nice %system %iowait %steal %idle
2.90 0.00 0.56 0.00 0.00 96.54

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 1.00 0.00 2.00 0.00 12.00 12.00 0.00 0.00 0.00 0.00 0.00 0.00
dfa 0.00 0.00 0.00 485.00 0.00 2276.00 9.39 0.00 0.00 0.00 0.00 0.00 0.00

11/22/2019 12:17:23 AM
avg-cpu: %user %nice %system %iowait %steal %idle
3.51 0.00 1.07 0.02 0.00 95.41

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dfa 0.00 0.00 0.00 509.00 0.00 2468.00 9.70 0.00 0.00 0.00 0.00 0.00 0.10

11/22/2019 12:17:24 AM
avg-cpu: %user %nice %system %iowait %steal %idle
4.20 0.00 1.79 0.00 0.00 94.00

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dfa 0.00 0.00 0.00 531.00 0.00 2472.00 9.31 0.00 0.01 0.00 0.01 0.01 0.30

11/22/2019 12:17:25 AM
avg-cpu: %user %nice %system %iowait %steal %idle
4.19 0.00 2.03 0.00 0.00 93.78

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dfa 0.00 0.00 1.98 474.26 11.88 2249.50 9.50 0.00 0.00 0.50 0.00 0.00 0.20

11/22/2019 12:17:26 AM
avg-cpu: %user %nice %system %iowait %steal %idle
1.86 0.00 1.80 0.05 0.00 96.30

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dfa 0.00 0.00 0.00 109721.00 0.00 1053840.00 19.21 25.67 0.23 0.00 0.23 0.01 62.00

11/22/2019 12:17:27 AM
avg-cpu: %user %nice %system %iowait %steal %idle
1.43 0.00 0.56 0.00 0.00 98.01

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dfa 0.00 0.00 0.00 436.00 0.00 2076.00 9.52 0.00 0.00 0.00 0.00 0.00 0.00

11/22/2019 12:17:28 AM
avg-cpu: %user %nice %system %iowait %steal %idle
1.82 0.00 0.91 0.00 0.00 97.27

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dfa 0.00 0.00 0.00 474.00 0.00 2312.00 9.76 0.00 0.00 0.00 0.00 0.00 0.20

Participants:

 Description   

mongod blocks at times; dirty is very high but data is not evicted to disk, and IO is 0%.

Dirty cache is very high, but data is not being evicted to disk; disk IO is 0%.

mongostat output is as follows:

insert query update delete getmore command dirty used flushes vsize  res qrw arw net_in net_out conn                time
10.37.162.223:10000 *0 *0 *0 *0 0 11|0 21.7% 95.0% 0 117G 55.9G n/a 0|0 1|1 1.57k 69.1k 1823 shard_4FC5EC6E PRI Nov 22 12:17:51.561
10.37.162.223:10000 *0 *0 *0 *0 0 6|0 21.5% 95.0% 0 117G 55.9G n/a 0|0 1|1 1.09k 74.3k 1823 shard_4FC5EC6E PRI Nov 22 12:17:52.505
10.37.162.223:10000 *0 *0 *0 *0 0 5|0 21.3% 95.0% 0 117G 55.9G n/a 0|0 1|1 1.03k 69.9k 1823 shard_4FC5EC6E PRI Nov 22 12:17:53.506
10.37.162.223:10000 *0 *0 *0 *0 0 7|0 29.7% 95.0% 0 117G 55.9G n/a 0|0 2|1 1.25k 70.4k 1823 shard_4FC5EC6E PRI Nov 22 12:17:54.506
10.37.162.223:10000 *0 *0 *0 *0 0 8|0 48.5% 95.0% 0 117G 55.9G n/a 0|0 2|1 1.59k 71.9k 1823 shard_4FC5EC6E PRI Nov 22 12:17:55.505
10.37.162.223:10000 *0 *0 3 *0 0 5|0 58.1% 95.0% 0 117G 55.9G n/a 0|0 2|23 26.4k 69.9k 1823 shard_4FC5EC6E PRI Nov 22 12:17:56.507
10.37.162.223:10000 *0 *0 22 *0 0 60|0 57.5% 95.0% 0 117G 55.9G n/a 0|411 2|128 80.3k 39.2k 1944 shard_4FC5EC6E PRI Nov 22 12:18:04.579
10.37.162.223:10000 *0 *0 561 *0 0 1509|0 56.3% 94.1% 0 118G 56.0G n/a 0|691 2|128 607k 3.66m 1944 shard_4FC5EC6E PRI Nov 22 12:18:05.679
10.37.162.223:10000 *0 *0 *0 *0 0 10|0 55.7% 94.0% 0 118G 56.0G n/a 0|691 1|128 1.31k 80.0k 1944 shard_4FC5EC6E PRI Nov 22 12:18:06.559
10.37.162.223:10000 *0 *0 *0 *0 0 11|0 55.0% 93.9% 0 118G 56.0G n/a 0|691 1|128 2.56k 78.7k 1944 shard_4FC5EC6E PRI Nov 22 12:18:07.509
10.37.162.223:10000 *0 *0 *0 *0 0 8|0 54.7% 93.8% 0 118G 56.0G n/a 0|691 1|128 2.56k 71.1k 1944 shard_4FC5EC6E PRI Nov 22 12:18:08.507
10.37.162.223:10000 *0 *0 *0 *0 0 8|0 54.2% 93.3% 0 118G 56.0G n/a 0|691 1|128 2.56k 71.1k 1944 shard_4FC5EC6E PRI Nov 22 12:18:09.507
10.37.162.223:10000 *0 *0 *0 *0 0 10|0 54.2% 93.3% 0 118G 56.0G n/a 0|691 1|128 2.67k 73.0k 1944 shard_4FC5EC6E PRI Nov 22 12:18:10.506
10.37.162.223:10000 *0 *0 *0 *0 0 26|0 54.2% 93.3% 0 118G 56.0G n/a 0|691 2|128 7.85k 88.5k 1944 shard_4FC5EC6E PRI Nov 22 12:18:11.507
10.37.162.223:10000 *0 *0 *0 *0 0 9|0 54.2% 93.3% 0 118G 56.0G n/a 0|691 1|128 2.84k 72.0k 1944 shard_4FC5EC6E PRI Nov 22 12:18:12.506
10.37.162.223:10000 *0 *0 *0 *0 0 7|0 54.2% 93.2% 0 118G 56.0G n/a 0|691 1|128 2.55k 70.9k 1944 shard_4FC5EC6E PRI Nov 22 12:18:13.508
10.37.162.223:10000 *0 *0 *0 *0 0 8|0 54.3% 93.2% 0 118G 56.0G n/a 0|691 1|128 2.56k 71.1k 1944 shard_4FC5EC6E PRI Nov 22 12:18:14.506
10.37.162.223:10000 *0 *0 *0 *0 0 6|0 54.3% 93.2% 0 118G 55.9G n/a 0|691 1|128 1.03k 70.1k 1944 shard_4FC5EC6E PRI Nov 22 12:18:15.506
10.37.162.223:10000 *0 *0 *0 *0 0 8|0 54.0% 92.9% 0 118G 55.9G n/a 0|691 1|128 2.77k 71.3k 1944 shard_4FC5EC6E PRI Nov 22 12:18:16.507
10.37.162.223:10000 *0 *0 *0 *0 0 7|0 52.9% 91.9% 0 118G 55.9G n/a 0|691 1|128 2.55k 71.0k 1944 shard_4FC5EC6E PRI Nov 22 12:18:17.507
10.37.162.223:10000 *0 *0 *0 *0 0 8|0 51.7% 91.1% 0 118G 55.9G n/a 0|691 1|128 1.59k 71.9k 1943 shard_4FC5EC6E PRI Nov 22 12:18:18.507
10.37.162.223:10000 *0 *0 *0 *0 0 10|0 50.6% 90.2% 0 118G 55.9G n/a 0|691 1|128 1.58k 71.6k 1943 shard_4FC5EC6E PRI Nov 22 12:18:19.517
10.37.162.223:10000 *0 *0 *0 *0 0 6|0 49.5% 89.6% 0 118G 55.9G n/a 0|691 1|128 1.04k 70.8k 1943 shard_4FC5EC6E PRI Nov 22 12:18:20.506
10.37.162.223:10000 *0 *0 *0 *0 0 11|0 48.8% 89.3% 0 118G 55.9G n/a 0|691 1|128 1.60k 70.5k 1943 shard_4FC5EC6E PRI Nov 22 12:18:21.537
10.37.162.223:10000 *0 *0 *0 *0 0 6|0 48.4% 89.2% 0 118G 55.9G n/a 0|691 1|128 1.06k 72.2k 1943 shard_4FC5EC6E PRI Nov 22 12:18:22.507
10.37.162.223:10000 *0 *0 *0 *0 0 5|0 47.9% 89.1% 0 118G 55.9G n/a 0|691 1|128 1.03k 70.0k 1943 shard_4FC5EC6E PRI Nov 22 12:18:23.507
10.37.162.223:10000 *0 *0 8 *0 0 27|0 47.3% 88.9% 0 118G 55.9G n/a 0|700 1|128 10.7k 76.0k 1951 shard_4FC5EC6E PRI Nov 22 12:18:24.606
10.37.162.223:10000 *0 *0 93 *0 0 165|0 46.7% 88.9% 0 118G 55.9G n/a 0|843 2|128 91.7k 138k 2034 shard_4FC5EC6E PRI Nov 22 12:18:26.143
10.37.162.223:10000 *0 *0 110 *0 0 151|0 46.3% 88.9% 0 118G 55.9G n/a 0|915 2|128 83.4k 156k 2034 shard_4FC5EC6E PRI Nov 22 12:18:26.795
10.37.162.223:10000 *0 *0 *0 *0 0 21|0 46.0% 88.9% 0 118G 55.9G n/a 0|915 1|128 1.73k 111k 2034 shard_4FC5EC6E PRI Nov 22 12:18:27.509
10.37.162.223:10000 *0 *0 *0 *0 0 14|0 45.6% 88.9% 0 118G 55.9G n/a 0|915 2|128 1.50k 77.8k 2034 shard_4FC5EC6E PRI Nov 22 12:18:28.508
10.37.162.223:10000 *0 *0 *0 *0 0 6|0 45.1% 88.9% 0 118G 55.9G n/a 0|915 2|128 1.03k 70.1k 2034 shard_4FC5EC6E PRI Nov 22 12:18:29.506
10.37.162.223:10000 *0 *0 *0 *0 0 5|0 44.7% 88.9% 0 118G 55.9G n/a 0|915 2|128 1.03k 70.0k 2034 shard_4FC5EC6E PRI Nov 22 12:18:30.507
10.37.162.223:10000 *0 *0 *0 *0 0 6|0 44.2% 88.8% 0 118G 55.9G n/a 0|915 1|128 1.03k 70.1k 2034 shard_4FC5EC6E PRI Nov 22 12:18:31.507
10.37.162.223:10000 *0 *0 *0 *0 0 4|0 43.1% 88.1% 0 118G 55.9G n/a 0|915 1|128 1.36k 69.1k 2034 shard_4FC5EC6E PRI Nov 22 12:18:32.508
10.37.162.223:10000 *0 *0 *0 *0 0 6|0 42.5% 87.9% 0 118G 55.9G n/a 0|915 1|128 1.03k 70.1k 2034 shard_4FC5EC6E PRI Nov 22 12:18:33.507
10.37.162.223:10000 *0 *0 *0 *0 0 6|0 41.4% 87.2% 0 118G 55.9G n/a 0|915 1|128 1.03k 70.1k 2034 shard_4FC5EC6E PRI Nov 22 12:18:34.507
10.37.162.223:10000 *0 *0 *0 *0 0 233|0 40.9% 87.0% 0 118G 55.9G n/a 0|915 1|128 14.2k 288k 1660 shard_4FC5EC6E PRI Nov 22 12:18:35.508
10.37.162.223:10000 *0 *0 *0 *0 0 153|0 40.3% 86.9% 0 118G 55.9G n/a 0|915 1|128 14.2k 206k 1634 shard_4FC5EC6E PRI Nov 22 12:18:36.528
10.37.162.223:10000 *0 *0 *0 *0 0 14|0 39.8% 86.8% 0 118G 55.9G n/a 0|915 1|128 2.67k 73.2k 1634 shard_4FC5EC6E PRI Nov 22 12:18:37.605
10.37.162.223:10000 *0 *0 *0 *0 0 25|0 39.3% 86.7% 0 118G 55.9G n/a 0|915 1|128 2.24k 95.7k 1634 shard_4FC5EC6E PRI Nov 22 12:18:38.507
10.37.162.223:10000 *0 *0 *0 *0 0 16|0 38.6% 86.4% 0 118G 55.9G n/a 0|915 1|128 1.67k 80.5k 1634 shard_4FC5EC6E PRI Nov 22 12:18:39.508
10.37.162.223:10000 *0 *0 *0 *0 0 29|0 38.1% 86.3% 0 118G 55.9G n/a 0|915 1|128 2.37k 92.2k 1634 shard_4FC5EC6E PRI Nov 22 12:18:40.507
10.37.162.223:10000 *0 *0 *0 *0 0 30|0 37.7% 86.3% 0 118G 55.9G n/a 0|915 1|128 6.68k 93.2k 1634 shard_4FC5EC6E PRI Nov 22 12:18:41.507
10.37.162.223:10000 *0 *0 *0 *0 0 13|0 37.3% 86.3% 0 118G 55.9G n/a 0|915 1|128 1.66k 76.8k 1634 shard_4FC5EC6E PRI Nov 22 12:18:42.506
10.37.162.223:10000 *0 *0 *0 *0 0 7|0 36.8% 86.3% 0 118G 55.9G n/a 0|915 1|128 1.09k 71.0k 1634 shard_4FC5EC6E PRI Nov 22 12:18:43.506
10.37.162.223:10000 *0 *0 *0 *0 0 7|0 36.3% 86.3% 0 118G 55.9G n/a 0|915 1|128 2.55k 71.0k 1634 shard_4FC5EC6E PRI Nov 22 12:18:44.507
10.37.162.223:10000 *0 *0 *0 *0 0 7|0 35.9% 86.3% 0 118G 55.9G n/a 0|915 1|128 2.55k 71.0k 1634 shard_4FC5EC6E PRI Nov 22 12:18:45.507
10.37.162.223:10000 *0 *0 *0 *0 0 8|0 35.4% 86.3% 0 118G 55.9G n/a 0|915 1|128 2.56k 71.1k 1634 shard_4FC5EC6E PRI Nov 22 12:18:46.507
10.37.162.223:10000 *0 *0 *0 *0 0 8|0 34.8% 86.1% 0 118G 55.9G n/a 0|915 1|128 2.77k 71.3k 1634 shard_4FC5EC6E PRI Nov 22 12:18:47.507
10.37.162.223:10000 *0 *0 31 *0 0 135|0 34.3% 86.0% 0 118G 55.9G n/a 0|954 1|128 40.5k 150k 1660 shard_4FC5EC6E PRI Nov 22 12:18:48.708



 Comments   
Comment by y yz [ 03/Dec/19 ]

@Dmitry Agranat
Thanks very much.

Comment by Dmitry Agranat [ 02/Dec/19 ]

Hi 1147952115@qq.com,

The primary's disk performance is very good, so why does the dirty percentage keep getting higher and higher, and why do all client connections block?

There are 2 reasons for this:

  • The secondaries do not keep up replicating data from the primary (which has much faster disks); see the lag-check sketch after this list.
  • When one of the secondaries becomes unavailable, even for a few seconds, we must keep historical data on the primary. This creates cache pressure (the dirty percentage becomes high), which in turn impacts clients' performance.
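
A minimal mongo shell sketch for checking that replication lag, run against the shard primary; this is illustrative and not taken from the ticket (rs.printSlaveReplicationInfo() is the helper name in the 3.6-era shell):

rs.printSlaveReplicationInfo()  // summarizes how far each secondary is behind the primary

// Roughly the same information, pulled from replSetGetStatus directly:
var status = db.adminCommand({ replSetGetStatus: 1 });
var primary = status.members.filter(function (m) { return m.stateStr === "PRIMARY"; })[0];
status.members
  .filter(function (m) { return m.stateStr === "SECONDARY"; })
  .forEach(function (m) {
    print(m.name + " lag (s): " + (primary.optimeDate - m.optimeDate) / 1000);
  });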

How can I solve this problem? What should I do?

There are a few things which might help:

  • Make sure to follow our Production Notes
  • Make sure HW between all members is identical
  • Consider using PSS instead of PSSA (a reconfiguration sketch follows below)
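
A hedged mongo shell sketch of that PSSA-to-PSS change, run on the primary of the affected shard; the arbiter and new member hostnames are placeholders, not values from this ticket:

rs.remove("arbiter.example.net:27017");         // drop the arbiter (hostname is illustrative)
rs.add({ host: "mongodb3.example.net:27017" }); // add a data-bearing member in its place
rs.conf();                                      // verify the resulting configuration

With three data-bearing members, losing a single secondary no longer stalls the majority commit point, so history stops piling up in the cache.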

I am going to close this ticket now but if you still experience issues after implementing all the above recommendations, please open a new one and we'll be happy to have a look.

Regards,
Dima

Comment by y yz [ 02/Dec/19 ]

@Dmitry Agranat
Thanks for your reply, but I have one question: the primary's disk performance is very good, so why does the dirty percentage keep getting higher and higher, and why do all client connections block?

Another question: how can I solve this problem? What should I do?

Thanks again.

Comment by Dmitry Agranat [ 02/Dec/19 ]

Hi 1147952115@qq.com,

As of the 3.6 release, MongoDB enables readConcern majority support by default, which requires WiredTiger to retain more historical versions of data (history). That history needs to be kept either in the WiredTiger cache or in the cache overflow table, which might be slow. Having reviewed different periods of time (based on the uploaded data), the events where cache pressure is present correlate with times when one of the members was unavailable.

Your current configuration is PSSA, which makes the majority 3. When even one of the two secondary members is unavailable, you lose the majority and we start to accumulate history (creating cache pressure). With a PSS configuration you have a majority of 2, and losing one of the secondaries (or having connection issues with it) would not create the same situation, because the remaining two members still form a majority.

Regarding the spikes of I/O activity under the write-heavy workload: this is expected. We persist data to disk via checkpoints, which by default run every 60 seconds. Given that we've seen cases of ~2.7GB of data being written to disk, this can certainly be impactful. There are different tuning approaches to make this spiky behavior perform more smoothly (involving both OS and MongoDB tuning), but this is out of scope for the SERVER project. If you need further assistance with performance tuning, I encourage you to ask our community by posting on the mongodb-user group or on Stack Overflow with the mongodb tag.
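
For anyone following along, a small mongo shell sketch for watching that cache pressure between checkpoints; the statistic names are the usual serverStatus().wiredTiger.cache keys, but treat them as an assumption since availability can vary by version:

var c = db.serverStatus().wiredTiger.cache;
var maxBytes   = c["maximum bytes configured"];
var inCache    = c["bytes currently in the cache"];
var dirtyBytes = c["tracked dirty bytes in the cache"];
print("cache used %:  " + (100 * inCache / maxBytes).toFixed(1));    // mongostat's "used" column
print("cache dirty %: " + (100 * dirtyBytes / maxBytes).toFixed(1)); // mongostat's "dirty" column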

Lastly, even though we do not have data for either of the secondary members, being configured differently from the primary (different I/O characteristics) makes them more vulnerable in terms of replication. You said the secondaries are limited to 5k IOPS, but your primary spikes at up to 185k IOPS. This makes the secondaries, and specifically batch replication, a potential bottleneck.

Thanks,
Dima

Comment by y yz [ 01/Dec/19 ]

@Dmitry Agranat

I have a question: when write traffic is very high, why is write IO at 0% for a long time, then suddenly at 100% for a long time, over and over? When write IO is at 100%, client connections block.

Could the write IO be spread evenly across time instead? That might improve performance.
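
The burst pattern described here matches WiredTiger's periodic checkpoints (see the explanation in the comment above); a hedged mongo shell sketch for checking the checkpoint interval, assuming the syncdelay parameter is readable via getParameter:

db.adminCommand({ getParameter: 1, syncdelay: 1 })
// expected shape with the default settings: { "syncdelay" : 60, "ok" : 1 }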

Looking forward to your reply, thanks

Comment by y yz [ 01/Dec/19 ]

replSetGetConfig.txt

The reasons are as follows:
1. The business scenario is almost entirely write traffic, with bursts of highly concurrent writes. Under high concurrency I found that the secondaries pulling data (oplog getmore) used a lot of read IO, so I adjusted replWriterThreadCount to 14, which reduced the read IO.
2. I adjusted cacheSizeGB because the latency fluctuated very badly; after adjusting cacheSizeGB the latency was better (see the verification sketch after this list).
3. dfa is a single SSD disk.
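
A hedged mongo shell sketch for confirming those two adjusted values on the running primary, assuming replWriterThreadCount is exposed through getParameter and that the configured cache size surfaces in the WiredTiger statistics:

db.adminCommand({ getParameter: 1, replWriterThreadCount: 1 })
// per the comment above, this should report: { "replWriterThreadCount" : 14, "ok" : 1 }

var maxBytes = db.serverStatus().wiredTiger.cache["maximum bytes configured"];
print("configured cache size (GB): " + (maxBytes / Math.pow(1024, 3)).toFixed(1));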

In addition, the primary and secondary nodes use different types of machines. The primary node's disk IO performs better than the secondaries': the primary does 30,000 IOPS while the secondaries do 5,000 IOPS.

thanks

Comment by Dmitry Agranat [ 01/Dec/19 ]

Thanks 1147952115@qq.com, could you also provide the output of the replSetGetConfig command from shard_F0A9938E? I think I know what's going on, but first I'll need to understand your replica set configuration.
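
For reference, a minimal sketch of how that output is produced in the mongo shell, run against the shard primary (nothing here is specific to this ticket):

db.adminCommand({ replSetGetConfig: 1 })
// or, equivalently:
rs.conf()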

A few more clarifying questions about your current configuration:

  • What is the reason you've changed the replWriterThreadCount to 14?
  • What is the reason you've changed the cacheSizeGB to 50?
  • What is the dfa disk: is it a single SSD, a LUN partition, or something else?

Thanks,
Dima

Comment by y yz [ 29/Nov/19 ]

@Dmitry Agranat
The problem appeared on another shard. I collected all the logs and have sent them to you, thanks.


db.serverstats2.txt

If you need any other secondary info (mongod.log or diagnostic.data), please tell me. I have only sent the primary's diagnostic.data and mongod.log to you.

Comment by y yz [ 29/Nov/19 ]

RAID:cache:NRWTD|access:RW|size:223.0GB|state:Optl|type:RAID1|

db.serverstatus.wiredTiger.txt

Comment by y yz [ 29/Nov/19 ]

@Dmitry Agranat
Thank you for your reply.

All shard primaries use the same kind of storage, and there is just one mongod instance per physical machine. However, the secondary mongod instances use a different type of machine: the primary and secondary nodes use different machine types, and the primary node's disk IO performs better than the secondaries' (the primary does 30,000 IOPS, the secondaries 5,000 IOPS).

This is all the diagnostic.data there is.

[root@bjht12275 ~]# ps -ef | grep mongod
root 319934 318700 0 10:00 pts/1 00:00:00 grep --color mongod
root 520535 1 99 Nov18 ? 46-22:02:45 /usr/local/mongodb3.6.14/bin/mongod -f /home/service/var/data/mongodb/10000/mongodb.conf
[root@bjht12275 ~]#
[root@bjht12275 ~]#
[root@bjht12275 ~]#

[root@bjht12275 ~]#
[root@bjht12275 ~]# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda2 51474912 12798384 36038704 27% /
devtmpfs 98123472 0 98123472 0% /dev
tmpfs 98134180 4 98134176 1% /dev/shm
tmpfs 98134180 183308 97950872 1% /run
tmpfs 98134180 0 98134180 0% /sys/fs/cgroup
/dev/sda3 178423644 224536 169112644 1% /home
tmpfs 19626840 0 19626840 0% /run/user/0
/dev/dfa 6200593304 285911884 5602165036 5% /home/service/var/data

Comment by Dmitry Agranat [ 28/Nov/19 ]

Hi 1147952115@qq.com,

The latest data you've uploaded no longer shows the nvme disk; specifically, we can only see the root device under sda. In addition, the latest data does not cover the reported issue on Nov 22nd, around 12:17 AM UTC; the latest diagnostic.data only starts at Nov 23rd, 12:58 AM UTC.

In order to be able to help, please provide/clarify:

  • Do all shards share the same storage? If yes, how many mongod processes in total share the same storage?
  • Is there any other disk attached to this server apart from sda?
  • Provide a description of your storage, as detailed as possible.

Once we understand the storage layout, we might need to recollect the diagnostic.data.

Thanks,
Dima

Comment by y yz [ 28/Nov/19 ]

@Dmitry Agranat

Sorry, I may have sent the wrong diagnostic data. I have sent it again; this time it is the diagnostic data from the physical machine belonging to the problem shard.

Please confirm whether the diagnostic data is correct this time.

Sorry again

Comment by Dmitry Agranat [ 27/Nov/19 ]

Hi 1147952115@qq.com,

Just to reiterate my last comment. In order to be able to diagnose the reported observation, we'll need the diagnostic.data from a server which had this issue. Having data from other nodes, which do not experience these symptoms, would not help to progress this case.

It was not clear from your comments whether all shards share the same storage; please elaborate.

Comment by y yz [ 27/Nov/19 ]

When mongod goes wrong, there are a great many slow-operation log entries, and the reported slow operation times are very large.

Comment by y yz [ 27/Nov/19 ]

This cluster has 11 shards; sometimes there is a lot of traffic.

Comment by y yz [ 27/Nov/19 ]

@Dmitry Agranat

This problem appears randomly on different shards. To deal with it, I changed the physical machine, but the problem was not resolved.
At first I thought it was a hardware problem, which is why I changed the physical machine, but you can analyze the logs at the same point in time.

This problem has repeated in different places.

Comment by Dmitry Agranat [ 26/Nov/19 ]

Hi 1147952115@qq.com,

The mongostat output you've provided in your initial comment belongs to the shard_4FC5EC6E server, while the diagnostic.data you've uploaded is from the shard_110AFE67 server.

Please upload the diagnostic.data from shard_4FC5EC6E server.

Do all shards share the same storage? If yes, how many mongod processes in total share the same storage?

Thanks,
Dima

Comment by y yz [ 26/Nov/19 ]

@Dmitry Agranat

I have sent the diagnostic.data to you, but mongodb.log is too large, so I did not send it. Thanks.
If you resolve this problem, please tell me or email me, thanks very much.
This problem seriously affects our business. Thanks again.

Comment by Dmitry Agranat [ 26/Nov/19 ]

Hi 1147952115@qq.com,

Would you please archive (tar or zip) the mongod.log files and the $dbpath/diagnostic.data directory (the contents are described here) and upload them to this support uploader location?

Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time.

Thanks,
Dima

Comment by y yz [ 26/Nov/19 ]

fdisk info:

Disk /dev/sda: 240.1 GB, 240057409536 bytes
255 heads, 63 sectors/track, 29185 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x00034df4

Device Boot Start End Blocks Id System
/dev/sda1 * 1 5222 41943040 83 Linux
/dev/sda2 5222 10183 39845888 83 Linux
/dev/sda3 10183 10183 1024 83 Linux
/dev/sda4 10183 29186 152639488 5 Extended
/dev/sda5 10183 29186 152638464 83 Linux

WARNING: GPT (GUID Partition Table) detected on '/dev/nvme0n1'! The util fdisk doesn't support GPT. Use GNU Parted.

Disk /dev/nvme0n1: 6401.3 GB, 6401252745216 bytes
255 heads, 63 sectors/track, 778241 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Device Boot Start End Blocks Id System
/dev/nvme0n1p1 1 267350 2147483647+ ee GPT

Comment by y yz [ 26/Nov/19 ]

@Mirko Bonadei
Why is dirty so high for several hours? I think WiredTiger is not evicting dirty pages to disk; it appears to be hung.

Comment by y yz [ 26/Nov/19 ]

Linux bjht12438 3.10.0-957.27.2.el7.x86_64 #1 SMP Mon Jul 29 17:46:05 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

@carl.champain @redbeard0531
Can you give me some help?
