[SERVER-47650] mongostat queue qrw 600+ arw fixed 128 Created: 18/Apr/20  Updated: 27/Oct/23  Resolved: 27/May/20

Status: Closed
Project: Core Server
Component/s: Diagnostics
Affects Version/s: 4.0.16
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: jing xu Assignee: Dmitry Agranat
Resolution: Community Answered Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: JPEG File mongstat.jpg     JPEG File mongtop.jpg     PNG File myShard_3.png     JPEG File nopswithmoredirty.jpg     JPEG File ops.jpg     JPEG File quque1.jpg     PNG File readconcern.png     JPEG File shard1mongstat.jpg     File shard2mongo.tar.gz     JPEG File shard2topandstat.jpg     JPEG File shardops.jpg     JPEG File shardshowlog.jpg     PNG File srvdb1.yto.cloud.png     PNG File upload ftdc.png    
Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:

 Description   

The mongo cluster has two shards using a hashed shard key. I am testing on MongoDB 4.0.16.

 

mongostat on shard 1 shows only small qrw and arw queues, but mongostat on shard 2 shows arw fixed at 128 and qrw at 600+. mongotop for the sharded collection reports 400-600 s spent on writes, while the mongod log shows individual operations taking 100-500 ms. The cluster is doing about 5000 ops/s, and each machine has 56 cores, 512 GB RAM and 12 TB of SSD (RAID 10, SAS interface).
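
The arw writers being pinned at 128 corresponds to the default 128 WiredTiger concurrent write tickets being exhausted; as a minimal shell sketch (run against the busy shard's primary), ticket usage can be checked with:

db.serverStatus().wiredTiger.concurrentTransactions
// e.g. { "write" : { "out" : 128, "available" : 0, "totalTickets" : 128 },
//        "read"  : { "out" : 5, "available" : 123, "totalTickets" : 128 } }
// write.available == 0 means every write ticket is in use, so new writes queue up (the qrw column);
// the ticket count is controlled by wiredTigerConcurrentWriteTransactions (default 128).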

mongo cluster layout:

P S A (shard 1)

A S P (shard 2)

machine 1: shard 1 primary + shard 2 arbiter

machine 2: shard 1 secondary + shard 2 secondary

machine 3: shard 2 primary + shard 1 arbiter

[shard1]

insert query update delete getmore command dirty used flushes vsize res qrw arw net_in net_out conn set repl time
 1298 1296 1295 *0 497 598|0 4.0% 80.0% 0 235G 224G 0|1 1|0 5.23m 5.04m 70 myShard_1 PRI Apr 18 17:42:01.888
 1306 1308 1309 *0 499 598|0 4.0% 80.0% 0 235G 224G 0|0 1|0 5.28m 5.02m 70 myShard_1 PRI Apr 18 17:42:02.888
 1264 1263 1262 *0 522 617|0 4.0% 80.0% 0 235G 224G 0|0 1|0 5.15m 4.92m 70 myShard_1 PRI Apr 18 17:42:03.887
 1383 1382 1382 *0 507 614|0 4.1% 80.0% 0 235G 224G 0|0 1|0 5.56m 5.29m 70 myShard_1 PRI Apr 18 17:42:04.887
 1316 1316 1318 *0 517 623|0 4.1% 80.0% 0 235G 224G 0|0 2|0 5.33m 5.09m 70 myShard_1 PRI Apr 18 17:42:05.887

[shard2]

insert query update delete getmore command dirty used flushes vsize res qrw arw net_in net_out conn set repl time
 1169 1188 1189 *0 0 7|0 15.5% 68.4% 0 242G 206G 0|665 5|128 4.18m 2.28m 926 myShard_2 PRI Apr 18 17:29:24.155
 1075 1074 1072 *0 0 4|0 15.5% 68.4% 0 242G 206G 0|658 3|128 3.84m 2.09m 926 myShard_2 PRI Apr 18 17:29:25.153
 1115 1111 1112 *0 0 6|0 15.5% 68.4% 0 242G 206G 0|666 3|128 3.92m 2.18m 926 myShard_2 PRI Apr 18 17:29:26.155
 1198 1207 1206 *0 0 5|0 15.5% 68.4% 0 242G 206G 0|662 4|128 4.31m 2.33m 925 myShard_2 PRI Apr 18 17:29:27.153
 1182 1173 1173 *0 0 2|0 15.5% 68.4% 0 242G 206G 0|661 5|128 4.15m 2.26m 926 myShard_2 PRI Apr 18 17:29:28.153
 1212 1219 1218 *0 0 7|0 15.5% 68.4% 0 242G 206G 0|662 4|128 4.34m 2.35m 926 myShard_2 PRI Apr 18 17:29:29.154
 1174 1190 1188 *0 0 5|0 15.5% 68.4% 0 242G 207G 0|658 5|128 4.23m 2.29m 926 myShard_2 PRI Apr 18 17:29:30.154
 1247 1252 1251 *0 0 13|0 15.5% 68.4% 0 242G 207G 0|652 5|128 4.43m 2.45m 926 myShard_2 PRI Apr 18 17:29:31.155
 1234 1229 1221 *0 0 45|0 15.5% 68.4% 0 242G 207G 0|659 5|128 4.35m 2.58m 926 myShard_2 PRI Apr 18 17:29:32.154
 1214 1221 1221 *0 0 24|0 15.5% 68.4% 0 242G 207G 0|660 5|128 4.31m 2.42m 926 myShard_2 PRI Apr 18 17:29:33.154
insert query update delete getmore command dirty used flushes vsize res qrw arw net_in net_out conn set repl time
 1236 1225 1224 *0 0 15|0 15.5% 68.4% 0 242G 207G 0|657 5|128 4.36m 2.37m 920 myShard_2 PRI Apr 18 17:29:34.154
 1197 1186 1188 *0 0 13|0 15.5% 68.4% 0 242G 207G 0|661 5|128 4.25m 2.31m 920 myShard_2 PRI Apr 18 17:29:35.154
 1259 1297 1296 *0 0 16|0 15.5% 68.4% 0 242G 207G 0|657 6|128 4.55m 2.54m 920 myShard_2 PRI Apr 18 17:29:36.155



 Comments   
Comment by Dmitry Agranat [ 27/May/20 ]

Thanks 601290552@qq.com, as I no longer see queued operations or active read/write operations being stuck at 128, I will go ahead and close this ticket. If you need further assistance with tuning, I encourage you to ask our community by posting on the MongoDB Community Forums or on Stack Overflow with the mongodb tag.

Comment by jing xu [ 25/May/20 ]

I uploaded the files to https://10gen-httpsupload.s3.amazonaws.com/upload_forms/d9451bbe-fdc6-46dd-8f9b-097a8024f9e6.html

Comment by Dmitry Agranat [ 26/Apr/20 ]

Hi 601290552@qq.com, yes I have reviewed the uploaded data and it was very helpful.

Given the pattern and shape of the operations over time, could you please clarify if this workload is being generated by some load generator / load scripts or is it a real production workload?

Now, with profiling disabled, we have better visibility into the next bottleneck: the WiredTiger eviction server.

To be more specific, the eviction server struggles to find pages to evict because the current workload simply overwhelms eviction (in other words, the cache is being held in a state that is not suitable for the running workload, or for the eviction server to find pages to evict). I believe the issue is with how the current write operations are being executed: there are so many of them that it is death by a thousand paper cuts. These many short write operations churn the WiredTiger cache so quickly (the entire cache is currently turned over within about 10 minutes) that we eventually start having trouble identifying which data needs to stay in the cache and which data needs to be evicted (determining how many pages to keep in the queue and discarding the rest).
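
A minimal shell sketch for sampling the cache and eviction counters described here (field names are those exposed by db.serverStatus().wiredTiger.cache and may differ slightly between versions):

var c = db.serverStatus().wiredTiger.cache;
printjson({
    cacheBytesMax:     c["maximum bytes configured"],
    cacheBytesUsed:    c["bytes currently in the cache"],
    dirtyBytes:        c["tracked dirty bytes in the cache"],
    pagesReadIn:       c["pages read into cache"],
    pagesWrittenOut:   c["pages written from cache"],
    modifiedEvicted:   c["modified pages evicted"],
    unmodifiedEvicted: c["unmodified pages evicted"]
});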

One thing you might consider implementing immediately to improve performance is using [bulk() write operations|https://docs.mongodb.com/manual/reference/method/Bulk/index.html], for which the WiredTiger storage engine is optimized. Once this is implemented, we can review the fresh data to determine the impact.
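
A minimal sketch of such a batched write from the mongo shell, assuming the workload targets expMonitordb.lostMonitor (the "status" field and its values are made up for illustration):

var coll = db.getSiblingDB("expMonitordb").lostMonitor;
var bulk = coll.initializeUnorderedBulkOp();
for (var i = 0; i < 1000; i++) {
    // "no" is the collection's hashed shard key; the update body is hypothetical
    bulk.find({ no: "pkg" + i }).upsert().updateOne({ $set: { status: "scanned" } });
}
bulk.execute();   // one round trip per batch instead of one per document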

Thanks,
Dima

Comment by jing xu [ 26/Apr/20 ]

Dmitry Agranat:
Is there any result on this yet?

Comment by jing xu [ 24/Apr/20 ]

Hi,
I carried out the following steps:

  • Enable readConcern:majority
  • Leave profiling off
  • Let it run for a day

I have now uploaded srvdb1ytodiagnostic.tar.gz to https://10gen-httpsupload.s3.amazonaws.com/upload_forms/d9451bbe-fdc6-46dd-8f9b-097a8024f9e6.html

Comment by jing xu [ 22/Apr/20 ]

Dmitry Agranat:
OK, I will take care of it.

Comment by Dmitry Agranat [ 22/Apr/20 ]

601290552@qq.com, let's take a step back. As I've mentioned earlier, you do not need to disable readConcern:majority for PSS.

Next steps:

  • Enable readConcern:majority
  • Leave profiling off
  • Let it run for a day

After a day, upload the fresh data from srvdb1.yto.cloud for us to review (no need to upload data from the rest of the servers).

Comment by jing xu [ 22/Apr/20 ]

I have now disabled profiling and disabled readConcern:majority for the PSS replica set.
However, when the collection grows larger than the cache size, mongo performance becomes slow.

13027 12969  12972     *0     805  1045|0 20.1% 74.6%       0   101G 80.6G    0|55  18|128  27.1m   54.1m  330 myReplicaSet10003  PRI Apr 22 16:51:23.174
  7222  7246   7310     *0     687   888|0 20.1% 74.7%       0   101G 80.6G   0|168   6|128  15.5m   30.4m  330 myReplicaSet10003  PRI Apr 22 16:51:24.175
  6158  6146   6150     *0     615   751|0 20.1% 74.7%       0   101G 80.6G    0|93   9|128  13.1m   25.7m  330 myReplicaSet10003  PRI Apr 22 16:51:25.174
 16235 16235  16246     *0     607   607|0 20.1% 74.6%       0   101G 80.6G    8|95   4|128  33.2m   67.3m  330 myReplicaSet10003  PRI Apr 22 16:51:26.175
  7237  7259   7235     *0     658   834|0 20.1% 74.7%       0   101G 80.6G   0|153  22|128  15.4m   30.3m  330 myReplicaSet10003  PRI Apr 22 16:51:27.173
  6952  6922   6948     *0     669   871|0 20.1% 74.7%       0   101G 80.6G   0|169   6|128  14.8m   29.3m  330 myReplicaSet10003  PRI Apr 22 16:51:28.175
insert query update delete getmore command dirty  used flushes  vsize   res     qrw     arw net_in net_out conn               set repl                time
 11225 11189  11087     *0     924  1110|0 20.1% 74.7%       0   101G 80.6G    0|17    9|17  23.5m   46.4m  330 myReplicaSet10003  PRI Apr 22 16:51:29.175
  6028  6095   6174     *0     587   732|0 20.1% 74.7%       0   101G 80.6G   0|126   3|128  13.1m   26.1m  330 myReplicaSet10003  PRI Apr 22 16:51:30.174
  6757  6741   6767     *0     686   877|0 20.1% 74.7%       0   101G 80.6G   0|163   6|125  14.4m   28.2m  330 myReplicaSet10003  PRI Apr 22 16:51:31.174
 10319 10313  10271     *0     825  1011|0 20.1% 74.7%       0   101G 80.6G    0|87   5|128  21.7m   43.2m  330 myReplicaSet10003  PRI Apr 22 16:51:32.175
 10705 10747  10788     *0     826  1057|0 20.1% 74.7%       0   101G 80.6G   0|172   1|128  22.7m   45.1m  330 myReplicaSet10003  PRI Apr 22 16:51:33.173
  5583  5599   5481     *0     580   735|0 20.1% 74.8%       0   101G 80.6G   1|172   1|128  11.9m   23.3m  330 myReplicaSet10003  PRI Apr 22 16:51:34.174
  5939  5901   6032     *0     731   939|0 20.1% 74.8%       0   101G 80.6G   0|172   1|128  13.0m   25.4m  330 myReplicaSet10003  PRI Apr 22 16:51:35.173
 12048 12033  12020     *0     827   939|0 20.1% 74.8%       0   101G 80.6G   0|170   3|128  25.1m   50.0m  330 myReplicaSet10003  PRI Apr 22 16:51:36.177
  4139  4144   4142     *0     417   508|0 20.1% 74.8%       0   101G 80.7G   0|172   3|128  8.84m   17.5m  330 myReplicaSet10003  PRI Apr 22 16:51:37.173
 23168 23139  23059     *0     776   856|0 20.1% 74.7%       0   101G 80.7G     0|0     3|2  47.2m   95.5m  330 myReplicaSet10003  PRI Apr 22 16:51:38.175
insert query update delete getmore command dirty  used flushes  vsize   res     qrw     arw net_in net_out conn               set repl                time
 24463 24478  24469     *0     866   990|0 20.1% 74.5%       0   101G 80.7G    0|10   2|128  50.2m    101m  330 myReplicaSet10003  PRI Apr 22 16:51:39.174
 25418 25464  25514     *0     832   955|0 20.1% 74.4%       0   101G 80.7G    0|76   8|128  52.1m    105m  330 myReplicaSet10003  PRI Apr 22 16:51:40.179
 25251 25156  25174     *0     769   858|0 20.1% 74.3%       0   101G 80.7G   0|103   9|128  51.5m    104m  330 myReplicaSet10003  PRI Apr 22 16:51:41.177
 21833 21878  21773     *0     660   832|0 20.1% 74.2%       0   101G 80.7G     0|0   6|111  44.6m   90.4m  330 myReplicaSet10003  PRI Apr 22 16:51:42.178
 22951 23009  23112     *0     761   812|0 20.1% 74.1%       0   101G 80.7G   3|141   2|128  47.1m   95.0m  330 myReplicaSet10003  PRI Apr 22 16:51:43.179
 24354 24242  24248     *0     782   865|0 20.1% 74.0%       0   101G 80.7G    0|46   6|125  49.7m    100m  330 myReplicaSet10003  PRI Apr 22 16:51:44.178
 24045 24092  24029     *0     811   989|0 20.1% 73.9%       0   101G 80.7G     0|5     3|4  49.2m   99.7m  330 myReplicaSet10003  PRI Apr 22 16:51:45.175
 23638 23691  23754     *0     789   946|0 20.1% 73.8%       0   101G 80.7G   0|112  11|127  48.6m   97.4m  330 myReplicaSet10003  PRI Apr 22 16:51:46.178
 25115 25078  24981     *0     815   881|0 20.1% 73.7%       0   101G 80.7G     0|0    7|51  51.2m    104m  330 myReplicaSet10003  PRI Apr 22 16:51:47.174
 26252 26254  26275     *0     828   906|0 20.1% 73.6%       0   101G 80.7G     0|0   14|90  53.7m    108m  330 myReplicaSet10003  PRI Apr 22 16:51:48.178
insert query update delete getmore command dirty  used flushes  vsize   res     qrw     arw net_in net_out conn               set repl                time
 25256 25184  25229     *0     830   891|0 20.1% 73.5%       0   101G 80.7G     0|0     5|8  51.6m    104m  330 myReplicaSet10003  PRI Apr 22 16:51:49.174
 24670 24747  24712     *0     859  1044|0 20.1% 73.4%       0   101G 80.7G   3|111   2|127  50.7m    102m  330 myReplicaSet10003  PRI Apr 22 16:51:50.176
 21882 21854  21766     *0     762   853|0 20.1% 73.4%       0   101G 80.7G     0|2     7|1  44.7m   90.4m  330 myReplicaSet10003  PRI Apr 22 16:51:51.173
 20324 20338  20328     *0     693   857|0 20.1% 73.3%       0   101G 80.7G    0|57    6|10  41.7m   83.6m  330 myReplicaSet10003  PRI Apr 22 16:51:52.175
 20351 20419  20523     *0     697   835|0 20.1% 73.2%       0   101G 80.7G   0|114   5|128  41.9m   84.3m  330 myReplicaSet10003  PRI Apr 22 16:51:53.173
 14293 14246  14168     *0     477   574|0 20.1% 73.2%       0   101G 80.7G    0|22  19|128  29.1m   58.2m  330 myReplicaSet10003  PRI Apr 22 16:51:54.175
 18930 18904  19000     *0     506   546|0 20.1% 73.1%       0   101G 80.7G    0|88   1|128  38.6m   78.9m  330 myReplicaSet10003  PRI Apr 22 16:51:55.175
 20093 20202  20237     *0     698   797|0 20.2% 73.1%       0   101G 80.7G   0|133   4|128  41.3m   83.2m  330 myReplicaSet10003  PRI Apr 22 16:51:56.175
 19442 19380  19381     *0     604   712|0 20.2% 73.0%       0   101G 80.7G   0|148   5|128  39.7m   79.8m  330 myReplicaSet10003  PRI Apr 22 16:51:57.174
 19581 19642  19480     *0     658   787|0 20.2% 72.9%       0   101G 80.7G   0|108  36|128  40.0m   81.3m  330 myReplicaSet10003  PRI Apr 22 16:51:58.176
insert query update delete getmore command dirty  used flushes  vsize   res     qrw     arw net_in net_out conn               set repl                time
 19595 19500  19506     *0     687   810|0 20.2% 72.9%       0   101G 80.7G    23|0   5|128  40.0m   80.8m  330 myReplicaSet10003  PRI Apr 22 16:51:59.178
 19993 20051  20235     *0     745   876|0 20.2% 72.8%       0   101G 80.7G   0|154   6|128  41.3m   82.9m  330 myReplicaSet10003  PRI Apr 22 16:52:00.175
 19920 19886  19841     *0     742   891|0 20.2% 72.8%       0   101G 80.7G   0|138   3|128  40.8m   82.6m  330 myReplicaSet10003  PRI Apr 22 16:52:01.174
 19765 19791  19827     *0     755   908|0 20.2% 72.7%       0   101G 80.7G   0|121  15|128  40.6m   81.0m  330 myReplicaSet10003  PRI Apr 22 16:52:02.175
 19753 19731  19666     *0     685   803|0 20.2% 72.7%       0   101G 80.7G    7|63   9|128  40.4m   82.1m  331 myReplicaSet10003  PRI Apr 22 16:52:03.175
 18967 18983  18996     *0     775   918|0 20.1% 72.6%       0   101G 80.7G    5|98   5|128  39.1m   78.7m  331 myReplicaSet10003  PRI Apr 22 16:52:04.177
 18620 18567  18601     *0     685   874|0 20.1% 72.6%       0   101G 80.7G   1|166   2|128  38.2m   77.0m  331 myReplicaSet10003  PRI Apr 22 16:52:05.174
 17357 17397  17283     *0     691   812|0 20.2% 72.5%       0   101G 80.7G   0|142  33|128  35.6m   71.8m  331 myReplicaSet10003  PRI Apr 22 16:52:06.175
  3953  3957   3944     *0     150   175|0 20.1% 72.5%       0   101G 80.7G     4|0    4|30  8.03m   16.6m  331 myReplicaSet10003  PRI Apr 22 16:52:07.174
 18969 18956  19018     *0     736   874|0 20.2% 72.5%       0   101G 80.7G    3|50   3|121  39.0m   77.7m  331 myReplicaSet10003  PRI Apr 22 16:52:08.175
insert query update delete getmore command dirty  used flushes  vsize   res     qrw     arw net_in net_out conn               set repl                time
 18841 18845  18802     *0     805   935|0 20.1% 72.4%       0   101G 80.7G    0|11  68|128  38.8m   78.4m  331 myReplicaSet10003  PRI Apr 22 16:52:09.173
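
A rough shell sketch (using the collection name from the sh.status() output further down) to compare the collection's size against the configured 100 GB cache:

var cacheBytes = db.serverStatus().wiredTiger.cache["maximum bytes configured"];
var stats = db.getSiblingDB("expMonitordb").lostMonitor.stats();
print("configured cache GB : " + (cacheBytes / 1e9).toFixed(1));
print("collection data GB  : " + (stats.size / 1e9).toFixed(1) + ", index GB: " + (stats.totalIndexSize / 1e9).toFixed(1));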

Comment by jing xu [ 22/Apr/20 ]

Hello,
"Was profiling disabled?"
Yes, I disabled it.

Comment by Dmitry Agranat [ 22/Apr/20 ]

Hi 601290552@qq.com,

Did you have a chance to review my last comment? Was profiling disabled? Just posting the last part for your convenience:

Please disable profiling, let the workload run for a day or two and upload the fresh data from srvdb1.yto.cloud for us to review (no need to upload data from the rest of the servers). This might not solve all the observed issues but hopefully will remove one of the main bottlenecks.

Thanks,
Dima

Comment by jing xu [ 22/Apr/20 ]

Hello?
After changing PSA to PSS and enabling readConcern:majority, why is the queue still so long?

Comment by jing xu [ 21/Apr/20 ]

insert query update delete getmore command dirty  used flushes vsize   res    qrw    arw net_in net_out conn               set repl                time
    *0    *0     *0     *0       0    12|0 22.9% 63.8%       0  107G 90.3G  0|439 42|128  2.35k    826k  634 myReplicaSet10003  PRI Apr 21 18:57:04.627
    *0    *0     *0     *0       0    12|0 22.9% 63.7%       0  107G 90.3G  0|439 42|128  2.34k    825k  634 myReplicaSet10003  PRI Apr 21 18:57:05.628
    *0    *0     *0     *0       0    10|0 22.9% 63.7%       0  107G 90.3G  0|439 42|128  2.12k    825k  634 myReplicaSet10003  PRI Apr 21 18:57:06.627
    *0    *0     *0     *0       0    10|0 22.9% 63.7%       0  107G 90.3G  0|439 42|128  2.27k    824k  634 myReplicaSet10003  PRI Apr 21 18:57:07.627
    *0    *0     *0     *0       0    10|0 22.9% 63.7%       0  107G 90.3G  0|439 42|128  2.27k    824k  634 myReplicaSet10003  PRI Apr 21 18:57:08.628
    *0    *0     *0     *0       0    17|0 22.9% 63.7%       0  107G 90.3G  0|439 43|128  3.36k    830k  635 myReplicaSet10003  PRI Apr 21 18:57:09.627
    *0    *0     *0     *0       0    10|0 22.9% 63.7%       0  107G 90.3G  0|439 43|128  2.27k    825k  632 myReplicaSet10003  PRI Apr 21 18:57:10.627
    *0     1     *0     *0       0    20|0 22.9% 63.7%       0  107G 90.3G  0|439 44|128  4.31k    867k  634 myReplicaSet10003  PRI Apr 21 18:57:11.627
    *0    *0     *0     *0       0    12|0 22.9% 63.7%       0  107G 90.3G  0|439 44|128  2.60k    828k  635 myReplicaSet10003  PRI Apr 21 18:57:12.627
    *0    *0     *0     *0       0    16|0 22.9% 63.7%       0  107G 90.3G  0|439 44|128  3.39k    866k  635 myReplicaSet10003  PRI Apr 21 18:57:13.627
insert query update delete getmore command dirty  used flushes vsize   res    qrw    arw net_in net_out conn               set repl                time
    *0    *0     *0     *0       0    11|0 22.9% 63.7%       0  107G 90.3G  0|439 44|128  2.35k    827k  635 myReplicaSet10003  PRI Apr 21 18:57:14.627
    *0    *0     *0     *0       0    12|0 22.9% 63.7%       0  107G 90.3G  0|439 44|128  2.27k    828k  635 myReplicaSet10003  PRI Apr 21 18:57:15.627
    *0    *0     *0     *0       0     9|0 22.9% 63.7%       0  107G 90.3G  0|439 44|128  2.11k    825k  635 myReplicaSet10003  PRI Apr 21 18:57:16.628
    *0    *0     *0     *0       0    11|0 22.9% 63.7%       0  107G 90.3G  0|439 44|128  2.27k    827k  635 myReplicaSet10003  PRI Apr 21 18:57:17.627
    *0    *0     *0     *0       0    10|0 22.9% 63.7%       0  107G 90.3G  0|439 44|128  2.27k    826k  635 myReplicaSet10003  PRI Apr 21 18:57:18.627
    *0    *0     *0     *0       0    12|0 22.9% 63.7%       0  107G 90.3G  0|439 44|128  2.43k    828k  635 myReplicaSet10003  PRI Apr 21 18:57:19.626
    *0    *0     *0     *0       0    10|0 22.8% 63.7%       0  107G 90.3G  0|439 44|128  2.27k    826k  635 myReplicaSet10003  PRI Apr 21 18:57:20.627
    *0    *0     *0     *0       0    11|0 22.8% 63.7%       0  107G 90.3G  0|439 44|128  2.19k    827k  635 myReplicaSet10003  PRI Apr 21 18:57:21.627
    *0    *0     *0     *0       0    11|0 22.8% 63.7%       0  107G 90.3G  0|439 44|128  2.26k    827k  635 myReplicaSet10003  PRI Apr 21 18:57:22.628
    *0    *0     *0     *0       0    16|0 22.8% 63.7%       0  107G 90.3G  0|439 44|128  3.39k    866k  635 myReplicaSet10003  PRI Apr 21 18:57:23.627
insert query update delete getmore command dirty  used flushes vsize   res    qrw    arw net_in net_out conn               set repl                time
    *0    *0     *0     *0       0    12|0 22.8% 63.7%       0  107G 90.3G  0|439 44|128  2.40k    828k  635 myReplicaSet10003  PRI Apr 21 18:57:24.627
    *0    *0     *0     *0       0    13|0 22.8% 63.7%       0  107G 90.3G  0|439 44|128  2.32k    828k  635 myReplicaSet10003  PRI Apr 21 18:57:25.627
    *0    *0     *0     *0       0     9|0 22.8% 63.7%       0  107G 90.3G  0|439 44|128  2.11k    825k  635 myReplicaSet10003  PRI Apr 21 18:57:26.628
    *0    *0     *0     *0       0    11|0 22.8% 63.7%       0  107G 90.3G  0|439 44|128  2.27k    827k  635 myReplicaSet10003  PRI Apr 21 18:57:27.627
    *0    *0     *0     *0       0    11|0 22.8% 63.7%       0  107G 90.3G  0|439 44|128  2.27k    827k  635 myReplicaSet10003  PRI Apr 21 18:57:28.627
    *0    *0     *0     *0       0    12|0 22.8% 63.7%       0  107G 90.3G  0|439 44|128  2.43k    828k  635 myReplicaSet10003  PRI Apr 21 18:57:29.626
    *0    *0     *0     *0       0    10|0 22.8% 63.6%       0  107G 90.3G  0|439 44|128  2.27k    826k  635 myReplicaSet10003  PRI Apr 21 18:57:30.628
    *0    *0     *0     *0       0    11|0 22.8% 63.6%       0  107G 90.3G  0|439 44|128  2.19k    827k  635 myReplicaSet10003  PRI Apr 21 18:57:31.628
    *0    *0     *0     *0       0    15|0 22.8% 63.6%       0  107G 90.3G  0|439 45|128  2.96k    831k  635 myReplicaSet10003  PRI Apr 21 18:57:32.627
    *0    *0     *0     *0       0    26|0 22.8% 63.6%       0  107G 90.3G  0|439 45|128  5.77k    917k  636 myReplicaSet10003  PRI Apr 21 18:57:33.627
insert query update delete getmore command dirty  used flushes vsize   res    qrw    arw net_in net_out conn               set repl                time
    *0    *0     *0     *0       0    12|0 22.8% 63.6%       0  107G 90.3G  0|439 45|128  2.35k    829k  636 myReplicaSet10003  PRI Apr 21 18:57:34.626
    *0    *0     *0     *0       0    11|0 22.8% 63.6%       0  107G 90.3G  0|439 45|128  2.26k    827k  636 myReplicaSet10003  PRI Apr 21 18:57:35.627
    *0    *0     *0     *0       0    10|0 22.8% 63.6%       0  107G 90.3G  0|439 45|128  2.11k    827k  636 myReplicaSet10003  PRI Apr 21 18:57:36.627
    *0    *0     *0     *0       0    10|0 22.8% 63.6%       0  107G 90.3G  0|439 45|128  2.27k    827k  636 myReplicaSet10003  PRI Apr 21 18:57:37.628
    *0    *0     *0     *0       0    11|0 22.8% 63.6%       0  107G 90.3G  0|439 45|128  2.27k    828k  636 myReplicaSet10003  PRI Apr 21 18:57:38.627
    *0    *0     *0     *0       0    12|0 22.8% 63.6%       0  107G 90.3G  0|439 45|128  2.48k    828k  636 myReplicaSet10003  PRI Apr 21 18:57:39.627
    *0    *0     *0     *0       0    10|0 22.8% 63.6%       0  107G 90.3G  0|439 45|128  2.27k    827k  636 myReplicaSet10003  PRI Apr 21 18:57:40.627
    *0    *0     *0     *0       0    11|0 22.8% 63.6%       0  107G 90.3G  0|439 45|128  2.24k    828k  636 myReplicaSet10003  PRI Apr 21 18:57:41.628
    *0    *0     *0     *0       0    12|0 22.8% 63.6%       0  107G 90.3G  0|439 45|128  2.27k    829k  636 myReplicaSet10003  PRI Apr 21 18:57:42.627
    *0    *0     *0     *0       0    15|0 22.8% 63.6%       0  107G 90.3G  0|439 45|128  3.38k    866k  636 myReplicaSet10003  PRI Apr 21 18:57:43.628
insert query update delete getmore command dirty  used flushes vsize   res    qrw    arw net_in net_out conn               set repl                time
    *0    *0     *0     *0       0    12|0 22.8% 63.6%       0  107G 90.3G  0|439 45|128  2.35k    829k  636 myReplicaSet10003  PRI Apr 21 18:57:44.626
    *0    *0     *0     *0       0    11|0 22.7% 63.6%       0  107G 90.3G  0|439 45|128  2.26k    827k  636 myReplicaSet10003  PRI Apr 21 18:57:45.628
    *0    *0     *0     *0       0     9|0 22.7% 63.6%       0  107G 90.3G  0|439 45|128  2.11k    827k  636 myReplicaSet10003  PRI Apr 21 18:57:46.628
    *0    *0     *0     *0       0    10|0 22.7% 63.6%       1  107G 90.3G  0|439 45|128  2.27k    827k  636 myReplicaSet10003  PRI Apr 21 18:57:47.628
    *0    19      5     *0       0    24|0 22.7% 63.5%       0  107G 90.3G  0|471  6|128  36.9k   1.01m  632 myReplicaSet10003  PRI Apr 21 18:57:48.628
    *0    *0     *0     *0       0    13|0 22.7% 63.5%       0  107G 90.3G  0|471  7|128  2.65k    889k  631 myReplicaSet10003  PRI Apr 21 18:57:49.627
    *0    *0     *0     *0       0    10|0 22.7% 63.5%       0  107G 90.3G  0|471  7|128  2.27k    815k  631 myReplicaSet10003  PRI Apr 21 18:57:50.628
    *0    *0     *0     *0       0    11|0 22.7% 63.5%       0  107G 90.3G  0|471  7|128  2.19k    817k  631 myReplicaSet10003  PRI Apr 21 18:57:51.627
    *0    *0     *0     *0       0    11|0 22.7% 63.5%       0  107G 90.3G  0|471  7|128  2.27k    817k  631 myReplicaSet10003  PRI Apr 21 18:57:52.628

Comment by jing xu [ 21/Apr/20 ]

I have now converted PSA to PSS. The queue:

[root@srvdb1 mongod10003]# /var/lib/mongodb-mms-automation/mongodb-linux-x86_64-4.0.16/bin/mongostat --host 127.0.0.1:10003
insert query update delete getmore command dirty  used flushes vsize   res   qrw   arw net_in net_out conn               set repl                time
 25856 25826  25891     *0     157   304|0 20.3% 65.6%       0  107G 91.2G 0|342 5|128  52.2m   74.5m  632 myReplicaSet10003  PRI Apr 21 18:38:38.481
 23212 23123  22885     *0     191   392|0 20.0% 65.3%       0  107G 91.2G   0|0 12|30  46.5m   63.7m  632 myReplicaSet10003  PRI Apr 21 18:38:39.476
 35826 35804  35800     *0     247   448|0 20.2% 65.5%       0  107G 91.2G   0|0   1|0  72.3m   98.6m  632 myReplicaSet10003  PRI Apr 21 18:38:40.475
 26407 26515  26700     *0     254   482|0 20.3% 65.6%       0  107G 91.2G 3|219 5|128  53.8m   72.3m  632 myReplicaSet10003  PRI Apr 21 18:38:41.476
 28443 28442  28285     *0     184   243|0 20.2% 65.6%       0  107G 91.2G   0|0  3|14  57.1m   78.9m  632 myReplicaSet10003  PRI Apr 21 18:38:42.475
 30288 30404  30463     *0     322   626|0 20.3% 65.6%       0  107G 91.2G 0|295 6|128  61.6m   83.2m  632 myReplicaSet10003  PRI Apr 21 18:38:43.478
 30380 30270  30158     *0     289   525|0 20.3% 65.6%       0  107G 91.2G   4|0  3|71  61.1m   83.6m  632 myReplicaSet10003  PRI Apr 21 18:38:44.475
 16017 16107  16311     *0     188   362|0 19.8% 65.4%       0  107G 91.2G 0|372 2|128  32.8m   43.6m  632 myReplicaSet10003  PRI Apr 21 18:38:45.475
 32029 31990  33204     *0     253   489|0 20.0% 65.3%       0  107G 91.2G 0|344 4|126  64.7m   89.4m  632 myReplicaSet10003  PRI Apr 21 18:38:46.476
 35638 35403  35323     *0     356   654|0 20.2% 65.6%       0  107G 91.2G   0|0 11|43  71.7m   98.1m  632 myReplicaSet10003  PRI Apr 21 18:38:47.477
insert query update delete getmore command dirty  used flushes vsize   res   qrw   arw net_in net_out conn               set repl                time
 24123 24311  24392     *0     311   640|0 20.3% 65.7%       0  107G 91.2G 0|261 3|128  49.4m   66.8m  632 myReplicaSet10003  PRI Apr 21 18:38:48.475
 26948 27041  27038     *0     250   406|0 20.2% 65.6%       0  107G 91.2G 0|326 3|128  54.6m   74.6m  632 myReplicaSet10003  PRI Apr 21 18:38:49.476
 30815 30843  30787     *0     313   595|0 20.3% 65.7%       0  107G 91.2G 0|464 4|128  62.4m   84.8m  632 myReplicaSet10003  PRI Apr 21 18:38:50.475
 34782 34697  34583     *0     363   707|0 20.2% 65.6%       0  107G 91.2G 1|126 6|126  70.1m   95.8m  632 myReplicaSet10003  PRI Apr 21 18:38:51.476
 21161 21121  21266     *0     220   451|0 20.3% 65.7%       0  107G 91.2G 0|270 2|128  43.0m   58.4m  632 myReplicaSet10003  PRI Apr 21 18:38:52.476
 33797 33766  33682     *0     356   742|0 20.3% 65.7%       0  107G 91.2G  0|66 5|126  68.3m   93.1m  633 myReplicaSet10003  PRI Apr 21 18:38:53.476
 25138 25234  25290     *0     304   583|0 20.3% 65.7%       0  107G 91.2G 0|342 1|128  51.3m   69.7m  633 myReplicaSet10003  PRI Apr 21 18:38:54.474
 24853 24841  24894     *0     302   589|0 20.3% 65.7%       0  107G 91.2G 0|326 2|128  50.4m   68.2m  633 myReplicaSet10003  PRI Apr 21 18:38:55.474
 27201 27338  27144     *0     327   719|0 20.3% 65.7%       0  107G 91.2G 0|472 1|128  55.3m   75.8m  633 myReplicaSet10003  PRI Apr 21 18:38:56.474
 28100 27988  27999     *0     299   564|0 20.3% 65.7%       0  107G 91.2G 0|182 5|128  56.7m   77.2m  633 myReplicaSet10003  PRI Apr 21 18:38:57.476
insert query update delete getmore command dirty  used flushes vsize   res   qrw   arw net_in net_out conn               set repl                time
 30744 30697  30771     *0     257   499|0 20.2% 65.6%       0  107G 91.2G 0|122 6|128  62.1m   84.9m  633 myReplicaSet10003  PRI Apr 21 18:38:58.475
 23751 23757  23867     *0     284   578|0 20.3% 65.7%       0  107G 91.2G 0|365 4|128  48.4m   65.5m  633 myReplicaSet10003  PRI Apr 21 18:38:59.477
 30903 30961  30817     *0     217   366|0 20.2% 65.6%       0  107G 91.2G 0|311 1|128  62.3m   85.6m  633 myReplicaSet10003  PRI Apr 21 18:39:00.474
 25081 24904  24948     *0     347   674|0 20.2% 65.7%       0  107G 91.2G  15|0 2|128  50.6m   68.6m  633 myReplicaSet10003  PRI Apr 21 18:39:01.478

Comment by jing xu [ 21/Apr/20 ]

"However, the most impacting activity is related to the truncation of the profiling table"
Is this information from FTDC?

Comment by jing xu [ 21/Apr/20 ]

Thank you. Why does the new myShard_3 replica set I added receive only a few connections? myShard_3's ops rate is 3000, and its port is 57017.

Comment by Dmitry Agranat [ 21/Apr/20 ]

601290552@qq.com, I had a look at the provided data and I see several issues, mainly around chunk migration activity.

However, the most impacting activity is related to the truncation of the profiling table:

  • This is from srvdb1.yto.cloud server after the readConcern:majority was disabled
  • Between A-B and C-D we see most of the queued write operations (as you've reported) which is also correlated to internal metadata locking
  • The most interesting evidence here is the cursor truncate calls metric, which stands for truncations per second on a capped collection. The table we use to store profiling data is a capped collection

Please disable profiling, let the workload run for a day or two and upload the fresh data from srvdb1.yto.cloud for us to review (no need to upload data from the rest of the servers). This might not solve all the observed issues but hopefully will remove one of the main bottlenecks.
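
For reference, a minimal sketch of turning the profiler off from the mongo shell (database name taken from the sh.status() output below; if the setting is enforced by the automation-managed config, it may need to be changed there as well):

var profDb = db.getSiblingDB("expMonitordb");
profDb.setProfilingLevel(0);    // 0 = profiler off; slow operations are still written to the log
profDb.getProfilingStatus();    // confirm, e.g. { "was" : 0, "slowms" : 100, ... }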

Thanks,
Dima

Comment by Dmitry Agranat [ 21/Apr/20 ]

Hi 601290552@qq.com, thanks for pointing out the shard name where you have disabled the readConcern:majority. I am looking at the data now and will update you as I progress.

To your questions:

if i convert psa to pss,whether i need to disable readconcern:major for pss?

No, you do not need to disable the readConcern:majority for PSS architecture.

Comment by jing xu [ 21/Apr/20 ]

If I convert PSA to PSS, do I need to disable readConcern:majority for PSS? Also, when I added the new myShard_3 to the cluster, only a few connections went to the new replica set. How do I rebalance connections to the new replica set?

Comment by jing xu [ 21/Apr/20 ]

[root@srvdb5 mongod27019]# more automation-mongod.conf

# THIS FILE IS MAINTAINED BY http://10.5.130.57:8080 . DO NOT MODIFY AS IT WILL BE OVERWRITTEN.
# To make changes to your MongoDB deployment, please visit http://10.5.130.57:8080 . Your Group ID is 5e6f4cfb9d1f143e469e89df .
net:
  bindIp: 0.0.0.0
  maxIncomingConnections: 65535
  port: 27008
  serviceExecutor: adaptive
operationProfiling:
  mode: slowOp
  slowOpThresholdMs: 100
processManagement:
  fork: "true"
replication:
  enableMajorityReadConcern: false
  oplogSizeMB: 81920
  replSetName: myShard_2
sharding:
  clusterRole: shardsvr
storage:
  dbPath: /disk_data/test/mongod27019
  engine: wiredTiger
  wiredTiger:
    collectionConfig: {}
    engineConfig:
      cacheSizeGB: 100
      directoryForIndexes: true
    indexConfig: {}
systemLog:
  destination: file
  path: /disk_data/test/mongod27019/mongodb.log
[root@srvdb5 mongod27019]#
Comment by jing xu [ 21/Apr/20 ]

Please see the readconcern.png attachment; I disabled this parameter for the cluster.

Comment by jing xu [ 21/Apr/20 ]

"storageEngine" :

{ "name" : "wiredTiger", "supportsCommittedReads" : false, "supportsSnapshotReadConcern" : true, "readOnly" : false, "persistent" : true }

,

Comment by Dmitry Agranat [ 21/Apr/20 ]

Hi 601290552@qq.com, thank you for uploading all the requested information.

Based on the uploaded logs, you still have PSA architecture with readConcern:majority enabled:

2020-04-20T13:30:32.680+0800 I CONTROL  [conn1517] Replica Set Member State: ARBITER

As mentioned in this comment, this is not a recommended configuration. You need to either convert it to PSS or disable readConcern:majority.
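
One quick way to confirm the effective setting on a shard member, as a sketch (it mirrors the storageEngine output quoted elsewhere in this ticket):

db.serverStatus().storageEngine.supportsCommittedReads
// false => enableMajorityReadConcern is disabled on this node
// true  => readConcern:majority is available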

Also, running with database profiling enabled under heavy load is not recommended, as it might impact the cluster's performance.

Thanks,
Dima

Comment by jing xu [ 21/Apr/20 ]

mongos> sh.status()
--- Sharding Status ---
  sharding version: { "_id" : 1, "minCompatibleVersion" : 5, "currentVersion" : 6, "clusterId" : ObjectId("5e7c6d892a7805d574a0820b") }
  shards:
        { "_id" : "myShard_1", "host" : "myShard_1/srvdb1.yyto.cloud:27007,srvdb2.yyto.cloud:27007", "state" : 1 }
        { "_id" : "myShard_2", "host" : "myShard_2/srvdb2.yyto.cloud:27008,srvdb5.yyto.cloud:27008", "state" : 1 }
        { "_id" : "myShard_3", "host" : "myShard_3/srvdb2.yyto.cloud:57017,srvdb6.yyto.cloud:57017", "state" : 1 }
  active mongoses:
        "4.0.16" : 3
  autosplit:
        Currently enabled: yes
  balancer:
        Currently enabled: no
        Currently running: no
        Failed balancer rounds in last 5 attempts: 0
        Migration Results for the last 24 hours:
                815 : Success
                96 : Failed with error 'aborted', from myShard_2 to myShard_3
                5 : Failed with error 'aborted', from myShard_1 to myShard_3
  databases:
        { "_id" : "config", "primary" : "config", "partitioned" : true }
                config.system.sessions
                        shard key: { "_id" : 1 }
                        unique: false
                        balancing: true
                        chunks:
                                myShard_1  1
                        { "_id" : { "$minKey" : 1 } } -->> { "_id" : { "$maxKey" : 1 } } on : myShard_1 Timestamp(2, 0)
        { "_id" : "expMonitordb", "primary" : "myShard_1", "partitioned" : true, "version" : { "uuid" : UUID("af7418f5-b931-47c7-b397-776395dcec0f"), "lastMod" : 1 } }
                expMonitordb.lostMonitor
                        shard key: { "no" : "hashed" }
                        unique: false
                        balancing: true
                        chunks:
                                myShard_1  17909
                                myShard_2  18791
                                myShard_3  902
                        too many chunks to print, use verbose if you want to force print

Comment by jing xu [ 21/Apr/20 ]

Finished uploading the files: mongoftdc.tar.gz

Comment by jing xu [ 21/Apr/20 ]

Hello,
I will upload the files to you, but the compressed archive is 2.5 GB, so the upload is slow.
I have now added a new replica set to the mongo cluster, so the cluster has three shards, but the new replica set receives few connections, so its ops rate is only 2000/s while the other nodes are at 30000 ops.
I have now stopped the rebalance, but it is still only 2000 ops.

Comment by Dmitry Agranat [ 20/Apr/20 ]

Hi 601290552@qq.com,

Would you please archive (tar or zip) the mongod.log files and the $dbpath/diagnostic.data directory (the contents are described here) from all members of this replica set and upload them to this support uploader location?

Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time.

Thanks,
Dima

Comment by jing xu [ 20/Apr/20 ]

Hello,
With a three-member PSS architecture for the sharded cluster, would there still be the same cache pressure issue? Would I still need to disable readConcern:majority? Thanks.

Comment by Dmitry Agranat [ 20/Apr/20 ]

Hi 601290552@qq.com,

Having a secondary node that is running but occasionally lagging might still result in the same cache pressure issue.
The impact of disabling readConcern:majority is documented under the same link I posted in my last comment, here it is for convenience.

Thanks,
Dima

Comment by jing xu [ 20/Apr/20 ]

If I disable readConcern:majority, does readConcern become local?
If I change it to false, will it cause any problems for the mongo cluster?

Comment by jing xu [ 20/Apr/20 ]

With a three-member PSA architecture, the cache pressure will increase if any data bearing node is down. To prevent the storage cache pressure from immobilizing a deployment with a PSA architecture, you can disable read concern by setting either:

My environment: my secondary node is running, yet there is still cache pressure on the primary.

Comment by jing xu [ 20/Apr/20 ]

Hello,
I checked my mongo cluster and readConcern is majority. But another mongo cluster of mine is PSS and its readConcern is also majority; how should I handle that?
If I change readConcern:majority to false, will it affect my mongo cluster?

Comment by Dmitry Agranat [ 19/Apr/20 ]

Hi 601290552@qq.com,

Thank you for the report. Could you please clarify what MongoDB bug you are reporting here?

In your comment, you've mentioned this replica set uses arbiters, and (I assume) readConcern:majority is enabled. This is not a recommended configuration. Please see our documentation on how to address this issue.

Thanks,
Dima

Comment by jing xu [ 18/Apr/20 ]

[root@srvdb5 log]# mloginfo --queries mongodb.log
source: mongodb.log
host: unknown
start: 2020 Apr 18 17:55:17.289
end: 2020 Apr 18 17:56:28.866
date format: iso8601-local
length: 247052
binary: unknown
version: >= 3.0 (iso8601 format, level, component)
storage: unknown

QUERIES

namespace operation pattern count min (ms) max (ms) mean (ms) 95%-ile (ms) sum (ms)

expMonitordb.lostMonitor update None 81732 100 1353 198.16910145353106 n/a 16196757
local.oplog.rs find {"ts": 1} 2 2015 58705 30360.0 n/a 60720
local.oplog.rs find None 1 372 372 372.0 n/a 372
expMonitordb.lostMonitor find {"no": 1} 1 110 110 110.0 n/a 110

[root@srvdb5 log]#

Generated at Thu Feb 08 05:14:50 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.