[SERVER-39006] Read query processing is frequently queued in MongoDB Created: 15/Jan/19  Updated: 23/Jan/19  Resolved: 23/Jan/19

Status: Closed
Project: Core Server
Component/s: Concurrency, Performance, Querying, WiredTiger
Affects Version/s: 3.4.14
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Esther Eunbin Baek Assignee: Eric Sedor
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File diagnostic.data.tar.gz     File mongod.conf     File serverStat.tar.gz    
Operating System: ALL
Participants:

 Description   

We are using MongoDB 3.4.14.
We operate a service on a sharded cluster, and we frequently see connections pile up because query processing is delayed.
We have 15 shards and use only the primaries. (No queries go to the secondaries.)
The issue begins at some point on a single shard server. (It is not a specific server; it is a different one each time.)
On that shard, query processing slows down until the connection limit is finally reached.
This in turn delays query processing and exhausts connections on the other shard servers as well. (Because of the aggregate mergeCursor stage, the problem spreads to the other shard servers too.)

When we checked, a surge of any particular type of query was not the cause of the problem.
All of the queries that came in while MongoDB was struggling were the usual ones.
Those ordinary queries were simply slowing down, and then the problem happened.
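
For anyone trying to reproduce this, here is a minimal pymongo sketch (the host string and threshold are placeholders, not our actual settings) of how connection usage on a shard primary can be polled via serverStatus.

>> connection check (pymongo sketch)

# Minimal sketch: poll serverStatus on a shard primary and report connection usage.
# The host string and the 90% threshold below are placeholders, not our real values.
from pymongo import MongoClient

client = MongoClient("mongodb://shard04-primary.example.net:27017")  # placeholder host
conns = client.admin.command("serverStatus")["connections"]

current, available = conns["current"], conns["available"]
total = current + available  # effectively net.maxIncomingConnections
print("connections: %d used / %d max (%.1f%%)" % (current, total, 100.0 * current / total))
if current > 0.9 * total:  # placeholder threshold
    print("WARNING: connection count is approaching the configured limit")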

 

>> mongostat (we use find/insert/update/aggregate queries)

insert query update delete getmore command dirty  used flushes vsize   res  qrw   arw net_in net_out conn       set repl                time
   515  1534    961     *0    2282  2779|0  2.4% 80.0%       0  135G 56.2G  0|0  25|2  5.81m   16.4m 4978 shard04  PRI Dec 16 22:05:26.610
   525  1459    755     *0    2130  2532|0  2.3% 80.0%       0  135G 56.2G  0|0  17|2  5.05m   14.1m 4985 shard04  PRI Dec 16 22:05:27.676
   455  1873    550     *0    2478  2965|0  2.2% 80.0%       0  135G 56.2G  0|1  26|1  5.83m   16.2m 4996 shard04  PRI Dec 16 22:05:28.604
   429  1682    936     *0    2320  2619|0  2.0% 80.0%       0  135G 56.2G  0|0  36|3  4.97m   15.3m 5010 shard04  PRI Dec 16 22:05:29.614
   489  1625    545     *0    2328  2723|0  2.0% 80.0%       0  135G 56.2G  0|2  22|0  5.02m   14.7m 5023 shard04  PRI Dec 16 22:05:30.602
   270  1487    826     *0    2333  2770|0  2.0% 80.0%       0  135G 56.2G  9|1  19|1  4.39m   12.8m 5044 shard04  PRI Dec 16 22:05:31.609
   167   661    269     *0     939  1140|0  2.0% 80.0%       0  135G 56.2G  0|0 10|17  1.99m   6.03m 5062 shard04  PRI Dec 16 22:05:34.043
 16223 59846  28133     *0   79397 95484|0  2.1% 80.0%       0  135G 56.2G  0|0  19|7   174m    488m 5078 shard04  PRI Dec 16 22:05:34.087
   721  1301    846     *0    2318  2562|0  2.1% 80.0%       0  135G 56.2G  0|0  15|2  4.41m   13.0m 5081 shard04  PRI Dec 16 22:05:34.590
   540  1758    558     *0    2255  2587|0  2.2% 80.0%       0  135G 56.2G  0|3  17|0  4.80m   15.2m 5098 shard04  PRI Dec 16 22:05:35.631
insert query update delete getmore command dirty  used flushes vsize   res  qrw   arw net_in net_out conn       set repl                time
   378  1613    726     *0    2304  2757|0  2.3% 80.0%       0  135G 56.2G  0|0  28|2  4.64m   13.6m 5110 shard04  PRI Dec 16 22:05:36.635
   247  1903    608     *0    2408  2842|0  2.3% 80.0%       0  135G 56.2G  0|0  21|2  5.51m   18.8m 5124 shard04  PRI Dec 16 22:05:37.592
   404  1628    731     *0    2321  2834|0  2.3% 80.0%       0  135G 56.2G  0|0  18|0  4.74m   13.8m 5143 shard04  PRI Dec 16 22:05:38.596
   428  1494    987     *0    2292  2748|0  2.4% 80.0%       0  135G 56.2G  0|0  18|2  4.50m   14.0m 5157 shard04  PRI Dec 16 22:05:39.605
   110   577    180     *0     673   764|0  2.4% 80.0%       0  135G 56.2G  0|2  33|0  1.41m   4.60m 5173 shard04  PRI Dec 16 22:05:43.029
  6706 34615  12151     *0   49254 60381|0  2.5% 80.0%       0  135G 56.2G  0|1  30|3  93.2m    274m 5235 shard04  PRI Dec 16 22:05:43.148
   423  1977   1213     *0    2525  3191|0  2.6% 80.0%       0  135G 56.2G  0|0  17|1  5.15m   15.8m 5245 shard04  PRI Dec 16 22:05:43.606
   387  1677    557     *0    2357  2990|0  2.6% 80.0%       0  135G 56.2G  2|2  14|3  4.49m   12.6m 5284 shard04  PRI Dec 16 22:05:44.609
   187  1500    672     *0    2403  2843|0  2.7% 80.0%       0  135G 56.2G  4|0   8|1  4.39m   13.0m 5305 shard04  PRI Dec 16 22:05:45.580
   408  1480    490     *0    2250  2893|0  2.7% 80.0%       0  135G 56.2G  0|0  16|1  4.30m   12.1m 5350 shard04  PRI Dec 16 22:05:46.602
insert query update delete getmore command dirty  used flushes vsize   res  qrw   arw net_in net_out conn       set repl                time
   231  1603    437     *0    2389  3145|0  2.7% 80.0%       0  135G 56.2G  0|2  12|1  4.66m   13.7m 5394 shard04  PRI Dec 16 22:05:47.599
   196  1482    395     *0    2178  3055|0  2.7% 80.0%       0  135G 56.2G  6|4  31|1  4.32m   12.9m 5457 shard04  PRI Dec 16 22:05:48.659
   232  1275    797     *0    2102  3843|0  2.8% 80.0%       0  137G 56.2G  0|0  63|6  4.23m   10.3m 5607 shard04  PRI Dec 16 22:05:49.689
   247  1514    361     *0    2403  5241|0  2.8% 80.0%       0  139G 56.2G  0|0  12|1  5.07m   14.4m 5786 shard04  PRI Dec 16 22:05:50.574
    66  1459    256     *0    2065  4455|0  2.8% 80.0%       0  140G 56.2G  5|0  17|0  4.71m   11.9m 5978 shard04  PRI Dec 16 22:05:51.608
    80  1057    188     *0    1689  4031|0  2.8% 80.0%       0  142G 56.2G  0|0  83|6  3.90m   8.80m 6270 shard04  PRI Dec 16 22:05:52.933
    69   957    451     *0    1415  4790|0  2.8% 80.0%       0  146G 56.2G 62|0 128|6  4.00m   7.64m 6717 shard04  PRI Dec 16 22:05:54.196
    12  1334     83     *0    1317 10142|0  2.8% 80.0%       0  151G 56.3G 297|664 128|128  5.97m   7.87m 7800 shard04  PRI Dec 16 22:05:55.152
     8  1521     17     *0     929 11458|0  2.9% 80.0%       0  154G 56.1G 2097|343   128|2  5.87m   9.42m 9083 shard04  PRI Dec 16 22:05:55.942
    23  1506    365     *0    1296  7617|0  2.9% 80.1%       0  155G 56.1G    669|0 128|128  5.11m   13.7m 9899 shard04  PRI Dec 16 22:05:56.992
insert query update delete getmore command dirty  used flushes vsize   res      qrw     arw net_in net_out  conn       set repl                time
    11   837     32     *0    1435  8388|0  2.9% 80.1%       0  156G 56.2G     16|1    46|1  4.00m   9.81m 10847 shard04  PRI Dec 16 22:05:57.896
    *0    49      2     *0      73   735|0  2.9% 80.1%       0  157G 56.2G      0|5    97|0   336k    555k 11799 shard04  PRI Dec 16 22:06:10.530
    70  4143    319     *0    4587 39201|0  2.9% 80.0%       0  171G 56.6G 11777|1965 128|128  26.7m   36.6m 19000 shard04  PRI Dec 16 22:06:12.776
   147  1745     67     *0    1232 22395|0  2.9% 80.0%       0  174G 57.0G  6428|6774 128|128  18.3m   22.9m 18957 shard04  PRI Dec 16 22:06:13.387
    11   367      8     *0     322  4526|0  2.9% 80.0%       0  176G 57.2G  4084|9374 128|128  3.83m   3.41m 19000 shard04  PRI Dec 16 22:06:14.100
     4   621     13     *0     702  7959|0  3.0% 80.0%       0  177G 57.5G    4|12828  73|128  7.02m   7.93m 19000 shard04  PRI Dec 16 22:06:14.712
     2   512      4     *0    1128  6520|0  3.0% 80.0%       0  177G 57.6G   40|12068 128|128  5.59m   8.50m 19000 shard04  PRI Dec 16 22:06:15.851
     1   634      4     *0    1346  7566|0  3.0% 80.0%       0  177G 57.7G    0|11841 128|128  6.77m   8.73m 18995 shard04  PRI Dec 16 22:06:16.816
    *0   720      3     *0    1593  7595|0  3.0% 80.0%       0  177G 57.8G   20|11773 101|128  7.11m   8.81m 19000 shard04  PRI Dec 16 22:06:17.758
    *0   634     *0     *0    1320  5941|0  3.0% 80.0%       0  177G 57.9G   10|11809  90|128  5.72m   7.05m 19000 shard04  PRI Dec 16 22:06:18.878
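
For context on the mongostat columns above: qrw is the queued readers|writers count (serverStatus globalLock.currentQueue) and arw is the active readers|writers count (globalLock.activeClients). arw saturating at 128|128 while qrw climbs into the thousands is consistent with the WiredTiger concurrent-transaction tickets (128 read / 128 write by default) being exhausted. A minimal pymongo sketch (placeholder host) of reading those counters directly:

>> ticket/queue check (pymongo sketch)

# Minimal sketch: read the counters behind mongostat's qrw/arw columns from serverStatus.
# The host string is a placeholder.
from pymongo import MongoClient

client = MongoClient("mongodb://shard04-primary.example.net:27017")  # placeholder host
status = client.admin.command("serverStatus")

queue   = status["globalLock"]["currentQueue"]            # mongostat "qrw"
active  = status["globalLock"]["activeClients"]           # mongostat "arw"
tickets = status["wiredTiger"]["concurrentTransactions"]  # 128 read / 128 write by default

print("queued  readers|writers: %d|%d" % (queue["readers"], queue["writers"]))
print("active  readers|writers: %d|%d" % (active["readers"], active["writers"]))
print("read  tickets out/total: %d/%d" % (tickets["read"]["out"], tickets["read"]["totalTickets"]))
print("write tickets out/total: %d/%d" % (tickets["write"]["out"], tickets["write"]["totalTickets"]))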

 

The server stats are attached.
Please let me know if you need anything else.

 



 Comments   
Comment by Eric Sedor [ 15/Jan/19 ]

Hello, the SERVER jira project is for reporting bugs or feature suggestions and is not the best place to diagnose complex performance issues like this. Unfortunately we aren't able to help here.

We'd like to request that you pursue help from the community on the mongodb-user group or Stack Overflow with the mongodb tag. A question like this, which requires more discussion, would be best posted on the mongodb-user group.

That said, if you can narrow this down to a specific, reproducible issue, we would be happy to examine it.
