-
Type: Bug
-
Resolution: Done
-
Priority: Major - P3
-
None
-
Affects Version/s: 3.4.14
-
Component/s: Concurrency, Performance, Querying, WiredTiger
-
Labels:None
-
ALL
We are using MongoDB 3.4.14.
We are operating a service as a sharded cluster, and often there is an issue of increasing connection due to delays in query processing.
We have 15 shards and only use primary server. (There are no quries in secondary.)
The issue has occurred at a some point. And one shard server is the starting point of the issue. (not specific server, It is different every time)
At some point on one of the shard servers, query processing is delayed and finally connection full is occurred.
This cause of delaying query processing and occurring connection full on other shard servers. (Because of the aggregate mergeCursor, problem is occurred on other shard servers too.)
When we checked, the mass incoming of certain types of query was not the cause of the problem.
All the queries that came in when MongoDB was in trouble were the usual queries.
The usual queries were slowing down and problem happened.
>> mongostat (we are using find/insert/update/aggregate queries.)
insert query update delete getmore command dirty used flushes vsize res qrw arw net_in net_out conn set repl time 515 1534 961 *0 2282 2779|0 2.4% 80.0% 0 135G 56.2G 0|0 25|2 5.81m 16.4m 4978 shard04 PRI Dec 16 22:05:26.610 525 1459 755 *0 2130 2532|0 2.3% 80.0% 0 135G 56.2G 0|0 17|2 5.05m 14.1m 4985 shard04 PRI Dec 16 22:05:27.676 455 1873 550 *0 2478 2965|0 2.2% 80.0% 0 135G 56.2G 0|1 26|1 5.83m 16.2m 4996 shard04 PRI Dec 16 22:05:28.604 429 1682 936 *0 2320 2619|0 2.0% 80.0% 0 135G 56.2G 0|0 36|3 4.97m 15.3m 5010 shard04 PRI Dec 16 22:05:29.614 489 1625 545 *0 2328 2723|0 2.0% 80.0% 0 135G 56.2G 0|2 22|0 5.02m 14.7m 5023 shard04 PRI Dec 16 22:05:30.602 270 1487 826 *0 2333 2770|0 2.0% 80.0% 0 135G 56.2G 9|1 19|1 4.39m 12.8m 5044 shard04 PRI Dec 16 22:05:31.609 167 661 269 *0 939 1140|0 2.0% 80.0% 0 135G 56.2G 0|0 10|17 1.99m 6.03m 5062 shard04 PRI Dec 16 22:05:34.043 16223 59846 28133 *0 79397 95484|0 2.1% 80.0% 0 135G 56.2G 0|0 19|7 174m 488m 5078 shard04 PRI Dec 16 22:05:34.087 721 1301 846 *0 2318 2562|0 2.1% 80.0% 0 135G 56.2G 0|0 15|2 4.41m 13.0m 5081 shard04 PRI Dec 16 22:05:34.590 540 1758 558 *0 2255 2587|0 2.2% 80.0% 0 135G 56.2G 0|3 17|0 4.80m 15.2m 5098 shard04 PRI Dec 16 22:05:35.631 insert query update delete getmore command dirty used flushes vsize res qrw arw net_in net_out conn set repl time 378 1613 726 *0 2304 2757|0 2.3% 80.0% 0 135G 56.2G 0|0 28|2 4.64m 13.6m 5110 shard04 PRI Dec 16 22:05:36.635 247 1903 608 *0 2408 2842|0 2.3% 80.0% 0 135G 56.2G 0|0 21|2 5.51m 18.8m 5124 shard04 PRI Dec 16 22:05:37.592 404 1628 731 *0 2321 2834|0 2.3% 80.0% 0 135G 56.2G 0|0 18|0 4.74m 13.8m 5143 shard04 PRI Dec 16 22:05:38.596 428 1494 987 *0 2292 2748|0 2.4% 80.0% 0 135G 56.2G 0|0 18|2 4.50m 14.0m 5157 shard04 PRI Dec 16 22:05:39.605 110 577 180 *0 673 764|0 2.4% 80.0% 0 135G 56.2G 0|2 33|0 1.41m 4.60m 5173 shard04 PRI Dec 16 22:05:43.029 6706 34615 12151 *0 49254 60381|0 2.5% 80.0% 0 135G 56.2G 0|1 30|3 93.2m 274m 5235 shard04 PRI Dec 16 22:05:43.148 423 1977 1213 *0 2525 3191|0 2.6% 80.0% 0 135G 56.2G 0|0 17|1 5.15m 15.8m 5245 shard04 PRI Dec 16 22:05:43.606 387 1677 557 *0 2357 2990|0 2.6% 80.0% 0 135G 56.2G 2|2 14|3 4.49m 12.6m 5284 shard04 PRI Dec 16 22:05:44.609 187 1500 672 *0 2403 2843|0 2.7% 80.0% 0 135G 56.2G 4|0 8|1 4.39m 13.0m 5305 shard04 PRI Dec 16 22:05:45.580 408 1480 490 *0 2250 2893|0 2.7% 80.0% 0 135G 56.2G 0|0 16|1 4.30m 12.1m 5350 shard04 PRI Dec 16 22:05:46.602 insert query update delete getmore command dirty used flushes vsize res qrw arw net_in net_out conn set repl time 231 1603 437 *0 2389 3145|0 2.7% 80.0% 0 135G 56.2G 0|2 12|1 4.66m 13.7m 5394 shard04 PRI Dec 16 22:05:47.599 196 1482 395 *0 2178 3055|0 2.7% 80.0% 0 135G 56.2G 6|4 31|1 4.32m 12.9m 5457 shard04 PRI Dec 16 22:05:48.659 232 1275 797 *0 2102 3843|0 2.8% 80.0% 0 137G 56.2G 0|0 63|6 4.23m 10.3m 5607 shard04 PRI Dec 16 22:05:49.689 247 1514 361 *0 2403 5241|0 2.8% 80.0% 0 139G 56.2G 0|0 12|1 5.07m 14.4m 5786 shard04 PRI Dec 16 22:05:50.574 66 1459 256 *0 2065 4455|0 2.8% 80.0% 0 140G 56.2G 5|0 17|0 4.71m 11.9m 5978 shard04 PRI Dec 16 22:05:51.608 80 1057 188 *0 1689 4031|0 2.8% 80.0% 0 142G 56.2G 0|0 83|6 3.90m 8.80m 6270 shard04 PRI Dec 16 22:05:52.933 69 957 451 *0 1415 4790|0 2.8% 80.0% 0 146G 56.2G 62|0 128|6 4.00m 7.64m 6717 shard04 PRI Dec 16 22:05:54.196 12 1334 83 *0 1317 10142|0 2.8% 80.0% 0 151G 56.3G 297|664 128|128 5.97m 7.87m 7800 shard04 PRI Dec 16 22:05:55.152 8 1521 17 *0 929 11458|0 2.9% 80.0% 0 154G 56.1G 2097|343 128|2 5.87m 9.42m 9083 shard04 PRI Dec 16 22:05:55.942 23 1506 365 *0 1296 7617|0 2.9% 80.1% 0 155G 56.1G 669|0 128|128 5.11m 13.7m 9899 shard04 PRI Dec 16 22:05:56.992 insert query update delete getmore command dirty used flushes vsize res qrw arw net_in net_out conn set repl time 11 837 32 *0 1435 8388|0 2.9% 80.1% 0 156G 56.2G 16|1 46|1 4.00m 9.81m 10847 shard04 PRI Dec 16 22:05:57.896 *0 49 2 *0 73 735|0 2.9% 80.1% 0 157G 56.2G 0|5 97|0 336k 555k 11799 shard04 PRI Dec 16 22:06:10.530 70 4143 319 *0 4587 39201|0 2.9% 80.0% 0 171G 56.6G 11777|1965 128|128 26.7m 36.6m 19000 shard04 PRI Dec 16 22:06:12.776 147 1745 67 *0 1232 22395|0 2.9% 80.0% 0 174G 57.0G 6428|6774 128|128 18.3m 22.9m 18957 shard04 PRI Dec 16 22:06:13.387 11 367 8 *0 322 4526|0 2.9% 80.0% 0 176G 57.2G 4084|9374 128|128 3.83m 3.41m 19000 shard04 PRI Dec 16 22:06:14.100 4 621 13 *0 702 7959|0 3.0% 80.0% 0 177G 57.5G 4|12828 73|128 7.02m 7.93m 19000 shard04 PRI Dec 16 22:06:14.712 2 512 4 *0 1128 6520|0 3.0% 80.0% 0 177G 57.6G 40|12068 128|128 5.59m 8.50m 19000 shard04 PRI Dec 16 22:06:15.851 1 634 4 *0 1346 7566|0 3.0% 80.0% 0 177G 57.7G 0|11841 128|128 6.77m 8.73m 18995 shard04 PRI Dec 16 22:06:16.816 *0 720 3 *0 1593 7595|0 3.0% 80.0% 0 177G 57.8G 20|11773 101|128 7.11m 8.81m 19000 shard04 PRI Dec 16 22:06:17.758 *0 634 *0 *0 1320 5941|0 3.0% 80.0% 0 177G 57.9G 10|11809 90|128 5.72m 7.05m 19000 shard04 PRI Dec 16 22:06:18.878
Attach the server stat.
Please let me know if you need anything else.