[SERVER-19995] Performance drop-off when capped collection becomes full in WiredTiger Created: 17/Aug/15  Updated: 26/Oct/15  Resolved: 25/Oct/15

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Ramon Fernandez Marina Assignee: Michael Cahill (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-19522 Capped collection insert rate decline... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

See SERVER-19522 for all the information.

Participants:

 Description   

This is a continuation of SERVER-19522, where a performance drop-off was observed when capped collections became full. Opening a new ticket to investigate this issue separately.
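
For context, a minimal sketch of the kind of workload that exercises this code path (this is not the exact SERVER-19522 workload; the collection name, document size, cap size, thread count, and run length below are illustrative assumptions). It creates a capped collection, inserts from several threads until the collection wraps past its cap, and prints insert throughput roughly the way the mongostat insert column reports it:

# Sketch of a capped-collection insert workload (illustrative only; not the
# exact SERVER-19522 workload). Assumes a local mongod and pymongo.
import threading
import time

from pymongo import MongoClient

CAP_BYTES = 1 * 1024 * 1024 * 1024   # 1 GB cap -- illustrative assumption
DOC = {"payload": "x" * 1024}        # ~1 KB documents -- illustrative assumption
THREADS = 16
RUN_SECONDS = 120

client = MongoClient("localhost", 27017)
db = client.test
db.drop_collection("capped_test")
coll = db.create_collection("capped_test", capped=True, size=CAP_BYTES)

inserted = [0] * THREADS
stop = threading.Event()

def writer(i):
    # Insert copies of DOC until asked to stop; insert_one adds an _id,
    # so copy the template each time.
    while not stop.is_set():
        coll.insert_one(dict(DOC))
        inserted[i] += 1

workers = [threading.Thread(target=writer, args=(i,)) for i in range(THREADS)]
for w in workers:
    w.start()

# Report inserts/sec every 10 seconds, similar to watching mongostat.
last = 0
for _ in range(RUN_SECONDS // 10):
    time.sleep(10)
    total = sum(inserted)
    print("inserts/sec: %d" % ((total - last) / 10))
    last = total

stop.set()
for w in workers:
    w.join()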



 Comments   
Comment by Alexander Gorrod [ 29/Sep/15 ]

I re-ran the test against the MongoDB 3.0 code base (commit ea2cc1388cf707512a04f4437def3aedd78c7211, which sits between the 3.0.6 and 3.0.7 releases). There is some degradation, but not the performance cliff that this issue describes:

insert query update delete getmore command % dirty % used flushes  vsize   res qr|qw ar|aw netIn netOut conn     time
    *0    *0     *0     *0       0     0|0     0.0    0.0       0 246.0M 35.0M   0|0   1|0   18b     2k    1 13:33:35
 60232    *0     *0     *0       0     8|0     0.6    0.6       0 579.0M 325.0M   0|0  1|12  241m     6k   21 13:33:45
 60428    *0     *0     *0       0     0|0     1.2    1.2       0 680.0M 428.0M   0|0  1|17  244m     5k   21 13:33:55
 72948    *0     *0     *0       0     0|0     2.4    2.4       0 854.0M 603.0M   0|0   1|3  296m     6k   21 13:34:05
 68008    *0     *0     *0       0     0|0     3.0    3.0       0 956.0M 706.0M   0|2   1|1  275m     5k   21 13:34:15
 70841    *0     *0     *0       0     0|0     0.6    2.0       1 1008.0M 755.0M   0|0  1|18  283m     5k   21 13:34:25
 68269    *0     *0     *0       0     0|0     1.2    2.6       0 1008.0M 756.0M   0|0   1|2  278m     5k   21 13:34:35
 71819    *0     *0     *0       0     0|0     1.7    3.1       0 1008.0M 758.0M   0|5   1|8  287m     6k   21 13:34:45
 69660    *0     *0     *0       0     0|0     2.8    4.2       0    1.2G 932.0M   0|1  1|14  281m     5k   21 13:34:55
 59205    *0     *0     *0       0     0|0     3.4    4.5       0    1.3G   1.0G   0|0  1|14  240m     5k   21 13:35:05
 58120    *0     *0     *0       0     0|0     4.0    4.8       0    1.4G   1.1G   0|0  1|13  234m     5k   21 13:35:15
 53676    *0     *0     *0       0     0|0     0.6    2.2       1    1.4G   1.2G   0|1  1|16  216m     5k   21 13:35:25
 52502    *0     *0     *0       0     0|0     1.0    2.7       0    1.4G   1.2G   0|0  1|15  211m     4k   21 13:35:35
 55534    *0     *0     *0       0     0|0     1.7    3.1       0    1.4G   1.2G   0|1  1|16  224m     5k   21 13:35:45
 52289    *0     *0     *0       0     0|0     2.5    3.7       0    1.4G   1.2G   0|0  1|15  212m     4k   21 13:35:55
 58071    *0     *0     *0       0     0|0     3.4    4.5       0    1.4G   1.2G   0|8   1|9  234m     5k   21 13:36:05
 53785    *0     *0     *0       0     0|0     4.4    5.3       0    1.4G   1.2G   0|2  1|10  216m     5k   21 13:36:15
 56786    *0     *0     *0       0     0|0     5.2    5.9       0    1.5G   1.3G   0|2  1|10  230m     5k   21 13:36:25
 53371    *0     *0     *0       0     0|0     0.7    2.9       1    1.6G   1.3G   0|0  1|13  215m     5k   21 13:36:35
 53611    *0     *0     *0       0     0|0     1.1    3.1       0    1.6G   1.3G   0|2  1|14  215m     5k   21 13:36:45

Throughput appears to stabilise at around 52k inserts per second.

I think this issue can be closed. bruce.lucas, do you agree?

Comment by Alexander Gorrod [ 29/Sep/15 ]

I've re-run the original workload from SERVER-19522 and the degradation no longer reproduces.

Mongostat output from August 18th (git commit d4e4b25d8ca52f79781fc1fdd96d28bed08212cc):

insert query update delete getmore command % dirty % used flushes  vsize   res qr|qw ar|aw netIn netOut conn     time
    *0    *0     *0     *0       0     0|0     0.0    0.0       0 253.0M 38.0M   0|0   1|0   18b     2k    1 12:54:55
 43550    *0     *0     *0       0     8|0     1.0    1.0       0 600.0M 341.0M   0|1   1|1  175m     4k   21 12:55:05
 67667    *0     *0     *0       0     0|0     3.0    3.0       0 997.0M 740.0M   0|0  1|12  272m     4k   21 12:55:15
 67213    *0     *0     *0       0     0|0     6.4    6.4       0   1.6G   1.3G   0|2   1|4  272m     4k   21 12:55:25
 71157    *0     *0     *0       0     0|0    10.1   10.5       1   2.4G   2.1G   0|0  1|10  286m     4k   21 12:55:35
 68587    *0     *0     *0       0     0|0     1.9    4.6       0   2.4G   2.1G   0|3   1|1  278m     4k   21 12:55:45
 72634    *0     *0     *0       0     0|0     9.6   12.3       0   2.6G   2.3G   0|3   1|5  293m     5k   21 12:55:55
 69283    *0     *0     *0       0     0|0    16.2   18.9       0   3.7G   3.4G   0|3   1|2  279m     4k   21 12:56:05
 71874    *0     *0     *0       0     0|0    22.0   24.7       0   4.7G   4.4G   0|0   1|5  290m     4k   21 12:56:15
 35778    *0     *0     *0       0     0|0    22.8   25.4       0   5.4G   5.1G   0|0  1|19  143m     3k   21 12:56:25
 35187    *0     *0     *0       0     0|0    23.0   25.6       0   5.4G   5.1G   0|0  1|20  142m     3k   21 12:56:35
 35365    *0     *0     *0       0     0|0    27.2   29.9       1   5.6G   5.4G   0|0  1|19  142m     3k   21 12:56:45
 35320    *0     *0     *0       0     0|0     1.3    2.7       0   5.7G   5.5G   0|0  1|13  144m     3k   21 12:56:55
 38252    *0     *0     *0       0     0|0     3.9    5.7       0   5.7G   5.5G   0|0  1|18  153m     3k   21 12:57:05
 35788    *0     *0     *0       0     0|0     2.5    4.2       0   5.7G   5.4G   0|0  1|19  144m     3k   21 12:57:15
 38399    *0     *0     *0       0     0|0    11.4   12.9       0   5.7G   5.5G   0|0  1|16  155m     3k   21 12:57:25
 34933    *0     *0     *0       0     0|0    19.5   20.9       0   5.7G   5.5G   0|3  1|15  141m     3k   21 12:57:35
 36378    *0     *0     *0       0     0|0    21.0   22.3       0   5.7G   5.5G   0|0  1|18  148m     3k   21 12:57:45
 35148    *0     *0     *0       0     0|0    21.4   22.5       0   5.7G   5.5G   0|0  1|20  140m     3k   21 12:57:55

Mongostat output from September 29th (git commit a998887902dfd9cb6f125233c86361064f80c57e):

insert query update delete getmore command % dirty % used flushes  vsize   res qr|qw ar|aw netIn netOut conn     time
    *0    *0     *0     *0       0     0|0     0.0    0.0       0 264.0M 41.0M   0|0   1|0   18b     2k    1 13:11:49
 48771    *0     *0     *0       0     8|0     0.4    0.4       0 584.0M 316.0M   0|0  1|14  194m     5k   21 13:11:59
 83937    *0     *0     *0       0     0|0     0.8    0.8       0 686.0M 421.0M   0|0   1|9  340m     5k   21 13:12:09
 84036    *0     *0     *0       0     0|0     1.3    1.3       0 752.0M 487.0M   0|0   1|4  340m     5k   21 13:12:19
 85313    *0     *0     *0       0     0|0     1.5    1.5       0 802.0M 538.0M   0|4   1|5  343m     5k   21 13:12:29
 83366    *0     *0     *0       0     0|0     0.4    1.9       1 864.0M 600.0M   0|0   1|2  338m     5k   21 13:12:39
 83465    *0     *0     *0       0     0|0     0.5    2.2       0 1003.0M 740.0M   0|4  1|12  335m     5k   21 13:12:49
 78906    *0     *0     *0       0     0|0     2.0    3.7       0    1.2G 982.0M   0|3  1|10  318m     5k   21 13:12:59
 64353    *0     *0     *0       0     0|0     2.5    3.6       0    1.2G 994.0M   0|2  1|13  259m     4k   21 13:13:09
 64802    *0     *0     *0       0     0|0     2.7    3.7       0    1.3G   1.1G   0|0  1|13  262m     4k   21 13:13:19
insert query update delete getmore command % dirty % used flushes   vsize    res qr|qw ar|aw netIn netOut conn     time
 65354    *0     *0     *0       0     0|0     3.3    3.9       0    1.4G   1.1G   0|0  1|17  262m     4k   21 13:13:29
 65142    *0     *0     *0       0     0|0     3.9    4.5       0    1.4G   1.1G   0|0  1|14  264m     4k   21 13:13:39
 63101    *0     *0     *0       0     0|0     1.3    3.4       1    1.4G   1.2G   0|0  1|17  255m     4k   21 13:13:49
 62861    *0     *0     *0       0     0|0     1.1    2.9       0    1.4G   1.2G   0|1  1|13  253m     4k   21 13:13:59
 63720    *0     *0     *0       0     0|0     1.7    3.2       0    1.4G   1.2G   0|1  1|14  257m     4k   21 13:14:09
 62890    *0     *0     *0       0     0|0     2.5    3.6       0    1.4G   1.2G   0|0  1|14  253m     4k   21 13:14:19
 65512    *0     *0     *0       0     0|0     2.8    3.7       0    1.4G   1.2G   0|1  1|14  265m     4k   21 13:14:29
 66323    *0     *0     *0       0     0|0     3.3    4.0       0    1.5G   1.2G   0|0  1|19  266m     4k   21 13:14:39
 64562    *0     *0     *0       0     0|0     3.7    4.2       0    1.5G   1.3G   0|0  1|16  262m     4k   21 13:14:49

The level of throughput is maintained for at least 15 minutes with the newer build. I've verified that the size of the collection stabilized during the recorded time period (i.e., the volume of inserts exceeded the cap size, so the collection had wrapped and the capped-deletion path was being exercised).
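
For reference, one way to confirm that the collection has wrapped is to watch collStats while the inserts run (a sketch using pymongo; the database and collection names are illustrative and match the workload sketch above):

# Sketch: confirm a capped collection has wrapped, i.e. its data size is
# pinned at the cap. Database/collection names are illustrative assumptions.
from pymongo import MongoClient

client = MongoClient("localhost", 27017)
stats = client.test.command("collStats", "capped_test")
print("capped:", stats["capped"])     # True for a capped collection
print("size:", stats["size"])         # current data size in bytes
print("maxSize:", stats["maxSize"])   # configured cap in bytes
# Once size stays at (roughly) maxSize while inserts continue, the oldest
# documents are being removed to make room, which is the code path under test.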

Comment by Alexander Gorrod [ 28/Sep/15 ]

Is this expected behavior? Once a capped collection becomes full, work needs to be done to remove the obsolete data. That work isn't free, so I would expect some performance drop-off.

We could potentially investigate optimizing the code that maintains the size of capped collections, but I'd call that an enhancement rather than a bug fix.
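
To illustrate the behaviour being discussed (a minimal sketch with illustrative names and sizes, not a benchmark): once the cap is reached, each insert forces the server to discard the oldest document(s) to make room, so the document count stops growing even though inserts keep succeeding.

# Sketch: the document count of a capped collection stops growing once the
# cap is reached, because old documents are removed to make room for new ones.
# Sizes and names below are illustrative assumptions.
from pymongo import MongoClient

client = MongoClient("localhost", 27017)
db = client.test
db.drop_collection("capped_demo")
coll = db.create_collection("capped_demo", capped=True, size=1024 * 1024)  # 1 MB cap

doc = {"payload": "x" * 1024}   # ~1 KB documents
for i in range(5000):           # ~5 MB of inserts, well past the 1 MB cap
    coll.insert_one(dict(doc))
    if i % 1000 == 999:
        # The count levels off near the cap instead of reaching i + 1.
        print("inserted:", i + 1, "count:", coll.count_documents({}))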
