-
Type: Task
-
Resolution: Done
-
Affects Version/s: None
-
Component/s: None
This references https://jira.mongodb.org/browse/SERVER-17195. When there is highly contended update traffic to the same data, WT quickly appears to get stuck and throughput drops to nearly nothing.
insert query update delete getmore command % dirty % used flushes  vsize   res qr|qw ar|aw netIn netOut conn     time
    *0    *0  15963     *0       0     1|0     0.0    0.0       0 347.0M 80.0M   0|0  0|18    2m     1m   22 20:11:19
    *0    *0  15977     *0       0     1|0     0.0    0.0       0 348.0M 82.0M   0|0  0|15    2m     1m   22 20:11:20
    *0    *0  15915     *0       0     1|0     0.0    0.0       0 349.0M 83.0M   0|0  0|18    2m     1m   22 20:11:21
    *0    *0  15631     *0       0     1|0     0.0    0.0       1 350.0M 83.0M   0|0  0|12    2m     1m   22 20:11:29
    *0    *0  16095     *0       0     3|0     0.0    0.0       0 351.0M 84.0M   0|0  0|19    2m     1m   23 20:11:30
    *0    *0  16041     *0       0     1|0     0.0    0.0       0 352.0M 86.0M   0|0  0|18    2m     1m   23 20:11:31
    *0    *0   3411     *0       0     1|0     0.0    0.0       0 353.0M 86.0M   0|0  0|20  508k   258k   23 20:11:32
    *0    *0      2     *0       0     1|0     0.0    0.0       0 353.0M 86.0M   0|0  0|20  377b    16k   23 20:11:33
    *0    *0     *0     *0       0     2|0     0.0    0.0       0 353.0M 86.0M   0|0  0|20  133b    16k   23 20:11:34
    *0    *0      1     *0       0     1|0     0.0    0.0       0 353.0M 86.0M   0|0  0|20  228b    16k   23 20:11:35
When I run pmp (poor man's profiler) on mongod during the (long) stuck period, I always see every thread sleeping in wt_page_in_func except one, which is in either _rec_txn_read or wt_update_list_memsize. Basically, the update list is getting very long and walking it takes a huge amount of time. I added stats to _rec_txn_read: while the issue is happening, in a 5-second stat span _rec_txn_read is called 635 times but takes almost 136M iterations through the update list loop, an average of over 200K iterations per call.
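To illustrate why a long update list hurts, here is a toy Python sketch. It is not WiredTiger code: the `Update`, `Chain`, and `simulate` names are hypothetical, and it assumes a newest-first linked list per key with no long-running readers. It only shows that a full walk costs one iteration per retained update, and that trimming obsolete entries as each update is added (the idea named in WT-1650) keeps the walk short.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Update:
    """One entry in a toy per-key update chain (newest first)."""
    txn_id: int
    value: int
    next: Optional["Update"] = None


class Chain:
    """Hypothetical update chain; not WiredTiger's data structure."""

    def __init__(self, prune: bool = False) -> None:
        self.head: Optional[Update] = None
        self.prune = prune

    def add(self, txn_id: int, value: int, oldest_active_txn: int) -> None:
        # Prepend the new update so the newest entry is always at the head.
        self.head = Update(txn_id, value, self.head)
        if self.prune:
            self._truncate_obsolete(oldest_active_txn)

    def _truncate_obsolete(self, oldest_active_txn: int) -> None:
        # Keep entries up to and including the newest update that is older
        # than every active transaction; anything after it can never be read.
        node = self.head
        while node is not None:
            if node.txn_id < oldest_active_txn:
                node.next = None
                return
            node = node.next

    def walk(self) -> int:
        # Full scan of the chain; returns the number of loop iterations.
        steps = 0
        node = self.head
        while node is not None:
            steps += 1
            node = node.next
        return steps


def simulate(n_updates: int, prune: bool) -> int:
    """Apply n_updates to one key, then return the cost of a single walk."""
    chain = Chain(prune=prune)
    for txn in range(1, n_updates + 1):
        # Assumption: no long-running readers, so the oldest active
        # transaction is the one doing the current update.
        chain.add(txn, txn, oldest_active_txn=txn)
    return chain.walk()


if __name__ == "__main__":
    print("walk cost without pruning:", simulate(200_000, prune=False))
    print("walk cost with pruning:   ", simulate(200_000, prune=True))
```

Without pruning, the walk cost equals the number of updates applied to the key, which is the same shape as the roughly 200K iterations per call measured above. With pruning, the chain in this toy model never holds more than two entries.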
- related to
WT-1650 Remove obsolete updates every time we add a new update. (Closed)