update list exceedingly long


    • Type: Task
    • Resolution: Done
    • WT2.5.1
    • Affects Version/s: None
    • Component/s: None

      This references https://jira.mongodb.org/browse/SERVER-17195. When there is highly contended update traffic to the same data, WT quickly appears to get stuck and throughput drops to nearly nothing.

      insert query update delete getmore command % dirty % used flushes  vsize   res qr|qw ar|aw netIn netOut conn     time
          *0    *0  15963     *0       0     1|0     0.0    0.0       0 347.0M 80.0M   0|0  0|18    2m     1m   22 20:11:19
          *0    *0  15977     *0       0     1|0     0.0    0.0       0 348.0M 82.0M   0|0  0|15    2m     1m   22 20:11:20
          *0    *0  15915     *0       0     1|0     0.0    0.0       0 349.0M 83.0M   0|0  0|18    2m     1m   22 20:11:21
          *0    *0  15631     *0       0     1|0     0.0    0.0       1 350.0M 83.0M   0|0  0|12    2m     1m   22 20:11:29
          *0    *0  16095     *0       0     3|0     0.0    0.0       0 351.0M 84.0M   0|0  0|19    2m     1m   23 20:11:30
          *0    *0  16041     *0       0     1|0     0.0    0.0       0 352.0M 86.0M   0|0  0|18    2m     1m   23 20:11:31
          *0    *0   3411     *0       0     1|0     0.0    0.0       0 353.0M 86.0M   0|0  0|20  508k   258k   23 20:11:32
          *0    *0      2     *0       0     1|0     0.0    0.0       0 353.0M 86.0M   0|0  0|20  377b    16k   23 20:11:33
          *0    *0     *0     *0       0     2|0     0.0    0.0       0 353.0M 86.0M   0|0  0|20  133b    16k   23 20:11:34
          *0    *0      1     *0       0     1|0     0.0    0.0       0 353.0M 86.0M   0|0  0|20  228b    16k   23 20:11:35
      

      When I run pmp on mongod during the (long) stuck period, I always see all the other threads sleeping in __wt_page_in_func and one thread in either __rec_txn_read or __wt_update_list_memsize. Basically, the update list is getting very long and walking it takes a huge amount of time. I added stats to __rec_txn_read: while the issue is happening, within one 5-second stats interval __rec_txn_read is called 635 times but makes almost 136M iterations through the update list loop, an average of roughly 214K iterations per call.
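
      To illustrate why a long update list is so expensive to walk, and how a calls-versus-iterations pair of counters exposes it, here is a minimal sketch in C. It is not WiredTiger code: struct update, struct walk_stats, update_prepend, update_read, walk_average and update_free are hypothetical names invented for this example, and the visibility test (txnid <= snap_max) is a simplifying assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for one entry in a per-key update list. */
struct update {
    uint64_t txnid;      /* transaction that created this update */
    struct update *next; /* next older update, NULL at the end of the list */
};

/* Hypothetical counters, analogous to the ad hoc stats described above. */
struct walk_stats {
    uint64_t calls;      /* number of list walks */
    uint64_t iterations; /* total loop iterations across all walks */
};

/*
 * Prepend a new update so the list stays ordered newest first.
 * Returns the new head, or NULL if allocation fails (list is unchanged).
 */
static struct update *
update_prepend(struct update *head, uint64_t txnid)
{
    struct update *upd = malloc(sizeof(*upd));

    if (upd == NULL)
        return NULL;
    upd->txnid = txnid;
    upd->next = head;
    return upd;
}

/*
 * Walk the list for the newest update with txnid <= snap_max (assumed
 * visibility rule), counting every step. A reader with an old snapshot
 * has to skip every newer update, so the cost is linear in list length.
 */
static const struct update *
update_read(const struct update *head, uint64_t snap_max,
    struct walk_stats *stats)
{
    const struct update *upd;

    assert(stats != NULL);
    stats->calls++;
    for (upd = head; upd != NULL; upd = upd->next) {
        stats->iterations++;
        if (upd->txnid <= snap_max)
            return upd;
    }
    return NULL;
}

/* Average iterations per walk (integer division), 0 if there were no walks. */
static uint64_t
walk_average(const struct walk_stats *stats)
{
    return stats->calls == 0 ? 0 : stats->iterations / stats->calls;
}

/* Free every entry in the list. */
static void
update_free(struct update *head)
{
    while (head != NULL) {
        struct update *next = head->next;

        free(head);
        head = next;
    }
}
```

      Feeding the figures reported above into walk_average (635 calls, 136,000,000 iterations) gives 214,173 iterations per call, which is why a single thread walking the list can stall everyone waiting on the page.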

            Assignee:
            Unassigned
            Reporter:
            Susan LoVerso (Inactive)