Uploaded image for project: 'WiredTiger'
  1. WiredTiger
  2. WT-2130

Improve on-disk page utilization with random workloads

    • Type: Icon: Improvement Improvement
    • Resolution: Done
    • Priority: Icon: Major - P3 Major - P3
    • WT2.8.0
    • Affects Version/s: None
    • Component/s: None
    • Labels:
    • Storage Engines

      I've been running a workload that does random inserts. The keys are 24 bytes the values are 225 bytes.

      The page size is 4k. After running for a while I end up with a ~7GB database. The distribution of key/value pairs on the disk pages is:

      Page count Number of pairs
      4530 1
      4609 2
      4018 3
      3810 4
      159912 5
      112646 6
      68976 7
      42168 8
      27469 9
      19701 10
      260192 11
      123105 12
      115519 13
      108806 14
      99244 15

      That says to me: The page can fit up to 15 keys on it. There are 1.1 million leaf pages in total and 290 thousand of those have less than half the possible keys.

      Ideally all of the pages would be at least half full. If we create pages with a small number of entries they widen the span of the tree, and are relatively less likely to be read back in to have content added in the future (i.e: they are likely to waste space indefinitely).

      There are also a lot of pages that are very full - which is bad in this workload. Since the workload is evicting aggressively, those pages are being read in, updated and split into two unequal pages.

        1. wt2130_32k_50.png
          87 kB
        2. wt2130_32k_90.png
          96 kB

            keith.bostic@mongodb.com Keith Bostic (Inactive)
            alexander.gorrod@mongodb.com Alexander Gorrod
            1 Vote for this issue
            4 Start watching this issue