  1. WiredTiger
  2. WT-2130

Improve on-disk page utilization with random workloads

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: WT2.8.0
    • Labels:
      None
    • # Replies:
      33
    • Last comment by Customer:
      true

      Description

      I've been running a workload that does random inserts. The keys are 24 bytes and the values are 225 bytes.

      The page size is 4k. After running for a while I end up with a ~7GB database. The distribution of key/value pairs on the disk pages is:

      Page count   Number of pairs
            4530                 1
            4609                 2
            4018                 3
            3810                 4
          159912                 5
          112646                 6
           68976                 7
           42168                 8
           27469                 9
           19701                10
          260192                11
          123105                12
          115519                13
          108806                14
           99244                15

      That says to me that a page can fit up to 15 key/value pairs. There are about 1.15 million leaf pages in total, and about 290 thousand of those hold fewer than half the possible pairs.
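
      The totals quoted above can be re-derived from the table. The following is a minimal sketch, not part of the original report: the names `pages_by_pairs` and `summarize` are hypothetical, and the cut-off of 6 pairs for "fewer than half" is an assumption, chosen because it is the threshold that reproduces the figure of roughly 290 thousand.

```python
# Histogram copied from the table in the description:
# number of key/value pairs on a leaf page -> number of such pages.
pages_by_pairs = {
    1: 4530, 2: 4609, 3: 4018, 4: 3810, 5: 159912,
    6: 112646, 7: 68976, 8: 42168, 9: 27469, 10: 19701,
    11: 260192, 12: 123105, 13: 115519, 14: 108806, 15: 99244,
}


def summarize(hist, sparse_cutoff):
    """Return (total pages, pages with <= sparse_cutoff pairs, total pairs)."""
    total_pages = sum(hist.values())
    sparse_pages = sum(n for pairs, n in hist.items() if pairs <= sparse_cutoff)
    total_pairs = sum(pairs * n for pairs, n in hist.items())
    return total_pages, sparse_pages, total_pairs


# Largest number of pairs observed on any page (15 in this table).
max_pairs = max(pages_by_pairs)

# Assumption: "fewer than half" is taken to mean 6 pairs or fewer.
total_pages, sparse_pages, total_pairs = summarize(pages_by_pairs, 6)

# Fraction of the available pair slots that are actually occupied,
# treating max_pairs as the capacity of every leaf page.
average_fill = total_pairs / (total_pages * max_pairs)

print(f"leaf pages:        {total_pages}")
print(f"sparse pages (<=6): {sparse_pages}")
print(f"average fill:      {average_fill:.1%}")
```

      Treating the largest observed count as the page capacity is a simplification; the real limit depends on per-page and per-cell overhead, which the report does not give.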

      Ideally all of the pages would be at least half full. If we create pages with a small number of entries, they widen the span of the tree and are relatively less likely to be read back in to have content added in the future (i.e., they are likely to waste space indefinitely).

      There are also a lot of pages that are very full - which is bad in this workload. Since the workload is evicting aggressively, those pages are being read in, updated and split into two unequal pages.

      1. wt2130_32k_50.png
        87 kB
      2. wt2130_32k_90.png
        96 kB

        Issue Links

          Activity

          xgen-internal-githook Githook User added a comment -

          Author: Alex Gorrod (agorrod) <alexander.gorrod@mongodb.com>

          Message: Merge pull request #2455 from wiredtiger/WT-2130-split_pct

          WT-2130 Don't round the split_pct to an allocation size.
          Branch: mongodb-3.2
          https://github.com/wiredtiger/wiredtiger/commit/6bff4ed03ac73f18719de48b95c2f3c289ea2661

          xgen-internal-githook Githook User added a comment -

          Author: Ramon Fernandez <ramon@mongodb.com>

          Message: Import wiredtiger-wiredtiger-2.7.0-650-g5cdd3e3.tar.gz from wiredtiger branch mongodb-3.2

          ref: 07966a4..5cdd3e3

          SERVER-22437 Coverity analysis defect 77704: Redundant test
          SERVER-22438 Coverity analysis defect 77705: Dereference before null check
          SERVER-22676 WiredTiger fails to open databases created by 3.0.0 or 3.0.1
          WT-2130 Improve on-disk page utilization with random workloads
          WT-2215 WT_LSN needs to support atomic reads and updates
          WT-2295 WT_SESSION.create does a full-scan of the main table
          WT-2352 Allow build and test without requiring lz4
          WT-2356 log scan advances to next log file on partially written record
          WT-2363 Remove built in support for bzip2
          WT-2368 row-store can pass garbage keys to collator functions
          WT-2369 Use C compiler to detect headers instead of C++ compiler
          WT-2371 parent split cannot access the page after page-index swap
          WT-2372 WiredTiger windows builder fails with C4005 against the "inline" macro
          WT-2377 WTPERF doesn't compile in Windows under MSVC
          WT-2378 Tasks time out on LSM builder
          WT-2397 Cursor traversal from end of the tree skips records.
          WT-60 Big endian port
          Branch: v3.2
          https://github.com/mongodb/mongo/commit/f77630a9e971cae1f921292ea31d9d40a4b096b8

          xgen-internal-githook Githook User added a comment -

          Author: Alex Gorrod (agorrod) <alexander.gorrod@mongodb.com>

          Message: Merge pull request #2455 from wiredtiger/WT-2130-split_pct

          (cherry picked from commit 6bff4ed)

          WT-2130 Backport - Don't round the split_pct to an allocation size.
          Branch: mongodb-3.0
          https://github.com/wiredtiger/wiredtiger/commit/3dbc6c653591d997da63ec1d677ecff5c4c92a8e

          xgen-internal-githook Githook User added a comment -

          Author: Ramon Fernandez <ramon@mongodb.com>

          Message: Import wiredtiger-wiredtiger-mongodb-3.0.9-3-g3dbc6c6.tar.gz from wiredtiger branch mongodb-3.0

          ref: 62b3ca8..3dbc6c6

          WT-2130 Improve on-disk page utilization with random workloads
          SERVER-22898 High fragmentation on WiredTiger databases under write workloads
          Branch: v3.0
          https://github.com/mongodb/mongo/commit/c62a2810e54ed4ac7b98c75896b614d3ff3eb619


            People

            • Votes:
              1
            • Watchers:
              4

              Dates

              • Days since reply:
                1 year, 16 weeks, 3 days ago