Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: WT2.6.0
    • Labels:
      None
    • # Replies:
      6
    • Last comment by Customer:
      true

      Description

      The WiredTiger LSM implementation currently ignores the "bulk" flag passed when opening a cursor.

      MongoDB uses bulk cursors to do background creates into indexes. We could enhance the performance of MongoDB indexes if we added efficient support for bulk load in LSM cursors.

          Activity

          alexander.gorrod Alexander Gorrod added a comment -

          A possible solution here would be to allow bulk load on the first chunk in an LSM tree. That would:

          • Disable logging
          • Enforce in-order updates
          • Enforce single thread

          As per the current bulk load for btree.
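
The two enforcement constraints listed above can be illustrated with a toy model. This is only a sketch in Python, not WiredTiger code: `ToyBulkCursor` and `BulkLoadError` are hypothetical names, and it assumes keys must be strictly increasing and that only the thread that opened the cursor may insert. Disabling logging is not modeled.

```python
import threading


class BulkLoadError(Exception):
    """Raised when a bulk-load constraint is violated (hypothetical name)."""


class ToyBulkCursor:
    """Toy model of a bulk cursor: append-only, sorted keys, one thread."""

    def __init__(self):
        self.rows = []                        # (key, value) pairs in key order
        self.closed = False
        self._owner = threading.get_ident()   # thread that opened the cursor

    def insert(self, key, value):
        if self.closed:
            raise BulkLoadError("cursor is closed")
        # Enforce single thread: only the opening thread may insert.
        if threading.get_ident() != self._owner:
            raise BulkLoadError("bulk cursor used from another thread")
        # Enforce in-order updates: each key must sort after the previous one.
        if self.rows and key <= self.rows[-1][0]:
            raise BulkLoadError(f"out-of-order key: {key!r}")
        self.rows.append((key, value))

    def close(self):
        """Mark the cursor closed and return the loaded rows."""
        self.closed = True
        return self.rows
```

Because keys arrive already sorted, the loader can append without searching for an insert position, which is where the efficiency of a bulk load comes from in this model.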

          When the bulk load cursor is closed, we would need to switch the chunk out of memory. We'd also need a way to wait for the chunk to be flushed prior to returning from the cursor close (if logging is enabled, anyway). We might be able to achieve that by running a checkpoint on the LSM tree?

          Another consideration would be whether to ignore the chunk size for bulk loads, or to detect that the load has spanned the chunk size and error, or to split bulk loads up across a set of chunks.
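
The three chunk-size options in the paragraph above can be compared side by side with a small sketch. The function name `bulk_load` and the policy names are hypothetical, and chunk size is counted in rows here purely for simplicity; this does not show which option was actually implemented.

```python
def bulk_load(rows, chunk_size, policy):
    """Place already-sorted rows into chunks under one of three policies.

    policy is "ignore", "error", or "split" (hypothetical names).
    chunk_size is a row count in this toy model.
    """
    rows = list(rows)
    if policy == "ignore":
        # Ignore the chunk size: everything goes into one oversized chunk.
        return [rows]
    if policy == "error":
        # Detect that the load has spanned the chunk size and fail.
        if len(rows) > chunk_size:
            raise ValueError("bulk load exceeds chunk size")
        return [rows]
    if policy == "split":
        # Split the load across a set of chunks of at most chunk_size rows.
        return [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]
    raise ValueError(f"unknown policy: {policy}")
```

For five rows and a chunk size of two, "ignore" yields one chunk of five rows, "error" raises, and "split" yields three chunks.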

          michael.cahill Michael Cahill added a comment - https://github.com/wiredtiger/wiredtiger/pull/1945
          xgen-internal-githook Githook User added a comment -

          Author: Alex Gorrod (agorrod) <alexg@wiredtiger.com>

          Message: Add support for bulk load in LSM trees.

          This allows us to load into a single btree, using btree bulk load
          semantics (single threaded, in order, no logging). Once the load completes
          we switch the chunk out for the LSM tree.

          It's possible that we could avoid some of the switch logic when closing a
          bulk load cursor - since the file is flushed when closing the btree handle.
          It's simpler to use the switch logic to update the state of the tree.

          Refs WT-1922 SERVER-18321
          Branch: develop
          https://github.com/wiredtiger/wiredtiger/commit/c82ed17fd2c47d87e525bcd37e4a69c11d0336fe
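
The flow described in this commit message (load into a single btree, then switch the chunk out when the load completes) can be pictured with a toy model. This is a rough sketch, not the WiredTiger implementation: `ToyLsmTree` and its methods are hypothetical, each chunk is a plain dictionary, and the sketch assumes a bulk load is only allowed on an empty tree.

```python
class ToyLsmTree:
    """Toy LSM tree: a list of switched-out chunks plus one active chunk."""

    def __init__(self):
        self.chunks = []   # switched-out, read-only chunks (oldest first)
        self.active = {}   # chunk currently accepting writes

    def switch(self):
        """Retire the active chunk and start a fresh one."""
        self.chunks.append(dict(self.active))
        self.active = {}

    def bulk_load(self, sorted_rows):
        """Load rows into the first chunk, then switch it out."""
        if self.chunks or self.active:
            raise RuntimeError("bulk load requires an empty tree")
        for key, value in sorted_rows:
            self.active[key] = value
        self.switch()  # stands in for closing the bulk cursor

    def get(self, key):
        """Search newest data first: active chunk, then newest to oldest."""
        if key in self.active:
            return self.active[key]
        for chunk in reversed(self.chunks):
            if key in chunk:
                return chunk[key]
        return None
```

After `bulk_load` returns, the loaded rows sit in a read-only chunk and later writes land in a new active chunk, so reusing the ordinary switch path leaves the tree in the same state as any other chunk switch.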

          xgen-internal-githook Githook User added a comment -

          Author: Alex Gorrod (agorrod) <alexander.gorrod@mongodb.com>

          Message: Merge pull request #1948 from wiredtiger/lsm-bulk-load

          Add support for bulk load in LSM trees.
          Refs WT-1922 SERVER-18321
          Branch: develop
          https://github.com/wiredtiger/wiredtiger/commit/4d37a27896872dc5d280f5e85666e1d8431ec33b

          xgen-internal-githook Githook User added a comment -

          Author: Alex Gorrod (agorrod) <alexander.gorrod@mongodb.com>

          Message: WT-1922 Add support for bulk load in LSM trees. Also references
          SERVER-18321

          (cherry picked from commit 4d37a27896872dc5d280f5e85666e1d8431ec33b)
          Branch: mongodb-3.0
          https://github.com/wiredtiger/wiredtiger/commit/10eb756c7bb8cc1a6847a2f2fec5fcb2ee883d91

          alexander.gorrod Alexander Gorrod added a comment -

          I backported this and a related change in WT-1924 to the WiredTiger mongodb-3.0 branch, since it was causing test failures on Evergreen. This does not mean LSM is a supported configuration in MongoDB 3.0, just that I don't like staring at build failures.

