Uploaded image for project: 'WiredTiger'
  1. WiredTiger
  2. WT-1315

Support for multi-index searches / inner joins

    Details

    • Type: Task
    • Status: Resolved
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: WT2.7.0
    • Labels:

      Description

      Trying to capture the fundamental requirements here in generic terms here, would be happy to do even more in-detail practical examples etc. as required.

      High-level background

      One common pattern we have is to have one primary table with ‘business objects’ (BO) that uses e.g. an ‘UUID’ as the primary key (tweaked to be temporally sortable with a custom collator), and a number of secondary indices (SI-1..SI-n) of various types (UUIDs, custom DateTimes, integers, strings, …).
      We need to serve a complex query of data (arbitrary complexity referencing any number of attributes of the BO, range checks, equality checks, with typical logical operators etc.).

      Our current implementation makes an analysis of the query to find ‘candidate indexes’ and takes the one with the ‘highest priority’ (as pre-defined by a human at compile time) and uses that to make a rough initial filtering on either the range or equality test relevant for that index, then we post-process the results with our own custom decision diagram filtering (which is also later used for filtering live updates going forward) before returning data to the querying client.

      Some of the queries would benefit tremendously of applying multiple indexes simultaneously, as it can e.g. be a combination of date-time range and the user owning a given BO. A simple example would be ’give me all my BO modified in the last week which are owned by user X’, where we have both a date-time and user index available - the individual indexes are both half-bad, but in combination they would yield a very good rough initial filtering.

      Multi-index searching

      Essentially what we are looking for is inner joins / multi-index searching capabilities, which would allow us to apply multiple indexes to our read cursors, both with range and equality checks using custom comparators/collators. We are primarily looking at AND:ing the secondary index queries.

      We discussed one way to implement this ourselves previously (for AND with multiple indexes). The idea was to create a bloom filter based on the primary key for each secondary index except one, merge all of those bloom filters together, and then iterate over the final remaining secondary index and check vs. the aggregated bloom filter for existence. As the Bloom would guarantee no false negatives, we would be able to discard any record returned from the final index scan that did not exist in the Bloom. A false positive we can easily handle in our final post-filter stage anyhow (as we usually would have additional filter criteria on things that are not indexed, we need to do this regardless).

      Concrete example (simplified)

      We have a business object type called ‘Trade’, which has the following attributes (it is simplified in this example):

      FIELD (TYPE)
      creator (uuid)
      exchange created datetime (datetime)
      venue created datetime (datetime)
      host created datetime (datetime)
      host modified datetime (datetime)
      instrument (uuid)
      operator (uuid)
      order (uuid)
      owner (uuid)
      portfolio (uuid) 
      price (double)
      root strategy (uuid)
      settlement date (datetime)
      side (integer)
      trade identifier (uuid)
      trade type (integer)
      volume (double)
      deleted (boolean)
      

      Primary key: trade identifier
      Secondary keys: portfolio, venue created datetime, host modified datetime, instrument, root strategy

      Typical queries could be similar to:

      1. {Portfolio == <uuid> 60c1dfd6-f7c2-11e3-84a0-1fec295c4c6a AND
      {Venue created datetime > <datetime> 2014-10-23 00:00:00.000 CEST AND Venue created datetime<=<datetime> 2014-10-24 00:00:00.000 CEST}
      AND Deleted == <boolean> false}
      or the similar
      2. Deleted == <boolean> false} AND {{Venue created datetime > <datetime> 2014-10-21 00:00:00.000 CEST AND Venue created datetime<=<datetime> 2014-10-24 00:00:00.000 CEST} AND Owner == <uuid> 60c1dfd6-f7c2-11e3-84a0-1fec295c4c6a AND Instrument == <uuid> 0f9e71de-6942-4615-a8a7-76272ad9d3e3
      

      In 1., we would like to use the index for venue created datetime in conjunction with the Portfolio.
      In 2., we would like to use the index for venue created datetime in conjunction with the Instrument.

      It could be even more indexes that matched, it is not necessarily limited to two.

        Issue Links

          Activity

          Hide
          xgen-internal-githook Githook User added a comment -

          Author:

          {u'username': u'agorrod', u'name': u'Alex Gorrod', u'email': u'alexander.gorrod@mongodb.com'}

          Message: Merge pull request #2317 from wiredtiger/WT-1315-leaks

          WT-1315 Fix some leaks with join cursors.
          Branch: develop
          https://github.com/wiredtiger/wiredtiger/commit/16c0a1abe283eb921dd670d64dd7f1520675205f

          Show
          xgen-internal-githook Githook User added a comment - Author: {u'username': u'agorrod', u'name': u'Alex Gorrod', u'email': u'alexander.gorrod@mongodb.com'} Message: Merge pull request #2317 from wiredtiger/ WT-1315 -leaks WT-1315 Fix some leaks with join cursors. Branch: develop https://github.com/wiredtiger/wiredtiger/commit/16c0a1abe283eb921dd670d64dd7f1520675205f
          Hide
          xgen-internal-githook Githook User added a comment -

          Author:

          {u'username': u'agorrod', u'name': u'Alex Gorrod', u'email': u'alexander.gorrod@mongodb.com'}

          Message: Merge pull request #2317 from wiredtiger/WT-1315-leaks

          WT-1315 Fix some leaks with join cursors.
          Branch: develop
          https://github.com/wiredtiger/wiredtiger/commit/16c0a1abe283eb921dd670d64dd7f1520675205f

          Show
          xgen-internal-githook Githook User added a comment - Author: {u'username': u'agorrod', u'name': u'Alex Gorrod', u'email': u'alexander.gorrod@mongodb.com'} Message: Merge pull request #2317 from wiredtiger/ WT-1315 -leaks WT-1315 Fix some leaks with join cursors. Branch: develop https://github.com/wiredtiger/wiredtiger/commit/16c0a1abe283eb921dd670d64dd7f1520675205f
          Hide
          xgen-internal-githook Githook User added a comment -

          Author:

          {u'name': u'Ramon Fernandez', u'email': u'rfmnyc@gmail.com'}

          Message: Import wiredtiger-wiredtiger-mongodb-3.2.0-rc3-177-g9d375e3.tar.gz from wiredtiger branch mongodb-3.2

          ref: d9ec1ff..9d375e3

          16c0a1a WT-1315 Fix some leaks with join cursors.
          59857f9 WT-2222 Add statistics for named snapshots.
          4368d39 WT-1315 Cursor join implementation
          a72ddb7 WT-2218 Add truncate stats
          fb9cebe WT-2224 Track which deleted refs are discarded by a split.
          e2f1130 WT-2220 Split WT_TIMEDIFF macro into unit specific macros.
          be412b5 WT-2182 when internal pages grow large enough, split them into their parents
          ce8c091 WT-2219 Enhancements to in-memory testing
          347d922 WT-2220 time_t cleanup.
          08c0fcd WT-2217 change WT_CURSOR.insert to clear "set" key/value on return
          d1b5e7f WT-2135 Fix log_only setting for backup cursor. Fix initialization.
          78bd4ac WT-2210 raw compression fails if row-store recovery precedes column-store recovery
          c1b2634 WT-2182 fixes for splitting up the tree.
          0a1ee34 WT-2199 Fix transaction sync inconsistency.
          ee31bb2 WT-2182 Simplify the split deepen logic.
          c360d53 WT-2212 Add a "use_environment" config to "wiredtiger_open"
          3f132a4 WT-2182 detect internal page split races.
          Branch: master
          https://github.com/mongodb/mongo/commit/a0771ea5ec1b44537d3c409e3d712db24fd8e6bb

          Show
          xgen-internal-githook Githook User added a comment - Author: {u'name': u'Ramon Fernandez', u'email': u'rfmnyc@gmail.com'} Message: Import wiredtiger-wiredtiger-mongodb-3.2.0-rc3-177-g9d375e3.tar.gz from wiredtiger branch mongodb-3.2 ref: d9ec1ff..9d375e3 16c0a1a WT-1315 Fix some leaks with join cursors. 59857f9 WT-2222 Add statistics for named snapshots. 4368d39 WT-1315 Cursor join implementation a72ddb7 WT-2218 Add truncate stats fb9cebe WT-2224 Track which deleted refs are discarded by a split. e2f1130 WT-2220 Split WT_TIMEDIFF macro into unit specific macros. be412b5 WT-2182 when internal pages grow large enough, split them into their parents ce8c091 WT-2219 Enhancements to in-memory testing 347d922 WT-2220 time_t cleanup. 08c0fcd WT-2217 change WT_CURSOR.insert to clear "set" key/value on return d1b5e7f WT-2135 Fix log_only setting for backup cursor. Fix initialization. 78bd4ac WT-2210 raw compression fails if row-store recovery precedes column-store recovery c1b2634 WT-2182 fixes for splitting up the tree. 0a1ee34 WT-2199 Fix transaction sync inconsistency. ee31bb2 WT-2182 Simplify the split deepen logic. c360d53 WT-2212 Add a "use_environment" config to "wiredtiger_open" 3f132a4 WT-2182 detect internal page split races. Branch: master https://github.com/mongodb/mongo/commit/a0771ea5ec1b44537d3c409e3d712db24fd8e6bb
          Hide
          xgen-internal-githook Githook User added a comment -

          Author:

          {u'name': u'Ramon Fernandez', u'email': u'rfmnyc@gmail.com'}

          Message: Import wiredtiger-wiredtiger-mongodb-3.2.0-rc3-177-g9d375e3.tar.gz from wiredtiger branch mongodb-3.2

          ref: d9ec1ff..9d375e3

          16c0a1a WT-1315 Fix some leaks with join cursors.
          59857f9 WT-2222 Add statistics for named snapshots.
          4368d39 WT-1315 Cursor join implementation
          a72ddb7 WT-2218 Add truncate stats
          fb9cebe WT-2224 Track which deleted refs are discarded by a split.
          e2f1130 WT-2220 Split WT_TIMEDIFF macro into unit specific macros.
          be412b5 WT-2182 when internal pages grow large enough, split them into their parents
          ce8c091 WT-2219 Enhancements to in-memory testing
          347d922 WT-2220 time_t cleanup.
          08c0fcd WT-2217 change WT_CURSOR.insert to clear "set" key/value on return
          d1b5e7f WT-2135 Fix log_only setting for backup cursor. Fix initialization.
          78bd4ac WT-2210 raw compression fails if row-store recovery precedes column-store recovery
          c1b2634 WT-2182 fixes for splitting up the tree.
          0a1ee34 WT-2199 Fix transaction sync inconsistency.
          ee31bb2 WT-2182 Simplify the split deepen logic.
          c360d53 WT-2212 Add a "use_environment" config to "wiredtiger_open"
          3f132a4 WT-2182 detect internal page split races.
          Branch: master
          https://github.com/mongodb/mongo/commit/a0771ea5ec1b44537d3c409e3d712db24fd8e6bb

          Show
          xgen-internal-githook Githook User added a comment - Author: {u'name': u'Ramon Fernandez', u'email': u'rfmnyc@gmail.com'} Message: Import wiredtiger-wiredtiger-mongodb-3.2.0-rc3-177-g9d375e3.tar.gz from wiredtiger branch mongodb-3.2 ref: d9ec1ff..9d375e3 16c0a1a WT-1315 Fix some leaks with join cursors. 59857f9 WT-2222 Add statistics for named snapshots. 4368d39 WT-1315 Cursor join implementation a72ddb7 WT-2218 Add truncate stats fb9cebe WT-2224 Track which deleted refs are discarded by a split. e2f1130 WT-2220 Split WT_TIMEDIFF macro into unit specific macros. be412b5 WT-2182 when internal pages grow large enough, split them into their parents ce8c091 WT-2219 Enhancements to in-memory testing 347d922 WT-2220 time_t cleanup. 08c0fcd WT-2217 change WT_CURSOR.insert to clear "set" key/value on return d1b5e7f WT-2135 Fix log_only setting for backup cursor. Fix initialization. 78bd4ac WT-2210 raw compression fails if row-store recovery precedes column-store recovery c1b2634 WT-2182 fixes for splitting up the tree. 0a1ee34 WT-2199 Fix transaction sync inconsistency. ee31bb2 WT-2182 Simplify the split deepen logic. c360d53 WT-2212 Add a "use_environment" config to "wiredtiger_open" 3f132a4 WT-2182 detect internal page split races. Branch: master https://github.com/mongodb/mongo/commit/a0771ea5ec1b44537d3c409e3d712db24fd8e6bb
          Hide
          hassila Joakim Hassila added a comment -

          Nice, looking forward to try it out!

          Show
          hassila Joakim Hassila added a comment - Nice, looking forward to try it out!

            People

            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:
                Days since reply:
                1 year, 22 weeks, 3 days ago
                Date of 1st Reply: