Loading...

XML

Word

Printable

JSON

Type: Task
Resolution: Done
Priority: Major - P3
Fix Version/s: WT2.7.0
Affects Version/s: None
Component/s: None
Labels:
- Feature

Trying to capture the fundamental requirements here in generic terms here, would be happy to do even more in-detail practical examples etc. as required.

High-level background

One common pattern we have is to have one primary table with ‘business objects’ (BO) that uses e.g. an ‘UUID’ as the primary key (tweaked to be temporally sortable with a custom collator), and a number of secondary indices (SI-1..SI-n) of various types (UUIDs, custom DateTimes, integers, strings, …).
We need to serve a complex query of data (arbitrary complexity referencing any number of attributes of the BO, range checks, equality checks, with typical logical operators etc.).

Our current implementation makes an analysis of the query to find ‘candidate indexes’ and takes the one with the ‘highest priority’ (as pre-defined by a human at compile time) and uses that to make a rough initial filtering on either the range or equality test relevant for that index, then we post-process the results with our own custom decision diagram filtering (which is also later used for filtering live updates going forward) before returning data to the querying client.

Some of the queries would benefit tremendously of applying multiple indexes simultaneously, as it can e.g. be a combination of date-time range and the user owning a given BO. A simple example would be ’give me all my BO modified in the last week which are owned by user X’, where we have both a date-time and user index available - the individual indexes are both half-bad, but in combination they would yield a very good rough initial filtering.

Multi-index searching

Essentially what we are looking for is inner joins / multi-index searching capabilities, which would allow us to apply multiple indexes to our read cursors, both with range and equality checks using custom comparators/collators. We are primarily looking at AND:ing the secondary index queries.

We discussed one way to implement this ourselves previously (for AND with multiple indexes). The idea was to create a bloom filter based on the primary key for each secondary index except one, merge all of those bloom filters together, and then iterate over the final remaining secondary index and check vs. the aggregated bloom filter for existence. As the Bloom would guarantee no false negatives, we would be able to discard any record returned from the final index scan that did not exist in the Bloom. A false positive we can easily handle in our final post-filter stage anyhow (as we usually would have additional filter criteria on things that are not indexed, we need to do this regardless).

Concrete example (simplified)

We have a business object type called ‘Trade’, which has the following attributes (it is simplified in this example):

FIELD (TYPE)
creator (uuid)
exchange created datetime (datetime)
venue created datetime (datetime)
host created datetime (datetime)
host modified datetime (datetime)
instrument (uuid)
operator (uuid)
order (uuid)
owner (uuid)
portfolio (uuid) 
price (double)
root strategy (uuid)
settlement date (datetime)
side (integer)
trade identifier (uuid)
trade type (integer)
volume (double)
deleted (boolean)

Primary key: trade identifier
Secondary keys: portfolio, venue created datetime, host modified datetime, instrument, root strategy

Typical queries could be similar to:

1. {Portfolio == <uuid> 60c1dfd6-f7c2-11e3-84a0-1fec295c4c6a AND
{Venue created datetime > <datetime> 2014-10-23 00:00:00.000 CEST AND Venue created datetime<=<datetime> 2014-10-24 00:00:00.000 CEST}
AND Deleted == <boolean> false}
or the similar
2. Deleted == <boolean> false} AND {{Venue created datetime > <datetime> 2014-10-21 00:00:00.000 CEST AND Venue created datetime<=<datetime> 2014-10-24 00:00:00.000 CEST} AND Owner == <uuid> 60c1dfd6-f7c2-11e3-84a0-1fec295c4c6a AND Instrument == <uuid> 0f9e71de-6942-4615-a8a7-76272ad9d3e3

In 1., we would like to use the index for venue created datetime in conjunction with the Portfolio.
In 2., we would like to use the index for venue created datetime in conjunction with the Instrument.

It could be even more indexes that matched, it is not necessarily limited to two.

is depended on by

SERVER-21549 WiredTiger changes for 3.2.0-rc4

Closed

Assignee:: Donald Anderson

Reporter:: Joakim Hassila

Votes:: 0 Vote for this issue

Watchers:: 1 Start watching this issue

Created:: Oct 27 2014 12:39:55 PM UTC

Updated:: Oct 12 2017 11:18:53 PM UTC

Resolved:: Nov 20 2015 02:01:50 AM UTC

Details

Description

High-level background

Multi-index searching

Concrete example (simplified)

Attachments

Issue Links

Activity

People

Dates