Details

    • Type: New Feature
    • Status: Open
    • Priority: Major - P3
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: Planning Bucket A
    • Component/s: Indexing
    • Labels: None
    • # Replies: 31
    • Last comment by Customer: false

      Description

      Support filtered indexes where only some values are indexed. These are also called partial indexes.

      For more information, see Wikipedia:
      http://en.wikipedia.org/wiki/Partial_index

          Activity

          Thomas Rueckstiess
          added a comment - - edited

          A very general approach to define such filtered indexes is to use the aggregation pipeline language (or similar) to pre-process documents before indexing.

          Example:

          For documents like this

          { 
              username : "john",
              accountEnabled : true,
              age : 33,
              likes : [ "golf", "tennis"] 
          }
          

          A filtered index could be defined with:

          ensureIndex( 
              { likes: 1 },  
              [ 
                  { $match : { accountEnabled : true, age : { $gte: 18 } } }, 
                  { $project : { likes : 1 } }, 
                  { $unwind : "$likes" } 
              ] 
          )
          

          This would create an index entry for each "likes" element where the $match conditions are fulfilled. Specifically, it would create the index on each output document of the aggregation pipeline. Future inserts and updates to documents would have to be piped through the pipeline as well to modify the index.
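The effect of the proposed pipeline on a single document can be sketched in plain JavaScript. This is only an illustration: `matchesFilter` and `indexEntriesFor` are hypothetical helpers, not MongoDB APIs, and the second sample document is made up for the example.

```javascript
// Hypothetical sketch; none of these helpers exist in MongoDB.

// Mirrors { $match: { accountEnabled: true, age: { $gte: 18 } } }.
function matchesFilter(doc) {
  return doc.accountEnabled === true &&
    typeof doc.age === "number" && doc.age >= 18;
}

// Mirrors { $project: { likes: 1 } } followed by { $unwind: "$likes" }:
// one index entry per element of the "likes" array.
function indexEntriesFor(doc, docId) {
  if (!matchesFilter(doc)) return [];
  const likes = Array.isArray(doc.likes) ? doc.likes : [];
  return likes.map((value) => ({ key: value, docId }));
}

const john = { username: "john", accountEnabled: true, age: 33, likes: ["golf", "tennis"] };
// Assumed extra document that fails the age condition.
const minor = { username: "amy", accountEnabled: true, age: 15, likes: ["chess"] };

console.log(indexEntriesFor(john, 1));  // two entries, one per "likes" element
console.log(indexEntriesFor(minor, 2)); // no entries, the filter rejects the document
```

On every insert or update, the same function would have to be re-run for the affected document so that its old entries can be replaced by the new ones.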

          The $match operator here implicitly defines which documents get indexed, and therefore allows creating selective sparse indexes (see SERVER-13780, which would then be redundant).

          Here is an example of a custom sparse indexing policy with this approach. Assume a sparse index on {a:1, b:1}. Currently, MongoDB indexes a document for a compound sparse index as soon as any of the indexed fields is present. If one wanted to implement a policy where all fields of the compound sparse index need to be present, this could be achieved with:

          ensureIndex( 
              {a: 1, b: 1}, 
              [ { $match : { a: {$exists: true}, b: {$exists: true} } } ] 
          )
          

          This index would contain only documents in which both a and b are present.

          How exactly operators such as $project and $unwind affect indexing still needs to be defined.
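The difference between the two sparse policies can be made concrete with a small sketch. Both function names are hypothetical and only model the inclusion decision described above, not the server's implementation.

```javascript
// Hypothetical sketch of the two inclusion policies for a sparse index on {a: 1, b: 1}.

function hasField(doc, field) {
  // Like $exists: true, a field set to null still counts as present.
  return Object.prototype.hasOwnProperty.call(doc, field);
}

// Current compound sparse behaviour as described above: any field present.
function indexedWhenAnyPresent(doc, fields) {
  return fields.some((field) => hasField(doc, field));
}

// Proposed policy via $match with $exists on every field: all fields present.
function indexedWhenAllPresent(doc, fields) {
  return fields.every((field) => hasField(doc, field));
}

const fields = ["a", "b"];
const onlyA = { a: 1 };
const both = { a: 1, b: null };

console.log(indexedWhenAnyPresent(onlyA, fields)); // true
console.log(indexedWhenAllPresent(onlyA, fields)); // false
console.log(indexedWhenAllPresent(both, fields));  // true
```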

          Glenn Maynard
          added a comment -

          I'm not sure that using aggregation helps. Aggregation is a complex system that's designed for (by definition) aggregating multiple documents, and this is only dealing with a single document at a time.

          I'd just supply a query, so that only documents that match the query are included. That makes it easy to understand the filter, to figure out if a document change needs to update an index, and to reason about deciding whether an index can be used for a query or not.
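A rough sketch of why a plain query filter is easy to reason about: deciding what an update does to the index only needs the filter evaluated on the old and the new version of the document. `matches` and `maintenanceAction` are hypothetical helpers, and `matches` supports only equality and `$gte`, which is an assumption made to keep the example short.

```javascript
// Hypothetical sketch; evaluates a tiny query subset against one document.
function matches(doc, query) {
  return Object.entries(query).every(([field, cond]) => {
    if (cond !== null && typeof cond === "object" && "$gte" in cond) {
      return doc[field] >= cond.$gte;
    }
    return doc[field] === cond;
  });
}

// Decides what a write means for the filtered index.
// `before` is null for an insert, `after` is null for a delete.
function maintenanceAction(before, after, filter) {
  const was = before !== null && matches(before, filter);
  const is = after !== null && matches(after, filter);
  if (!was && is) return "add";
  if (was && !is) return "remove";
  return was ? "keep" : "skip";
}

const filter = { accountEnabled: true, age: { $gte: 18 } };
const before = { username: "john", accountEnabled: true, age: 33 };
const after = { ...before, accountEnabled: false };

console.log(maintenanceAction(null, before, filter));  // "add"
console.log(maintenanceAction(before, after, filter)); // "remove"
```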

          Asya Kamsky
          added a comment -

          Glenn Maynard, aggregation has a $project stage, which allows transformations that may be needed to test for conditions that cannot be expressed via a simple query. So it's not just for aggregating multiple documents (as in the $group stage); it can also be used for transforming document fields/shapes, or for aggregating multiple fields in a document. This would allow a wider range of possible expressions than a simple query.
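As an illustration of a condition that depends on a value derived from several fields, here is a minimal sketch. The field names `mathScore` and `readingScore`, the threshold, and both helpers are assumptions invented for the example.

```javascript
// Hypothetical sketch; stands in for a $project stage that derives one value
// from several fields of the same document.
function project(doc) {
  return { totalScore: (doc.mathScore || 0) + (doc.readingScore || 0) };
}

// Stands in for a $match on the derived value.
function shouldIndex(doc, minimumTotal) {
  return project(doc).totalScore >= minimumTotal;
}

const strong = { name: "a", mathScore: 70, readingScore: 60 };
const weak = { name: "b", mathScore: 30, readingScore: 20 };

console.log(shouldIndex(strong, 100)); // true
console.log(shouldIndex(weak, 100));   // false
```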

          Glenn Maynard
          added a comment -

          If it's useful to filter documents using aggregation features in ways that regular queries can't, the solution is to extend the query language to support it, not to use two different things for the same task (saying whether a document meets a condition or not).

          Thomas Rueckstiess
          added a comment - - edited

          The aggregation language would additionally allow defining expression indexes, where one could specify the values stored in the index. Perhaps it makes sense to keep these two concepts separate for now. We've created SERVER-14784 to track expression index support, and will continue to use this ticket to track filtered indexes.

          For filtered indexes, it is sufficient to use a query language to determine if the document should be included in the index. Whether or not we will implement the two concepts under a unified language or interface is yet to be determined.
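The two concepts can be kept apart in a toy model: a filter decides whether a document gets an entry, and a key expression decides which value is stored. `buildIndex` and its option names are hypothetical and do not reflect any planned interface.

```javascript
// Hypothetical sketch; `filter` and `keyExpr` are made-up option names.
function buildIndex(docs, { filter = () => true, keyExpr }) {
  const entries = [];
  docs.forEach((doc, position) => {
    // Filtered index concern: is this document included at all?
    if (filter(doc)) {
      // Expression index concern: which value is stored as the key?
      entries.push({ key: keyExpr(doc), position });
    }
  });
  return entries.sort((x, y) => (x.key < y.key ? -1 : x.key > y.key ? 1 : 0));
}

const docs = [
  { username: "John", accountEnabled: true },
  { username: "amy", accountEnabled: false },
  { username: "Bob", accountEnabled: true },
];

// Filtered index: plain field value as key, only enabled accounts.
const filtered = buildIndex(docs, {
  filter: (doc) => doc.accountEnabled === true,
  keyExpr: (doc) => doc.username,
});

// Expression index: every document, key is a computed value.
const expression = buildIndex(docs, {
  keyExpr: (doc) => doc.username.toLowerCase(),
});

console.log(filtered.map((entry) => entry.key));
console.log(expression.map((entry) => entry.key));
```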


            People

            • Votes: 139
            • Watchers: 117

              Dates

              • Created:
                Updated:
                Days since reply:
                30 weeks ago
                Date of 1st Reply: