Details

    • Type: Epic
    • Status: Open
    • Priority: Major - P3
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 3.1 Required
    • Component/s: Indexing
    • Labels:
      None
    • Epic Name:
      Partial Indexes
    • Documentation Changes:
      Needed
    • Driver Changes:
      Not Needed
    • User Summary:
      Not Needed

      Description

      Support filtered indexes where only some values are indexed. These are also called partial indexes.

      For more information, see Wikipedia:
      http://en.wikipedia.org/wiki/Partial_index

            Activity

            thomasr Thomas Rueckstiess added a comment - - edited

            A very general approach to define such filtered indexes is to use the aggregation pipeline language (or similar) to pre-process documents before indexing.

            Example:

            For documents like this

            { 
                username : "john",
                accountEnabled : true,
                age : 33,
                likes : [ "golf", "tennis"] 
            }

            A filtered index could be defined with:

            ensureIndex( 
                { likes: 1 },  
                [ 
                    { $match : { accountEnabled : true, age : { $gte: 18 } } }, 
                    { $project : { likes : 1 } }, 
                    { $unwind : "$likes" } 
                ] 
            )

            This would create an index entry for each "likes" element where the $match conditions are fulfilled. Specifically, it would create the index on each output document of the aggregation pipeline. Future inserts and updates to documents would have to be piped through the pipeline as well to modify the index.
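A minimal sketch of what the pipeline in the example would yield for a single document, written in plain JavaScript rather than against any real server API. The function name `indexKeysForDocument` is hypothetical, and treating a non-array `likes` value as producing no entries is a simplifying assumption:

```javascript
// Hypothetical helper: emulates the $match -> $project -> $unwind pipeline
// from the example for ONE document and returns the index keys it would yield.
function indexKeysForDocument(doc) {
  // $match : { accountEnabled : true, age : { $gte : 18 } }
  const matches =
    doc.accountEnabled === true &&
    typeof doc.age === "number" &&
    doc.age >= 18;
  if (!matches) return [];

  // $project : { likes : 1 } keeps only the field that is indexed
  const likes = doc.likes;

  // $unwind : "$likes" emits one output document per array element.
  // Assumption: a missing or non-array value produces no entries.
  if (!Array.isArray(likes)) return [];
  return likes.map((like) => ({ likes: like }));
}

const john = {
  username: "john",
  accountEnabled: true,
  age: 33,
  likes: ["golf", "tennis"],
};

const keys = indexKeysForDocument(john);
const minorKeys = indexKeysForDocument({ ...john, age: 17 });
console.log(keys, minorKeys);
```

Under these assumptions the document above contributes two index entries, one per element of `likes`, while the same document with `age: 17` contributes none.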

            The $match operator here implicitly defines which documents get indexed, and therefore makes it possible to create selective sparse indexes (see SERVER-13780, which would then be redundant).

            Here is an example of a custom sparse indexing policy with this approach. Assume a sparse index on {a:1, b:1}. Currently, MongoDB indexes a document for a compound sparse index as soon as any of the indexed fields is present. If one wanted to implement a policy where all fields of the compound sparse index need to be present, this could be achieved with:

            ensureIndex( 
                {a: 1, b: 1}, 
                [ { $match : { a: {$exists: true}, b: {$exists: true} } } ] 
            )

            This index would only contain documents where both a and b are present in the document.
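The difference between the two policies can be sketched as two predicates in plain JavaScript. Both function names are hypothetical and only model the behavior described in this comment; they are not part of any MongoDB API:

```javascript
// True if the document has the field at all (a null value still counts,
// which mirrors how { $exists: true } treats explicit nulls).
function hasField(doc, field) {
  return Object.prototype.hasOwnProperty.call(doc, field);
}

// Current compound sparse behavior as described above:
// the document is indexed as soon as ANY indexed field is present.
function indexedBySparse(doc, fields) {
  return fields.some((f) => hasField(doc, f));
}

// Proposed policy via $match with { $exists: true } on every field:
// the document is indexed only if ALL indexed fields are present.
function indexedByAllExist(doc, fields) {
  return fields.every((f) => hasField(doc, f));
}

const fields = ["a", "b"];
const onlyA = { a: 1 };
const both = { a: 1, b: 2 };
const neither = { c: 3 };

console.log(indexedBySparse(onlyA, fields), indexedByAllExist(onlyA, fields));
console.log(indexedBySparse(both, fields), indexedByAllExist(both, fields));
console.log(indexedBySparse(neither, fields), indexedByAllExist(neither, fields));
```

The two policies disagree only for documents such as `onlyA`, where some but not all of the indexed fields are present.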

            How exactly operators such as $project and $unwind affect indexing still needs to be defined.

            glenn Glenn Maynard added a comment -

            I'm not sure that using aggregation helps. Aggregation is a complex system that's designed for (by definition) aggregating multiple documents, and this is only dealing with a single document at a time.

            I'd just supply a query, so that only documents that match the query are included. That makes it easy to understand the filter, to figure out if a document change needs to update an index, and to reason about deciding whether an index can be used for a query or not.
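As a rough illustration of why a plain query filter makes index maintenance easy to reason about, here is a sketch in plain JavaScript. The predicate `filter` stands in for a query such as { accountEnabled : true, age : { $gte : 18 } }, and `indexActionOnUpdate` is a hypothetical helper, not an existing API; it assumes a single scalar indexed field:

```javascript
// Stand-in for the index filter query
// { accountEnabled : true, age : { $gte : 18 } }.
const filter = (doc) => doc.accountEnabled === true && doc.age >= 18;

// Decide what an update means for the filtered index by evaluating the
// filter on the document before and after the change.
function indexActionOnUpdate(oldDoc, newDoc, indexedField) {
  const wasIndexed = filter(oldDoc);
  const isIndexed = filter(newDoc);

  if (!wasIndexed && !isIndexed) return "none"; // never in the index
  if (!wasIndexed && isIndexed) return "insert"; // now matches the filter
  if (wasIndexed && !isIndexed) return "remove"; // no longer matches
  // Still in the index: only touch it if the indexed value changed.
  return oldDoc[indexedField] === newDoc[indexedField] ? "none" : "update";
}

const before = { username: "john", accountEnabled: true, age: 33 };

const disabled = indexActionOnUpdate(before, { ...before, accountEnabled: false }, "username");
const renamed = indexActionOnUpdate(before, { ...before, username: "johnny" }, "username");
const older = indexActionOnUpdate(before, { ...before, age: 34 }, "username");
console.log(disabled, renamed, older);
```

Each decision needs only the old and new versions of one document, which is the single-document property the comment relies on.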

            asya Asya Kamsky added a comment -

            Glenn Maynard: aggregation has a $project phase, which allows transformations that may be needed to test for conditions that cannot be expressed via a simple query. So it's not just for aggregating multiple documents (as in the $group phase); it can also be used for transforming document fields/shapes, or for aggregating multiple fields in a document. This would allow a wider range of possible expressions than a simple query.

            glenn Glenn Maynard added a comment -

            If it's useful to filter documents using aggregation features in ways that regular queries can't, the solution is to extend the query language to support it, not to use two different things for the same task (saying whether a document meets a condition or not).

            thomasr Thomas Rueckstiess added a comment - - edited

            The aggregation language would additionally allow defining expression indexes, where one could define the values stored in the index. Perhaps it makes sense to keep these two concepts separate for now. We've created SERVER-14784 to track expression index support and will continue to use this ticket to track filtered indexes.

            For filtered indexes, it is sufficient to use a query language to determine if the document should be included in the index. Whether or not we will implement the two concepts under a unified language or interface is yet to be determined.
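A minimal sketch of that separation, in plain JavaScript: the filter only decides whether a document is in the index, and the stored key is taken unchanged from the document. `matchesFilter` and `buildFilteredIndex` are hypothetical names, and the matcher supports only equality, $gte and $exists, which is an assumption made to keep the example short:

```javascript
// Tiny query matcher (assumption: equality, $gte and $exists only).
function matchesFilter(doc, filter) {
  return Object.entries(filter).every(([field, cond]) => {
    const present = Object.prototype.hasOwnProperty.call(doc, field);
    const value = doc[field];
    if (cond !== null && typeof cond === "object") {
      return Object.entries(cond).every(([op, arg]) => {
        if (op === "$gte") return typeof value === typeof arg && value >= arg;
        if (op === "$exists") return present === arg;
        throw new Error("unsupported operator: " + op);
      });
    }
    return value === cond; // plain equality
  });
}

// The filter decides membership; the key is copied as-is from the document.
function buildFilteredIndex(docs, keyField, filter) {
  return docs
    .filter((doc) => matchesFilter(doc, filter))
    .map((doc) => ({ key: doc[keyField], id: doc._id }))
    .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

const docs = [
  { _id: 1, username: "john", accountEnabled: true, age: 33 },
  { _id: 2, username: "amy", accountEnabled: true, age: 17 },
  { _id: 3, username: "bob", accountEnabled: false, age: 40 },
  { _id: 4, username: "ann", accountEnabled: true, age: 18 },
];

const entries = buildFilteredIndex(docs, "username", {
  accountEnabled: true,
  age: { $gte: 18 },
});
console.log(entries);
```

An expression index, by contrast, would change what `key` holds rather than which documents are included.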


              People

              • Votes:
                139
                Watchers:
                117

                Dates

                • Created:
                  Updated:
                  Days since reply:
                  33 weeks, 4 days ago
                  Date of 1st Reply: