  Core Server / SERVER-78585

Multikey indexes will not cover past the array even when the array attribute is not projected

    • Type: Bug
    • Resolution: Community Answered
    • Priority: Major - P3
    • None
    • Affects Version/s: 7.0.0-rc6
    • Component/s: None
    • Labels: None
    • Storage Execution
    • ALL
    • Steps To Reproduce:

      Insert the following documents into a collection:

      { _id: "P1", data: "P1data", relatedTo: "P1" },
      { _id: "P2", data: "P2data", relatedTo: "P2" },
      { _id: "C1", data: "C1data", relatedTo: ["P1", "P2"] },
      { _id: "C2", data: "C2data", relatedTo: ["P1", "P2"] }

      Add the following index:

      createIndex({ "relatedTo": 1, "data": 1 })

      Note that the following query is not covered by the index, even though data is the only attribute projected:
      find({relatedTo: "P1"}, {"_id": 0, data:1})

      Create a new collection and add the following documents:

      { _id: "P1", data: "P1data", relatedTo: "P1" },
      { _id: "P2", data: "P2data", relatedTo: "P2" },
      { _id: "C1", data: "C1data", relatedTo: "P1" },
      { _id: "C1.1", data: "C1data", relatedTo: "P2" },
      { _id: "C2", data: "C2data", relatedTo: "P1" },
      { _id: "C2.1", data: "C2data", relatedTo: "P2" }

      Add the following index:

      createIndex({ "relatedTo": 1, "data": 1 })

      Note that the following query will be covered:
      find({relatedTo: "P1"}, {"_id": 0, data:1})
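A minimal conceptual model in JavaScript (the language of the shell snippets above) can make the structural difference between the two collections concrete. It is a sketch under stated assumptions, not MongoDB's key-generation code: it assumes each element of an array in an indexed field produces its own index key, and that the index is flagged multikey once any document stores an array in an indexed field. The helpers indexKeys and buildIndex are hypothetical names.

```javascript
// Conceptual model only (an assumption, not MongoDB source code): how a
// compound index such as { relatedTo: 1, data: 1 } maps documents to keys.

// Keys for one document. A compound multikey index allows at most one indexed
// field per document to hold an array, so only the first array field found is
// expanded. Empty arrays and missing fields are not modeled.
function indexKeys(doc, fields) {
  const arrayField = fields.find((f) => Array.isArray(doc[f]));
  if (arrayField === undefined) {
    return [fields.map((f) => doc[f])]; // scalar values: exactly one key
  }
  // One key per array element, with the other fields copied into each key.
  return doc[arrayField].map((v) => fields.map((f) => (f === arrayField ? v : doc[f])));
}

// The index is flagged multikey once any document stores an array in an indexed field.
function buildIndex(docs, fields) {
  const entries = [];
  let multikey = false;
  for (const doc of docs) {
    if (fields.some((f) => Array.isArray(doc[f]))) multikey = true;
    for (const key of indexKeys(doc, fields)) entries.push({ key, id: doc._id });
  }
  return { entries, multikey };
}

const fields = ["relatedTo", "data"];

// First collection from the report: C1 and C2 reference two parents in an array.
const arrayDocs = [
  { _id: "P1", data: "P1data", relatedTo: "P1" },
  { _id: "P2", data: "P2data", relatedTo: "P2" },
  { _id: "C1", data: "C1data", relatedTo: ["P1", "P2"] },
  { _id: "C2", data: "C2data", relatedTo: ["P1", "P2"] },
];

// Second collection: the same relationships flattened into scalar documents.
const flatDocs = [
  { _id: "P1", data: "P1data", relatedTo: "P1" },
  { _id: "P2", data: "P2data", relatedTo: "P2" },
  { _id: "C1", data: "C1data", relatedTo: "P1" },
  { _id: "C1.1", data: "C1data", relatedTo: "P2" },
  { _id: "C2", data: "C2data", relatedTo: "P1" },
  { _id: "C2.1", data: "C2data", relatedTo: "P2" },
];

for (const [name, docs] of [["array", arrayDocs], ["flattened", flatDocs]]) {
  const { entries, multikey } = buildIndex(docs, fields);
  console.log(name, "entries:", entries.length, "multikey:", multikey);
}
// array entries: 6 multikey: true
// flattened entries: 6 multikey: false
```

Both collections yield six index entries, but only the first flags the index multikey, which lines up with the covered versus not-covered results reported for the two setups.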


      When an array attribute is part of a compound index, queries cannot be covered for index attributes that follow the multikey attribute, even when the multikey attribute itself is not projected. As a result, to get covered queries from a compound index when attributes beyond a multikey value are required, users must manually flatten the array by denormalizing their data across multiple documents (as in the second collection in the steps to reproduce).
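The flattening workaround can be sketched as a small JavaScript helper. flattenArrayField is a hypothetical name, and the "<_id>.<n>" suffix scheme simply mirrors the C1/C1.1 documents in the steps to reproduce; MongoDB imposes no such convention.

```javascript
// Sketch of the denormalization workaround: split a document whose field holds
// an array into one document per element, so an index on that field only ever
// sees scalar values. The "<_id>.<n>" naming is an illustrative convention.
function flattenArrayField(doc, field) {
  const value = doc[field];
  // Scalars and empty arrays are returned unchanged; how to represent an
  // empty relationship is left to the caller.
  if (!Array.isArray(value) || value.length === 0) return [doc];
  return value.map((element, i) => ({
    ...doc,
    _id: i === 0 ? doc._id : `${doc._id}.${i}`, // first copy keeps the original _id
    [field]: element,
  }));
}

const original = [
  { _id: "P1", data: "P1data", relatedTo: "P1" },
  { _id: "P2", data: "P2data", relatedTo: "P2" },
  { _id: "C1", data: "C1data", relatedTo: ["P1", "P2"] },
  { _id: "C2", data: "C2data", relatedTo: ["P1", "P2"] },
];

const flattened = original.flatMap((doc) => flattenArrayField(doc, "relatedTo"));
console.log(flattened.map((d) => `${d._id}:${d.relatedTo}`).join(" "));
// P1:P1 P2:P2 C1:P1 C1.1:P2 C2:P1 C2.1:P2
```

The cost is visible in the output: C1data now lives in both C1 and C1.1, so an update to it must touch both documents, which is the write overhead and anomaly risk that this report describes.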

      OLTP workloads are often write-heavy and access data in small pieces. Modeling such data efficiently often calls for small documents that can be grouped together for complex reads through indexes on attributes that reference other related documents in the same collection. Queries using indexes on these reference attributes currently require a fetch whenever the attributes contain more than one value, reducing the efficiency and performance of the system for this large class of OLTP workloads. To avoid the fetch, users must denormalize their data, which introduces the potential for data anomalies and adds overhead to inserts and updates.

      Another common access pattern that is negatively affected is a range query on an attribute that could otherwise be filtered by a condition on a multikey attribute. Consider a collection of events, each with multiple tags. To find events with a given tag that occurred within a time range, a compound index must be created on "created, tag" instead of "tag, created": because the tag attribute is an array, the fetch would be executed before the date range filter is applied. The "created, tag" index is itself an inefficient lookup, since the date condition must be applied before the tag condition, which still requires a fetch. Leading the index with tag instead forces every document matching that tag to be fetched and then filtered by date, which is likely even worse unless the tag values are sparse.

            Assignee: Unassigned
            Reporter: Rick Houlihan (rick.houlihan@mongodb.com)
            Votes: 3
            Watchers: 4
