[SERVER-18476] In-progress queries may return incorrect results if an index concurrently becomes multikey
Created: 14/May/15  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: None

Type: Improvement
Priority: Major - P3
Reporter: Daniel Pasette (Inactive)
Assignee: Backlog - Query Execution
Resolution: Unresolved
Votes: 1
Labels: storch
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-4975 covered index projection may be impro... Closed
is duplicated by SERVER-17678 IndexScan doesn't dedup if index beco... Closed
Related
related to SERVER-41058 Add concurrency workload targeted at ... Backlog
Assigned Teams:
Query Execution
Participants:

 Description   

Query planning logic depends substantially on whether each available index is multikey, and on which indexed paths are multikey. If this information changes because a concurrent update or insert creates a new array path, in-progress queries may be using a plan that is now invalid. One way to fix this would be to kill all in-progress queries on a collection whenever that collection's multikeyness metadata changes.



 Comments   
Comment by Eric Milkie [ 08/Jun/18 ]

The query system relies on the multikey flag to prohibit certain optimizations when scanning indexes that would result in "actually incorrect" results. One example is from index_bounds_builder.cpp:

        // If the index is multikey, it doesn't matter what the tightness of the child is, we must
        // return INEXACT_FETCH. Consider a multikey index on 'a' with document {a: [1, 2, 3]} and
        // query {a: {$ne: 3}}.  If we treated the bounds [MinKey, 3), (3, MaxKey] as exact, then we
        // would erroneously return the document!
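
To make the quoted example concrete, here is a minimal toy model in Python. It is not server code: the sorted list of (key, doc_id) pairs standing in for the index, and the names index_keys, build_index, matches_ne and scan_ne, are all assumptions made for illustration. It scans the bounds [MinKey, 3), (3, MaxKey] once as if they were exact and once with a fetch-and-recheck step, which is roughly what INEXACT_FETCH asks for.

```python
# Toy model of a single-field index on 'a'. Illustrative only, not MongoDB code.

def index_keys(doc):
    """Index keys for a document: one key per element if 'a' is an array."""
    value = doc["a"]
    return list(value) if isinstance(value, list) else [value]

def build_index(docs):
    """Sorted (key, doc_id) entries, the way a multikey index would hold them."""
    return sorted((key, doc_id) for doc_id, doc in docs.items()
                  for key in index_keys(doc))

def matches_ne(doc, excluded):
    """Semantics of {a: {$ne: x}}: no element of 'a' may equal x."""
    return excluded not in index_keys(doc)

def scan_ne(docs, excluded, treat_bounds_as_exact):
    """Scan [MinKey, x), (x, MaxKey]; optionally re-check each fetched document."""
    seen = set()      # a multikey index can yield the same doc_id more than once
    results = []
    for key, doc_id in build_index(docs):
        if key == excluded or doc_id in seen:
            continue  # key is outside the bounds, or the document was already visited
        seen.add(doc_id)
        if treat_bounds_as_exact or matches_ne(docs[doc_id], excluded):
            results.append(doc_id)
    return results

docs = {1: {"a": [1, 2, 3]}, 2: {"a": 5}}
wrong = scan_ne(docs, 3, treat_bounds_as_exact=True)   # document 1 slips through
right = scan_ne(docs, 3, treat_bounds_as_exact=False)  # document 1 is filtered out
```

With exact bounds, the keys 1 and 2 of document 1 fall inside the scanned range, so the document is returned even though it contains 3. A plan chosen while the index was still non-multikey makes the same mistake if the array document is inserted while the scan is running.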

Comment by Asya Kamsky [ 08/Jun/18 ]

Technically, are the results actually incorrect, or only incorrect if the operation was relying on "point-in-time" semantics?

Comment by David Storch [ 25/Jan/16 ]

Moving this work from "3.3 Required" to "Planning Bucket A", since the user impact of unexpected transitions to multikey is small and there are some tricky technical obstacles. Specifically, the technical obstacles are:

  • A multi-update can cause a transition to multikey, thereby killing itself. We would have to create a special mechanism to prevent the multi-update plan itself from getting killed, rather than simply using Collection::invalidateAll() to kill all queries currently active in the collection.
  • Index catalog level operations (e.g. index drops) are currently the only callers of Collection::invalidateAll(), and these operations fail with a user assertion if there is an in-progress background index build. This is because Collection::invalidateAll() would kill the collection scan used to implement the background index build. It is not trivial to allow transitions to multikey to safely occur during a background index build, but it also is not ok for an update or insert operation to fail simply because it caused a multikeyness transition during a background index build.
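
The first obstacle can be sketched as follows. This is a hypothetical model in Python, not the behavior of Collection::invalidateAll(): the class CollectionCursors, its methods, and the caused_by parameter are invented here purely to show what "kill everything except the operation that caused the transition" might look like. It does not address the background index build problem in the second bullet.

```python
# Hypothetical registry of in-progress operations on one collection.
# All names are assumptions for illustration, not server APIs.

class CollectionCursors:
    def __init__(self):
        self.multikey_paths = set()  # indexed paths currently known to be multikey
        self.killed = {}             # op_id -> True once the operation is killed

    def register(self, op_id):
        """Track a newly started query or update plan."""
        self.killed[op_id] = False

    def is_killed(self, op_id):
        return self.killed[op_id]

    def set_multikey(self, path, caused_by):
        """Record a multikey transition and kill every other in-progress operation.

        Returns False if the path was already multikey, in which case existing
        plans were built with that knowledge and nothing is killed.
        """
        if path in self.multikey_paths:
            return False
        self.multikey_paths.add(path)
        for op_id in self.killed:
            if op_id != caused_by:   # exempt the operation that triggered the change
                self.killed[op_id] = True
        return True

cursors = CollectionCursors()
cursors.register("index_scan")
cursors.register("multi_update")
changed = cursors.set_multikey("a", caused_by="multi_update")
```

After the call, the unrelated index scan is marked killed while the multi-update that introduced the array value keeps running. Whether the surviving multi-update's own plan is still valid after the transition is a separate question that this sketch does not answer.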

CC milkie

Generated at Thu Feb 08 03:47:48 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.