Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-99881

Incorrect results for queries using top-K sort before fetch

    • Type: Icon: Bug Bug
    • Resolution: Unresolved
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Query Optimization
    • ALL

      Consider a collection with index on {a: 1, b: 1} and the query 

      find({a: {$gt: 5}}).sort({b: 1}).limit(5) 

      The optimizer will produce the following plan:

      {
          stage: 'FETCH',
          inputStage: {
            stage: 'SORT',
            sortPattern: { b: 1 },
            limitAmount: 5,
            inputStage: {
              stage: 'IXSCAN',
              keyPattern: { a: 1, b: 1 },
              indexBounds: {a: [ '(5, inf.0]' ], b: [ '[MinKey, MaxKey]' ]}
            }
          }
      } 

      Normally, the execution will look like this:

      The index scan will produce many index keys + RIDs, suppose it is much larger than 5. The sort stage will block and perform a top-K sort over the keys to get a stream of 5 sorted index keys and throw away the rest of them. Then the fetch stage will look up each RID to get the corresponding document.

      Suppose the plan executor returns 2 documents and then yields. Then a delete occurs which deletes a document corresponding to one of the RIDs in the sort stage’s buffer. The plan executor resumes and when it tries to fetch that RID it will fail and move on to the next. This results in the plan returning one fewer document than the limit specified.

      This is potentially problematic because when the user specified a limit of N and the query returned < N, the user should be able to assume that there < N documents matching the predicate, but that isn’t true in this case.

      There are main options:

      1. Consider this as “works as intended” and argue that if a user wants this guarantee, they must use a higher isolation level.
      2. Push the fetch below the sort stage to avoid this problem. This likely will result in a performance regression due to more memory used in the sort.

      Full repro script attached: sort_limit_repro.js

       

       

            Assignee:
            Unassigned Unassigned
            Reporter:
            ben.shteinfeld@mongodb.com Ben Shteinfeld
            Votes:
            0 Vote for this issue
            Watchers:
            9 Start watching this issue

              Created:
              Updated: