Investigate if $idLookup optimization should use getUserLimit() instead of limit->getLimit()


    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Query Integration

      When applying a limit optimization, DocSourceVectorSearch uses getUserLimit(), but DocSourceIdLookup just looks for a $limit stage and takes its value.

      I'm wondering if we have a bug in $idLookup, because getUserLimit() also accounts for $skip stages. Given [{$vectorSearch}, {$skip: 5}, {$limit: 10}], the limit pushed down is 15, since 15 documents are needed to return 10 after skipping 5. Pushing down a limit of 10 would be incorrect: applying the skip to a stream of only 10 documents leaves just 5.
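      The arithmetic above can be checked with a small simulation. This is only a sketch, not the server's implementation: `skip_aware_limit` and `run` are hypothetical helpers, the pipeline is modeled as a list of dicts, and the source stage is modeled as a range of integers truncated by the pushed-down limit.

```python
from itertools import islice

def skip_aware_limit(stages):
    """Hypothetical model of a skip-aware limit: sum the leading $skip
    values and add the first $limit. Returns None if no $limit is reached."""
    skipped = 0
    for stage in stages:
        if "$skip" in stage:
            skipped += stage["$skip"]
        elif "$limit" in stage:
            return skipped + stage["$limit"]
        else:
            break  # any other stage ends the scan in this sketch
    return None

def run(pushed_limit, stages, source_size=100):
    """Truncate a stand-in source to pushed_limit documents, then apply
    the remaining $skip/$limit stages in order."""
    docs = islice(range(source_size), pushed_limit)
    for stage in stages:
        if "$skip" in stage:
            docs = islice(docs, stage["$skip"], None)
        elif "$limit" in stage:
            docs = islice(docs, stage["$limit"])
    return list(docs)

stages = [{"$skip": 5}, {"$limit": 10}]
correct = run(skip_aware_limit(stages), stages)  # pushes down 15
truncated = run(10, stages)                      # pushes down the raw $limit value
print(len(correct), len(truncated))
```

      Under these assumptions, pushing down 15 returns the full 10 documents, while pushing down the raw $limit value of 10 returns only 5.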

      Today we desugar on the shards after most (or all?) pipeline optimization has happened, so I'm wondering if $idLookup's doOptimizeAt() is never really called, since the stage isn't around during optimization. That means if this is a bug, it's likely we wouldn't have caught it. We are about to change the order of desugaring, so I'm worried this may manifest as a bug in extension $vectorSearch.

      In this ticket we should first investigate and test whether this can lead to any correctness issues, then apply a fix.

            Assignee:
            Alyssa Clark
            Reporter:
            Will Buerger