[SERVER-59910] Queries with sort+projection+limit fail unnecessarily because the sort happens before projection Created: 13/Sep/21  Updated: 13/Sep/21  Resolved: 13/Sep/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 4.2.16, 4.0.27
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Benjamin Smedberg Assignee: Kelsey Schubert
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-26442 Push $sort before $project and $addFi... Open
Operating System: ALL
Steps To Reproduce:

I haven't yet created a minimized testcase script, but I hope to have one later today.

Participants:

 Description   

I have a collection which contains large documents (often 10k or more), but I'm trying to make a query and return a very small subset of this data.

Imagine a collection with documents shaped like this:

{
  _id: ObjectId(<random>),
  "lineItems": [
    { quantity: 10, partId: 12345, unitPrice: 1034, productionDocuments: [], ...},
    ... potentially hundreds of lines here
  ],
  totalPrice: 113470,
  acceptedPartner: "P11780",
  forwardedToPartner: "P1234",
  acceptedDate: Date("2021-08-15"),
}

When I run a query like this:

db.jobs.find(
  { $or: [
      { acceptedPartner: "P11780" },
      { forwardedToPartner: "P11780" },
  ] },
  {
    _id: true,
    totalPrice: true,
    acceptedPartner: true,
    forwardedToPartner: true,
    acceptedDate: true,
  }
).sort({ acceptedDate: -1 }).limit(20)

I get an error message about running out of memory for sort and having to use disk sort. However, this query should not run out of memory if properly projected (I do have separate indexes on both acceptedPartner and forwardedToPartner). The explain plan produced by the planner indicates that the database is selecting using the proper indexes, sorting, and then projecting. The correct query plan would be to filter using the indexes, project, combine, and then sort, which would not run out of memory space.

 

I have verified that the incorrect query plan is present in both 4.0 and 4.2. I have yet to try 4.4 but I didn't see anything in the 4.4 release notes that would indicate a fix for this issue.



 Comments   
Comment by Kelsey Schubert [ 13/Sep/21 ]

Hi bsmedberg@xometry.com,

Thanks for the report. I believe this issue is tracked in SERVER-26442. Please feel free to watch SERVER-26442 for updates.

Kind regards,
Kelsey

Generated at Thu Feb 08 05:48:28 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.