Loading...

XML

Word

Printable

JSON

Type: Bug
Resolution: Done
Priority: Major - P3
Fix Version/s: 2.6.2, 2.7.2
Affects Version/s: 2.6.1
Component/s: Querying
Labels:
None

Operating System:
ALL
Backport Completed:

2.6.2
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

Issue Status as of Jun 13, 2014

ISSUE SUMMARY
For historical reasons, the wire protocol does not pass separate values for limit() and batchSize(), but rather passes the smaller of the two in a single ntoreturn field. The consequence is that the server does not know whether the ntoreturn value is a batchSize or a limit.

Usually this doesn't matter much, but it matters a lot if there is a sort. Suppose that the server receives an ntoreturn of 10 with a sort():

If this is limit(10), then we can find the top 10 using a top-K sort. This requires buffering only 10 results in memory.
If this is a batchSize(10), then we have to do a full sort. This requires buffering all matching documents in memory.

In order to cope with this, the server creates query plans which behave as follows:

First they do a top-K sort and return the first 10 results. If the client means limit(10), it will be happy and close the cursor.
If the client means batchSize(10), then it will ask for new batches with a getmore. At this point, the query plan falls back to doing a full sort.

The problem with this strategy is that plans are worked until they finish or produce about 100 results by the ranker. The plans will thus move on to the "do a full sort" phase. This results in buffering more data than necessary if the client means limit() and not batchSize().

USER IMPACT
Queries with an unindexed sort() and a limit() may buffer more documents than required, causing the query to hit the blocking sort memory limit and fail.

WORKAROUNDS
Hinting or adding an index that provides the sort() may be a suitable workaround depending on the data and the query, but the indexed sort plan may not be efficient for everyone.

AFFECTED VERSIONS
MongoDB production releases 2.6.0 and 2.6.1 are affected by this issue.

FIX VERSION
The fix is included in the 2.6.2 production release.

RESOLUTION DETAILS
Treat ntoreturn as though it is a limit during plan ranking to avoid buffering too much data during a limit() with a sort(). If either a sort() or limit() is specified, plan ranking now ceases as soon as any query plan returns min(ntoreturn, 101) results. This prevents the query from switching from the top-K phase to the in-memory sort phase.

Original description

The query plan selection seems to be ignoring the limit specified while evaluating the multiple plans that may be usable for a specific query. In this instance the query has sort() and limit(), however the issue may be much wider than that. The diagnosis is based on the logLevel:2 logs. Plan selection is not invoked in following cases and the query works fine:

When hint is specified explicitly - everyone is happy
When there is only one matching index - leading to it being used
When there are no matching indices leading to full collection scan
When there is a perfect match index - leading to it being used

Steps to reproduce the issue:

Generate big sample records (or even small would work if the query result exceeds the 32MB size)

bigStr = "ABCDEFGHIJKLMNBOPQRSTUVWXYZ012345687890";
// Generate big string to use in the object - 1MB+ String
while (bigStr.length < 1000000) { bigStr = bigStr + "::" + bigStr; }

for (i = 0; i < 400; i++) { 
  doc = {x: i % 2, y: i % 3, other: i % 3, xval: 1, yval: i %10, name: bigStr} ;
  if (i % 3) { 
    doc.tags = ["test" + i % 10]; 
  }
  
  db.sortf.insert(doc);
}

Create two or more indices that can match the query / sorting criteria

db.sortf.ensureIndex({ "x" : 1, "y" : 1, "tags" : 1, "other" : -1 })
db.sortf.ensureIndex({ "x" : 1, "xval" : 1 })

Perform any of the following queries without hint and with hint (any index or $natural). The queries without limit should fail if the sort buffer exceeds, but that should never be the case for limit queries which are within the bounds (i.e. here limit of 10 would mean only about 13-14 MB of data).

db.sortf.find({x: 1, y: 1, other: {$gte: 0}}, {x: 1, y: 1, other: 1}).limit(1).sort({other: -1})
db.sortf.find({x: 1, y: 1, other: {$gte: 0}}, {x: 1, y: 1, other: 1}).limit(10).sort({other: -1})
db.sortf.find({x: 1, y: 1, other: {$gte: 0}}, {x: 1, y: 1, other: 1}).sort({other: -1})
db.sortf.find({x: 1, y: 1, other: {$gte: 0}}, {x: 1, y: 1, other: 1}).sort({y: 1})
db.sortf.find({x: 1, y: 1, other: {$gte: 0}}, {x: 1, y: 1, other: 1}).sort({y: -1})
db.sortf.find({x: 1, y: 1, other: {$gte: 0}}, {x: 1, y: 1, other: 1}).sort({y: -1})

Failure log snippet for a query plan selection:

2014-06-04T15:07:16.200-0400 [conn5] Running query: query: { x: 1.0, y: 1.0, other: { $gte: 0.0 } } sort: { other: -1 } projection: {}
2014-06-04T15:07:16.201-0400 [conn5] Relevant index 0 is kp: { y: 1.0, ProductCode: 1.0 } io: { v: 1, key: { y: 1.0, ProductCode: 1.0 }, name: "y_1_ProductCode_1", ns: "test.sortf" }
2014-06-04T15:07:16.201-0400 [conn5] Relevant index 1 is kp: { y: 1.0, x: 1.0, tags: 1.0, other: -1.0 } io: { v: 1, key: { y: 1.0, x: 1.0, tags: 1.0, other: -1.0 }, name: "y_1_x_1_tags_1_other_-1", ns: "test.sortf" }
2014-06-04T15:07:16.201-0400 [conn5] Relevant index 0 is kp: { other: -1 } multikey
2014-06-04T15:07:16.202-0400 [conn5] Relevant index 0 is kp: { other: -1 } multikey
2014-06-04T15:07:16.212-0400 [conn5] Relevant index 0 is kp: { other: -1 } multikey
2014-06-04T15:07:16.213-0400 [conn5] Relevant index 0 is kp: { other: -1 } multikey
2014-06-04T15:07:16.215-0400 [conn5] User Assertion: 17144:Runner error: Overflow sort stage buffered data usage of 33607400 bytes exceeds internal limit of 33554432 bytes
2014-06-04T15:07:16.216-0400 [conn5] assertion 17144 Runner error: Overflow sort stage buffered data usage of 33607400 bytes exceeds internal limit of 33554432 bytes ns:test.sortf query:{ query: { x: 1.0, y: 1.0, other: { $gte: 0.0 } }, orderby: { other: -1.0 } } 
2014-06-04T15:07:16.216-0400 [conn5]  ntoskip:0 ntoreturn:12014-06-04T15:07:16.216-0400 [conn5] query test.sortf query: { query: { x: 1.0, y: 1.0, other: { $gte: 0.0 } }, orderby: { other: -1.0 } } ntoreturn:1 keyUpdates:0 exception: Runner error: Overflow sort stage buffered data usage of 33607400 bytes exceeds internal limit of 33554432 bytes code:17144 numYields:0 locks(micros) r:15266 reslen:158 15ms
2014-06-04T15:07:16.217-0400 [conn5] run command admin.$cmd { replSetGetStatus: 1.0, forShell: 1.0 }
2014-06-04T15:07:16.218-0400 [conn5] command: { replSetGetStatus: 1.0, forShell: 1.0 }

related to

SERVER-14228 Setting batchSize and sort on a cursor in sharded collection causes fewer than all documents to be returned

Closed

Assignee:: David Storch
Reporter:: Anil Kumar (Inactive)
Participants:: Anil Kumar, David Storch, Githook User, Noam Berda
Votes:: 2 Vote for this issue
Watchers:: 7 Start watching this issue

Created:: Jun 05 2014 01:49:40 PM UTC
Updated:: Jul 11 2016 05:16:40 PM UTC
Resolved:: Jun 06 2014 03:57:03 PM UTC
Confidence Status Last Update:: 05/Jun/14 6:00 PM

Details

Description

Original description

Attachments

Issue Links

Forms

Activity

People

Dates