  Core Server / SERVER-35889

$limit doesn't reduce the number of documents examined in the aggregation pipeline

    • Type: Bug
    • Resolution: Works as Designed
    • Priority: Major - P3
    • Fix Version/s: None
    • Affects Version/s: 3.4.14
    • Component/s: Aggregation Framework
    • Labels: None
    • Operating System: ALL

      The problem is that $limit doesn't reduce the number of documents loaded into memory for examination in the aggregation pipeline. For example, consider this profiler entry:

      > db.system.profile.find().limit(1).sort( { ts : -1 } ).pretty()
      {
      	"op" : "command",
      	"ns" : "datasetTreeDB.datasetMappingAnnotation",
      	"command" : {
      		"aggregate" : "datasetMappingAnnotation",
      		"pipeline" : [
      			{
      				"$match" : {
      					"rootNode" : "markingtool",
      					"imageCreatedDate" : {
      						"$gt" : ISODate("2016-03-27T00:00:00Z")
      					},
      					"imageMeta__league" : "Major League Baseball"
      				}
      			},
      			{
      				"$limit" : 100000
      			},
      			{
      				"$sort" : {
      					"randomIndex" : 1
      				}
      			},
      			{
      				"$limit" : 1
      			},
      			{
      				"$project" : {
      					"_id" : 0,
      					"fileName" : 1,
      					"rootNode" : 1
      				}
      			},
      			{
      				"$count" : "count"
      			}
      		],
      		"allowDiskUse" : true,
      		"cursor" : {
      			
      		}
      	},
      	"keysExamined" : 100000,
      	"docsExamined" : 100000,
      	"hasSortStage" : true,
      	"cursorExhausted" : true,
      	"numYield" : 816,
      	"locks" : {
      		"Global" : {
      			"acquireCount" : {
      				"r" : NumberLong(1654)
      			}
      		},
      		"Database" : {
      			"acquireCount" : {
      				"r" : NumberLong(827)
      			}
      		},
      		"Collection" : {
      			"acquireCount" : {
      				"r" : NumberLong(826)
      			}
      		}
      	},
      	"nreturned" : 1,
      	"responseLength" : 130,
      	"protocol" : "op_command",
      	"millis" : 1346,
      	"planSummary" : "IXSCAN { rootNode: 1, imageMeta__league: 1, imageCreatedDate: 1, randomIndex: 1, imageId: 1, secondLevelNode: 1 }",
      	"ts" : ISODate("2018-06-28T17:05:57.702Z"),
      	"client" : "127.0.0.1",
      	"appName" : "MongoDB Shell",
      	"allUsers" : [ ],
      	"user" : ""
      }

      I want to get only one result from the aggregation and include in that result a field that doesn't exist in the index. The expected behaviour is that it scans 100000 keys for the in-memory sort using values from the index (which makes sense, since after the range match on "imageCreatedDate" the index can no longer provide the sort order) and then loads and tests only the first document. However, it loaded all 100000 documents ("docsExamined" : 100000), which completely kills the performance of this operation.
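
      Since the second profile below shows the pipeline stays covered when only indexed fields are projected, a two-phase workaround seems possible. This is only a sketch, not a confirmed fix; it assumes "imageId" (which is part of the index) uniquely identifies a document. The idea is to find the winner using indexed fields only, then fetch that single document to get "fileName":

      // Phase 1: a covered aggregation that projects only "imageId" (it is in
      // the index), so no documents should need to be fetched from disk.
      var winner = db.datasetMappingAnnotation.aggregate([
      	{ "$match" : {
      		"rootNode" : "markingtool",
      		"imageCreatedDate" : { "$gt" : ISODate("2016-03-27T00:00:00Z") },
      		"imageMeta__league" : "Major League Baseball"
      	} },
      	{ "$limit" : 100000 },
      	{ "$sort" : { "randomIndex" : 1 } },
      	{ "$limit" : 1 },
      	{ "$project" : { "_id" : 0, "imageId" : 1 } }
      ], { "allowDiskUse" : true }).toArray()[0];

      // Phase 2: fetch only the single winning document, including the
      // non-indexed "fileName" field, so one document is examined in total.
      db.datasetMappingAnnotation.find(
      	{ "imageId" : winner.imageId },
      	{ "_id" : 0, "fileName" : 1, "rootNode" : 1 }
      )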

      If I do not include the field that doesn't exist in the index, I can see that it doesn't try to fetch any documents and returns the result from the index alone:

      > db.system.profile.find().limit(1).sort( { ts : -1 } ).pretty()
      {
      	"op" : "command",
      	"ns" : "datasetTreeDB.datasetMappingAnnotation",
      	"command" : {
      		"aggregate" : "datasetMappingAnnotation",
      		"pipeline" : [
      			{
      				"$match" : {
      					"rootNode" : "markingtool",
      					"imageCreatedDate" : {
      						"$gt" : ISODate("2016-03-27T00:00:00Z")
      					},
      					"imageMeta__league" : "Major League Baseball"
      				}
      			},
      			{
      				"$limit" : 100000
      			},
      			{
      				"$sort" : {
      					"randomIndex" : 1
      				}
      			},
      			{
      				"$limit" : 1
      			},
      			{
      				"$project" : {
      					"_id" : 0,
      					"rootNode" : 1
      				}
      			},
      			{
      				"$count" : "count"
      			}
      		],
      		"allowDiskUse" : true,
      		"cursor" : {
      			
      		}
      	},
      	"keysExamined" : 100000,
      	"docsExamined" : 0,
      	"hasSortStage" : true,
      	"cursorExhausted" : true,
      	"numYield" : 785,
      	"locks" : {
      		"Global" : {
      			"acquireCount" : {
      				"r" : NumberLong(1586)
      			}
      		},
      		"Database" : {
      			"acquireCount" : {
      				"r" : NumberLong(793)
      			}
      		},
      		"Collection" : {
      			"acquireCount" : {
      				"r" : NumberLong(792)
      			}
      		}
      	},
      	"nreturned" : 1,
      	"responseLength" : 130,
      	"protocol" : "op_command",
      	"millis" : 182,
      	"planSummary" : "IXSCAN { rootNode: 1, imageMeta__league: 1, imageCreatedDate: 1, randomIndex: 1, imageId: 1, secondLevelNode: 1 }",
      	"ts" : ISODate("2018-06-28T17:16:10.787Z"),
      	"client" : "127.0.0.1",
      	"appName" : "MongoDB Shell",
      	"allUsers" : [ ],
      	"user" : ""
      }

      These are synthetic examples; in the real environment I have ~100,000,000 items in the collection, and my goal is to use indexes at the "$match" step, then "$sort" (as in this example), and then return only the first 100,000 items ("$limit" : 100000). However, if the "$match" step returns 5,000,000 items, it loads and examines all 5,000,000 documents and only then returns the top 100,000.
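
      One workaround I am experimenting with (again only a sketch; it assumes "imageId" is unique and has its own index, so the $lookup below is an indexed join rather than a collection scan per document) keeps the expensive sort covered by the index and fetches full documents only for the top 100000 winners:

      db.datasetMappingAnnotation.aggregate([
      	{ "$match" : {
      		"rootNode" : "markingtool",
      		"imageCreatedDate" : { "$gt" : ISODate("2016-03-27T00:00:00Z") },
      		"imageMeta__league" : "Major League Baseball"
      	} },
      	// Still covered: "randomIndex" values come from the index, and
      	// $sort followed by $limit coalesces into a top-k in-memory sort.
      	{ "$sort" : { "randomIndex" : 1 } },
      	{ "$limit" : 100000 },
      	{ "$project" : { "_id" : 0, "imageId" : 1 } },
      	// Join back to the same collection to load the non-indexed fields
      	// for the 100000 winners only.
      	{ "$lookup" : {
      		"from" : "datasetMappingAnnotation",
      		"localField" : "imageId",
      		"foreignField" : "imageId",
      		"as" : "doc"
      	} },
      	{ "$unwind" : "$doc" },
      	{ "$project" : { "fileName" : "$doc.fileName", "rootNode" : "$doc.rootNode" } }
      ], { "allowDiskUse" : true })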

      Is there a way to fix this behaviour?

      Thank you!

        1. fastRequest.json (6 kB)
        2. simple_example.js (2 kB)
        3. slowRequest.json (7 kB)

            Assignee: Nick Brewer (nick.brewer)
            Reporter: Alexey [X] (ASemen)
            Votes: 0
            Watchers: 11
