[SERVER-30342] Slow performance of $project in aggregation pipeline in 3.4.6 Created: 26/Jul/17  Updated: 15/Aug/17  Resolved: 01/Aug/17

Status: Closed
Project: Core Server
Component/s: Aggregation Framework, Performance
Affects Version/s: 3.4.6
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: S?awomir Nowak Assignee: Charlie Swanson
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-23408 Use a smarter batch size for small sa... Backlog
Related
is related to SERVER-22402 Add dependency tracking to the $match... Closed
Operating System: ALL
Participants:

 Description   

We have the following collection, with 1773732 documents:
COMPANIES:

{
    "_id" : "",
    "companyId" : "aCompanyId",
    "name": "aCompany"
    "status" : "ACTIVE",
    "description" : "a description",
    "departments" : [ 
        {
            "departmentId" : "dep1"
           
        }
    ],
     "createdOn" : NumberLong(1475667096236),
    "modifiedOn" : NumberLong(1490715174421),
    "version" : NumberLong(2)
}

Now we have a query that takes 20 random companies, filters out inactive ones, performs basic projection and then picks random 10 elements once again (let's assume we have more than 50% of active companies)

db.companies.aggregate([
    {"$sample": {"size": 20}},
    {"$match": {"$or": [{"status": "ACTIVE"}, {"status": "$exists"}]}},
    {"$project": {"postcodeId": 1, "postTown": 1, "description": 1}},
    {"$sample": {"size": 10}}
])

The following query works fine (don't have detailed data, but I'd say p95 <200ms) on mongodb 3.2

However, we've noticed serious performance problems after upgrade to 3.4.6. Execution times are an order of magnitude higher than on 3.2 (it takes ~1.5-2s to complete).

This is what I saw in mongodb logs on 3.4:

2017-07-26T14:36:36.256+0000 I COMMAND  [conn542] command Whatever.companies command: aggregate { aggregate: "companies", pipeline: [ { $sample: { size: 20.0 } }, { $match: { $or: [ { status: "ACTIVE" }, { status: "$exists" } ] } }, { $project: { companyId: 1.0, name: 1.0 } }, { $sample: { size: 10.0 } } ], explain: false, cursor: {} } planSummary: MULTI_ITERATOR keysExamined:0 docsExamined:0 cursorExhausted:1 numYields:104 nreturned:10 reslen:1181 locks:{ Global: { acquireCount: { r: 218 } }, Database: { acquireCount: { r: 109 } }, Collection: { acquireCount: { r: 108 } } } protocol:op_command 1190ms

As you can see, there is a lot of locking going on. On 3.2, same query produces about ~15 global read locks. What is also strange, if I get rid of $project and fetch all the data, latencies are fine.

However, when I reverse the logic of the projection, so it becomes:

db.companies.aggregate([
    {"$sample": {"size": 20}},
    {"$match": {"$or": [{"status": "ACTIVE"}, {"status": "$exists"}]}},
    {"$project": {"departments": 0, "createdOn": 0 ... (exclude every single field)}},
    {"$sample": {"size": 10}}]
)

it works as fast as on 3.2 and there number of global locks also decreases.

Additional information:

  • cluster configuration: 3-node replica set
  • indexes: companyId, departments.departmentId

Should I consider it a bug? Or maybe I'm missing something?



 Comments   
Comment by Charlie Swanson [ 01/Aug/17 ]

slnowak, I'm closing this as a duplicate of SERVER-23408, as I believe that would resolve your issue.

You should be able to confirm this by setting the internal parameter 'internalDocumentSourceCursorBatchSizeBytes' to something lower and confirming that these queries speed up.

const batchBytes = 1024;
db.adminCommand({setParameter: 1, internalDocumentSourceCursorBatchSizeBytes: batchBytes});

Note that I would not necessarily recommend setting this parameter in production, as it will impact all aggregations, and very possibly make others slower.

I'd recommend you watch SERVER-23408 for future updates.

Thanks,
Charlie

Comment by Charlie Swanson [ 26/Jul/17 ]

asya that's an excellent question. I think it's probably due to SERVER-22402. It looks like in version 3.2 we didn't know what fields the $match stage was going to need, so we needed to request all of them.

Comment by Asya Kamsky [ 26/Jul/17 ]

If the batch is back to 4MBs why would 3.4.6 be slower than 3.2 then?

Comment by S?awomir Nowak [ 26/Jul/17 ]

Any chances SERVER-23408 or 27829 will be fixed in 3.4.x or rather 3.6?

Comment by Charlie Swanson [ 26/Jul/17 ]

Yep, that's correct. There's two related tickets here that we'd like to address: SERVER-23408 (which I mentioned earlier), or SERVER-27829 which might solve this problem more generally for stages other than $sample.

I'll note that I was mistaken earlier though - it was 16MB on 3.4.0 and 3.4.1, but it looks like SERVER-27406 brought it back to only 4MB in 3.4.2.

Comment by S?awomir Nowak [ 26/Jul/17 ]

So correct me if I'm wrong - you're trying to fill up that 16mb batch even if I ask for sample of one element, containing only one field? And thus, mongo is effectively looking for (16mb/size-of-simple-projection) documents?

Comment by Charlie Swanson [ 26/Jul/17 ]

slnowak, if my hypothesis is correct, here's what's happening:

On version 3.4, we'll fill up a batch of 16MB worth of data before returning anything to the client. This has a weird interaction with our dependency tracking system, which will try to remove the fields that you don't need from the documents. The fewer fields you need, the smaller the documents will be. With smaller documents, we'll have to do more random walks down the index structures to find results.

Exclusion projections are a bit different, since our dependency tracking system doesn't take advantage of them in the same way. Currently, we will look for a finite set of fields that we know are the only ones used by the pipeline. In your examples with an inclusion projection, we know those are the only fields needed. If there's an exclusion projection, the set of fields needed is infinite, we just know which ones you don't need. We probably could implement some optimization where will would explicitly skip these earlier in the pipeline, but we haven't done so today. So with an exclusion projection, the documents filling the 16MB buffer are still the full documents. With an inclusion projection, the documents filling the 16MB buffer only include the fields mentioned in the projection.

It's a little subtle, but I believe this is probably what you're running into.

Comment by Asya Kamsky [ 26/Jul/17 ]

Plan for collection with data:

// 3.2 
{
	"waitedMS" : NumberLong(0),
	"stages" : [
		{
			"$cursor" : {
				"query" : {
 
				},
				"queryPlanner" : {
					"plannerVersion" : 1,
					"namespace" : "test.companies",
					"indexFilterSet" : false,
					"winningPlan" : {
						"stage" : "FETCH",
						"inputStage" : {
							"stage" : "INDEX_ITERATOR"
						}
					},
					"rejectedPlans" : [ ]
				}
			}
		},
		{
			"$sampleFromRandomCursor" : {
				"size" : NumberLong(20)
			}
		},
		{
			"$match" : {
				"$or" : [
					{
						"status" : "ACTIVE"
					},
					{
						"status" : "$exists"
					}
				]
			}
		},
		{
			"$project" : {
				"postcodeId" : true,
				"postTown" : true,
				"description" : true
			}
		},
		{
			"$sample" : {
				"size" : NumberLong(10)
			}
		}
	],
	"ok" : 1
}
 
// 3.4
{
	"stages" : [
		{
			"$cursor" : {
				"query" : {
 
				},
				"fields" : {
					"description" : 1,
					"postTown" : 1,
					"postcodeId" : 1,
					"status" : 1,
					"_id" : 1
				},
				"queryPlanner" : {
					"plannerVersion" : 1,
					"namespace" : "test.companies",
					"indexFilterSet" : false,
					"winningPlan" : {
						"stage" : "MULTI_ITERATOR"
					},
					"rejectedPlans" : [ ]
				}
			}
		},
		{
			"$sampleFromRandomCursor" : {
				"size" : NumberLong(20)
			}
		},
		{
			"$match" : {
				"$or" : [
					{
						"status" : "ACTIVE"
					},
					{
						"status" : "$exists"
					}
				]
			}
		},
		{
			"$project" : {
				"_id" : true,
				"description" : true,
				"postTown" : true,
				"postcodeId" : true
			}
		},
		{
			"$sample" : {
				"size" : NumberLong(10)
			}
		}
	],
	"ok" : 1,
	"operationTime" : Timestamp(1501098642, 2035)
}

Comment by S?awomir Nowak [ 26/Jul/17 ]

And what's the explanation for the exclusion case? The latency is fine when I'm trying to implicitly project only specific fields (by excluding everything else)

Comment by Charlie Swanson [ 26/Jul/17 ]

It looks like you might be running into SERVER-23408. Note that we increased the default batch size from 4MB to 16MB on 3.4, so this could be why it's getting slower when upgrading.

asya, the optimization of sample becoming a random tree walk only happens when the collection exists, so can you re-run the explain with an actual collection?

Comment by S?awomir Nowak [ 26/Jul/17 ]

I had to obfuscate logs a little bit, but the idea is the same.
The more data I tried to fetch from server, the faster it is. "Department" field represents an array of object with a lot of properties. It seems like when I include it in a projection, it slightly reduces the number of locks.

db.companies.aggregate([
    {"$sample": {"size": 20}},
    {"$match": {"$or": [{"status": "ACTIVE"}, {"status": "$exists"}]}},
    {"$project": {"companyId": 1}},
    {"$sample": {"size": 10}}]
)
 
2017-07-26T19:15:47.008+0000 I COMMAND  [conn557] command Whatever.companies command: aggregate { aggregate: "companies", pipeline: [ { $sample: { size: 20.0 } }, { $match: { $or: [ { status: "ACTIVE" }, { status: "$exists" } ] } }, { $project: { companyId: 1.0 } }, { $sample: { size: 10.0 } } ], cursor: {} } planSummary: MULTI_ITERATOR keysExamined:0 docsExamined:0 cursorExhausted:1 numYields:135 nreturned:10 reslen:752 locks:{ Global: { acquireCount: { r: 280 } }, Database: { acquireCount: { r: 140 } }, Collection: { acquireCount: { r: 139 } } } protocol:op_command 1647ms
2017-07-26T19:15:49.005+0000 I COMMAND  [conn557] command Whatever.companies command: aggregate { aggregate: "companies", pipeline: [ { $sample: { size: 20.0 } }, { $match: { $or: [ { status: "ACTIVE" }, { status: "$exists" } ] } }, { $project: { companyId: 1.0 } }, { $sample: { size: 10.0 } } ], cursor: {} } planSummary: MULTI_ITERATOR keysExamined:0 docsExamined:0 cursorExhausted:1 numYields:131 nreturned:10 reslen:752 locks:{ Global: { acquireCount: { r: 272 } }, Database: { acquireCount: { r: 136 } }, Collection: { acquireCount: { r: 135 } } } protocol:op_command 1520ms
2017-07-26T19:15:51.125+0000 I COMMAND  [conn557] command Whatever.companies command: aggregate { aggregate: "companies", pipeline: [ { $sample: { size: 20.0 } }, { $match: { $or: [ { status: "ACTIVE" }, { status: "$exists" } ] } }, { $project: { companyId: 1.0 } }, { $sample: { size: 10.0 } } ], cursor: {} } planSummary: MULTI_ITERATOR keysExamined:0 docsExamined:0 cursorExhausted:1 numYields:131 nreturned:10 reslen:754 locks:{ Global: { acquireCount: { r: 272 } }, Database: { acquireCount: { r: 136 } }, Collection: { acquireCount: { r: 135 } } } protocol:op_command 1511ms
2017-07-26T19:15:52.992+0000 I COMMAND  [conn557] command Whatever.companies command: aggregate { aggregate: "companies", pipeline: [ { $sample: { size: 20.0 } }, { $match: { $or: [ { status: "ACTIVE" }, { status: "$exists" } ] } }, { $project: { companyId: 1.0 } }, { $sample: { size: 10.0 } } ], cursor: {} } planSummary: MULTI_ITERATOR keysExamined:0 docsExamined:0 cursorExhausted:1 numYields:130 nreturned:10 reslen:750 locks:{ Global: { acquireCount: { r: 270 } }, Database: { acquireCount: { r: 135 } }, Collection: { acquireCount: { r: 134 } } } protocol:op_command 1438ms
2017-07-26T19:15:54.999+0000 I COMMAND  [conn557] command Whatever.companies command: aggregate { aggregate: "companies", pipeline: [ { $sample: { size: 20.0 } }, { $match: { $or: [ { status: "ACTIVE" }, { status: "$exists" } ] } }, { $project: { companyId: 1.0 } }, { $sample: { size: 10.0 } } ], cursor: {} } planSummary: MULTI_ITERATOR keysExamined:0 docsExamined:0 cursorExhausted:1 numYields:131 nreturned:10 reslen:751 locks:{ Global: { acquireCount: { r: 272 } }, Database: { acquireCount: { r: 136 } }, Collection: { acquireCount: { r: 135 } } } protocol:op_command 1491ms
2017-07-26T19:15:56.985+0000 I COMMAND  [conn557] command Whatever.companies command: aggregate { aggregate: "companies", pipeline: [ { $sample: { size: 20.0 } }, { $match: { $or: [ { status: "ACTIVE" }, { status: "$exists" } ] } }, { $project: { companyId: 1.0 } }, { $sample: { size: 10.0 } } ], cursor: {} } planSummary: MULTI_ITERATOR keysExamined:0 docsExamined:0 cursorExhausted:1 numYields:132 nreturned:10 reslen:751 locks:{ Global: { acquireCount: { r: 274 } }, Database: { acquireCount: { r: 137 } }, Collection: { acquireCount: { r: 136 } } } protocol:op_command 1560ms
2017-07-26T19:15:58.958+0000 I COMMAND  [conn557] command Whatever.companies command: aggregate { aggregate: "companies", pipeline: [ { $sample: { size: 20.0 } }, { $match: { $or: [ { status: "ACTIVE" }, { status: "$exists" } ] } }, { $project: { companyId: 1.0 } }, { $sample: { size: 10.0 } } ], cursor: {} } planSummary: MULTI_ITERATOR keysExamined:0 docsExamined:0 cursorExhausted:1 numYields:135 nreturned:10 reslen:750 locks:{ Global: { acquireCount: { r: 280 } }, Database: { acquireCount: { r: 140 } }, Collection: { acquireCount: { r: 139 } } } protocol:op_command 1545ms
 
 
db.companies.aggregate([
    {"$sample": {"size": 20}},
    {"$match": {"$or": [{"status": "ACTIVE"}, {"status": "$exists"}]}},
    {"$project": {"companyId": 1, "name": 1, "description": 1}},
    {"$sample": {"size": 10}}]
)
 
2017-07-26T19:10:52.706+0000 I COMMAND  [conn557] command Whatever.companies command: aggregate { aggregate: "companies", pipeline: [ { $sample: { size: 20.0 } }, { $match: { $or: [ { status: "ACTIVE" }, { status: "$exists" } ] } }, { $project: { companyId: 1.0, name: 1.0, description: 1.0 } }, { $sample: { size: 10.0 } } ], cursor: {} } planSummary: MULTI_ITERATOR keysExamined:0 docsExamined:0 cursorExhausted:1 numYields:92 nreturned:10 reslen:1480 locks:{ Global: { acquireCount: { r: 194 } }, Database: { acquireCount: { r: 97 } }, Collection: { acquireCount: { r: 96 } } } protocol:op_command 1136ms
2017-07-26T19:10:54.106+0000 I COMMAND  [conn557] command Whatever.companies command: aggregate { aggregate: "companies", pipeline: [ { $sample: { size: 20.0 } }, { $match: { $or: [ { status: "ACTIVE" }, { status: "$exists" } ] } }, { $project: { companyId: 1.0, name: 1.0, description: 1.0 } }, { $sample: { size: 10.0 } } ], cursor: {} } planSummary: MULTI_ITERATOR keysExamined:0 docsExamined:0 cursorExhausted:1 numYields:92 nreturned:10 reslen:1459 locks:{ Global: { acquireCount: { r: 194 } }, Database: { acquireCount: { r: 97 } }, Collection: { acquireCount: { r: 96 } } } protocol:op_command 991ms
2017-07-26T19:10:55.820+0000 I COMMAND  [conn557] command Whatever.companies command: aggregate { aggregate: "companies", pipeline: [ { $sample: { size: 20.0 } }, { $match: { $or: [ { status: "ACTIVE" }, { status: "$exists" } ] } }, { $project: { companyId: 1.0, name: 1.0, description: 1.0 } }, { $sample: { size: 10.0 } } ], cursor: {} } planSummary: MULTI_ITERATOR keysExamined:0 docsExamined:0 cursorExhausted:1 numYields:95 nreturned:10 reslen:1480 locks:{ Global: { acquireCount: { r: 200 } }, Database: { acquireCount: { r: 100 } }, Collection: { acquireCount: { r: 99 } } } protocol:op_command 1116ms
2017-07-26T19:10:57.349+0000 I COMMAND  [conn557] command Whatever.companies command: aggregate { aggregate: "companies", pipeline: [ { $sample: { size: 20.0 } }, { $match: { $or: [ { status: "ACTIVE" }, { status: "$exists" } ] } }, { $project: { companyId: 1.0, name: 1.0, description: 1.0 } }, { $sample: { size: 10.0 } } ], cursor: {} } planSummary: MULTI_ITERATOR keysExamined:0 docsExamined:0 cursorExhausted:1 numYields:92 nreturned:10 reslen:1490 locks:{ Global: { acquireCount: { r: 194 } }, Database: { acquireCount: { r: 97 } }, Collection: { acquireCount: { r: 96 } } } protocol:op_command 1061ms
2017-07-26T19:10:58.965+0000 I COMMAND  [conn557] command Whatever.companies command: aggregate { aggregate: "companies", pipeline: [ { $sample: { size: 20.0 } }, { $match: { $or: [ { status: "ACTIVE" }, { status: "$exists" } ] } }, { $project: { companyId: 1.0, name: 1.0, description: 1.0 } }, { $sample: { size: 10.0 } } ], cursor: {} } planSummary: MULTI_ITERATOR keysExamined:0 docsExamined:0 cursorExhausted:1 numYields:93 nreturned:10 reslen:1471 locks:{ Global: { acquireCount: { r: 196 } }, Database: { acquireCount: { r: 98 } }, Collection: { acquireCount: { r: 97 } } } protocol:op_command 1197ms
2017-07-26T19:11:00.323+0000 I COMMAND  [conn557] command Whatever.companies command: aggregate { aggregate: "companies", pipeline: [ { $sample: { size: 20.0 } }, { $match: { $or: [ { status: "ACTIVE" }, { status: "$exists" } ] } }, { $project: { companyId: 1.0, name: 1.0, description: 1.0 } }, { $sample: { size: 10.0 } } ], cursor: {} } planSummary: MULTI_ITERATOR keysExamined:0 docsExamined:0 cursorExhausted:1 numYields:94 nreturned:10 reslen:1517 locks:{ Global: { acquireCount: { r: 198 } }, Database: { acquireCount: { r: 99 } }, Collection: { acquireCount: { r: 98 } } } protocol:op_command 1155ms
2017-07-26T19:11:02.052+0000 I COMMAND  [conn557] command Whatever.companies command: aggregate { aggregate: "companies", pipeline: [ { $sample: { size: 20.0 } }, { $match: { $or: [ { status: "ACTIVE" }, { status: "$exists" } ] } }, { $project: { companyId: 1.0, name: 1.0, description: 1.0 } }, { $sample: { size: 10.0 } } ], cursor: {} } planSummary: MULTI_ITERATOR keysExamined:0 docsExamined:0 cursorExhausted:1 numYields:94 nreturned:10 reslen:1514 locks:{ Global: { acquireCount: { r: 198 } }, Database: { acquireCount: { r: 99 } }, Collection: { acquireCount: { r: 98 } } } protocol:op_command 1170ms
 
db.companies.aggregate([
    {"$sample": {"size": 20}},
    {"$match": {"$or": [{"status": "ACTIVE"}, {"status": "$exists"}]}},
    {"$project": {"companyId": 1, "name": 1, "description": 1, "departments": 1}},
    {"$sample": {"size": 10}}]
)
 
2017-07-26T19:12:35.507+0000 I COMMAND  [conn557] command Whatever.companies command: aggregate { aggregate: "companies", pipeline: [ { $sample: { size: 20.0 } }, { $match: { $or: [ { status: "ACTIVE" }, { status: "$exists" } ] } }, { $project: { companyId: 1.0, name: 1.0, description: 1.0, departments: 1.0 } }, { $sample: { size: 10.0 } } ], cursor: {} } planSummary: MULTI_ITERATOR keysExamined:0 docsExamined:0 cursorExhausted:1 numYields:1 nreturned:10 reslen:134840 locks:{ Global: { acquireCount: { r: 12 } }, Database: { acquireCount: { r: 6 } }, Collection: { acquireCount: { r: 5 } } } protocol:op_command 23ms
2017-07-26T19:12:36.167+0000 I COMMAND  [conn557] command Whatever.companies command: aggregate { aggregate: "companies", pipeline: [ { $sample: { size: 20.0 } }, { $match: { $or: [ { status: "ACTIVE" }, { status: "$exists" } ] } }, { $project: { companyId: 1.0, name: 1.0, description: 1.0, departments: 1.0 } }, { $sample: { size: 10.0 } } ], cursor: {} } planSummary: MULTI_ITERATOR keysExamined:0 docsExamined:0 cursorExhausted:1 numYields:1 nreturned:10 reslen:216443 locks:{ Global: { acquireCount: { r: 12 } }, Database: { acquireCount: { r: 6 } }, Collection: { acquireCount: { r: 5 } } } protocol:op_command 27ms
2017-07-26T19:12:36.826+0000 I COMMAND  [conn557] command Whatever.companies command: aggregate { aggregate: "companies", pipeline: [ { $sample: { size: 20.0 } }, { $match: { $or: [ { status: "ACTIVE" }, { status: "$exists" } ] } }, { $project: { companyId: 1.0, name: 1.0, description: 1.0, departments: 1.0 } }, { $sample: { size: 10.0 } } ], cursor: {} } planSummary: MULTI_ITERATOR keysExamined:0 docsExamined:0 cursorExhausted:1 numYields:2 nreturned:10 reslen:155636 locks:{ Global: { acquireCount: { r: 14 } }, Database: { acquireCount: { r: 7 } }, Collection: { acquireCount: { r: 6 } } } protocol:op_command 38ms
2017-07-26T19:12:37.506+0000 I COMMAND  [conn557] command Whatever.companies command: aggregate { aggregate: "companies", pipeline: [ { $sample: { size: 20.0 } }, { $match: { $or: [ { status: "ACTIVE" }, { status: "$exists" } ] } }, { $project: { companyId: 1.0, name: 1.0, description: 1.0, departments: 1.0 } }, { $sample: { size: 10.0 } } ], cursor: {} } planSummary: MULTI_ITERATOR keysExamined:0 docsExamined:0 cursorExhausted:1 numYields:1 nreturned:10 reslen:143475 locks:{ Global: { acquireCount: { r: 12 } }, Database: { acquireCount: { r: 6 } }, Collection: { acquireCount: { r: 5 } } } protocol:op_command 24ms
2017-07-26T19:12:38.066+0000 I COMMAND  [conn557] command Whatever.companies command: aggregate { aggregate: "companies", pipeline: [ { $sample: { size: 20.0 } }, { $match: { $or: [ { status: "ACTIVE" }, { status: "$exists" } ] } }, { $project: { companyId: 1.0, name: 1.0, description: 1.0, departments: 1.0 } }, { $sample: { size: 10.0 } } ], cursor: {} } planSummary: MULTI_ITERATOR keysExamined:0 docsExamined:0 cursorExhausted:1 numYields:1 nreturned:10 reslen:151309 locks:{ Global: { acquireCount: { r: 12 } }, Database: { acquireCount: { r: 6 } }, Collection: { acquireCount: { r: 5 } } } protocol:op_command 21ms
2017-07-26T19:12:38.791+0000 I COMMAND  [conn557] command Whatever.companies command: aggregate { aggregate: "companies", pipeline: [ { $sample: { size: 20.0 } }, { $match: { $or: [ { status: "ACTIVE" }, { status: "$exists" } ] } }, { $project: { companyId: 1.0, name: 1.0, description: 1.0, departments: 1.0 } }, { $sample: { size: 10.0 } } ], cursor: {} } planSummary: MULTI_ITERATOR keysExamined:0 docsExamined:0 cursorExhausted:1 numYields:1 nreturned:10 reslen:176909 locks:{ Global: { acquireCount: { r: 12 } }, Database: { acquireCount: { r: 6 } }, Collection: { acquireCount: { r: 5 } } } protocol:op_command 19ms
2017-07-26T19:12:39.514+0000 I COMMAND  [conn557] command Whatever.companies command: aggregate { aggregate: "companies", pipeline: [ { $sample: { size: 20.0 } }, { $match: { $or: [ { status: "ACTIVE" }, { status: "$exists" } ] } }, { $project: { companyId: 1.0, name: 1.0, description: 1.0, departments: 1.0 } }, { $sample: { size: 10.0 } } ], cursor: {} } planSummary: MULTI_ITERATOR keysExamined:0 docsExamined:0 cursorExhausted:1 numYields:1 nreturned:10 reslen:88417 locks:{ Global: { acquireCount: { r: 12 } }, Database: { acquireCount: { r: 6 } }, Collection: { acquireCount: { r: 5 } } } protocol:op_command 21ms
 
db.companies.aggregate([
    {"$sample": {"size": 20}},
    {"$match": {"$or": [{"status": "ACTIVE"}, {"status": "$exists"}]}},
    {"$project": {"departments": 0, "geoData": 0, "status": 0, "createdOn": 0, "modifiedOn": 0, "version": 0}},
    {"$sample": {"size": 10}}]
)
 
2017-07-26T19:16:53.976+0000 I COMMAND  [conn557] command Whatever.companies command: aggregate { aggregate: "companies", pipeline: [ { $sample: { size: 20.0 } }, { $match: { $or: [ { status: "ACTIVE" }, { status: "$exists" } ] } }, { $project: { departments: 0.0, geoData: 0.0, status: 0.0, createdOn: 0.0, modifiedOn: 0.0, version: 0.0 } }, { $sample: { size: 10.0 } } ], cursor: {} } planSummary: MULTI_ITERATOR keysExamined:0 docsExamined:0 cursorExhausted:1 numYields:1 nreturned:10 reslen:1428 locks:{ Global: { acquireCount: { r: 12 } }, Database: { acquireCount: { r: 6 } }, Collection: { acquireCount: { r: 5 } } } protocol:op_command 22ms
2017-07-26T19:16:54.408+0000 I COMMAND  [conn557] command Whatever.companies command: aggregate { aggregate: "companies", pipeline: [ { $sample: { size: 20.0 } }, { $match: { $or: [ { status: "ACTIVE" }, { status: "$exists" } ] } }, { $project: { departments: 0.0, geoData: 0.0, status: 0.0, createdOn: 0.0, modifiedOn: 0.0, version: 0.0 } }, { $sample: { size: 10.0 } } ], cursor: {} } planSummary: MULTI_ITERATOR keysExamined:0 docsExamined:0 cursorExhausted:1 numYields:1 nreturned:10 reslen:1451 locks:{ Global: { acquireCount: { r: 12 } }, Database: { acquireCount: { r: 6 } }, Collection: { acquireCount: { r: 5 } } } protocol:op_command 22ms
2017-07-26T19:16:54.735+0000 I COMMAND  [conn557] command Whatever.companies command: aggregate { aggregate: "companies", pipeline: [ { $sample: { size: 20.0 } }, { $match: { $or: [ { status: "ACTIVE" }, { status: "$exists" } ] } }, { $project: { departments: 0.0, geoData: 0.0, status: 0.0, createdOn: 0.0, modifiedOn: 0.0, version: 0.0 } }, { $sample: { size: 10.0 } } ], cursor: {} } planSummary: MULTI_ITERATOR keysExamined:0 docsExamined:0 cursorExhausted:1 numYields:1 nreturned:10 reslen:1448 locks:{ Global: { acquireCount: { r: 12 } }, Database: { acquireCount: { r: 6 } }, Collection: { acquireCount: { r: 5 } } } protocol:op_command 20ms
2017-07-26T19:16:55.191+0000 I COMMAND  [conn557] command Whatever.companies command: aggregate { aggregate: "companies", pipeline: [ { $sample: { size: 20.0 } }, { $match: { $or: [ { status: "ACTIVE" }, { status: "$exists" } ] } }, { $project: { departments: 0.0, geoData: 0.0, status: 0.0, createdOn: 0.0, modifiedOn: 0.0, version: 0.0 } }, { $sample: { size: 10.0 } } ], cursor: {} } planSummary: MULTI_ITERATOR keysExamined:0 docsExamined:0 cursorExhausted:1 numYields:1 nreturned:10 reslen:1462 locks:{ Global: { acquireCount: { r: 12 } }, Database: { acquireCount: { r: 6 } }, Collection: { acquireCount: { r: 5 } } } protocol:op_command 17ms
2017-07-26T19:16:55.694+0000 I COMMAND  [conn557] command Whatever.companies command: aggregate { aggregate: "companies", pipeline: [ { $sample: { size: 20.0 } }, { $match: { $or: [ { status: "ACTIVE" }, { status: "$exists" } ] } }, { $project: { departments: 0.0, geoData: 0.0, status: 0.0, createdOn: 0.0, modifiedOn: 0.0, version: 0.0 } }, { $sample: { size: 10.0 } } ], cursor: {} } planSummary: MULTI_ITERATOR keysExamined:0 docsExamined:0 cursorExhausted:1 numYields:0 nreturned:10 reslen:1488 locks:{ Global: { acquireCount: { r: 10 } }, Database: { acquireCount: { r: 5 } }, Collection: { acquireCount: { r: 4 } } } protocol:op_command 20ms
2017-07-26T19:16:56.122+0000 I COMMAND  [conn557] command Whatever.companies command: aggregate { aggregate: "companies", pipeline: [ { $sample: { size: 20.0 } }, { $match: { $or: [ { status: "ACTIVE" }, { status: "$exists" } ] } }, { $project: { departments: 0.0, geoData: 0.0, status: 0.0, createdOn: 0.0, modifiedOn: 0.0, version: 0.0 } }, { $sample: { size: 10.0 } } ], cursor: {} } planSummary: MULTI_ITERATOR keysExamined:0 docsExamined:0 cursorExhausted:1 numYields:2 nreturned:10 reslen:1479 locks:{ Global: { acquireCount: { r: 14 } }, Database: { acquireCount: { r: 7 } }, Collection: { acquireCount: { r: 6 } } } protocol:op_command 35ms
2017-07-26T19:16:56.561+0000 I COMMAND  [conn557] command Whatever.companies command: aggregate { aggregate: "companies", pipeline: [ { $sample: { size: 20.0 } }, { $match: { $or: [ { status: "ACTIVE" }, { status: "$exists" } ] } }, { $project: { departments: 0.0, geoData: 0.0, status: 0.0, createdOn: 0.0, modifiedOn: 0.0, version: 0.0 } }, { $sample: { size: 10.0 } } ], cursor: {} } planSummary: MULTI_ITERATOR keysExamined:0 docsExamined:0 cursorExhausted:1 numYields:1 nreturned:10 reslen:1490 locks:{ Global: { acquireCount: { r: 12 } }, Database: { acquireCount: { r: 6 } }, Collection: { acquireCount: { r: 5 } } } protocol:op_command 20ms

Comment by Asya Kamsky [ 26/Jul/17 ]

Here's the explain for the aggregation in 3.2:

db.companies.aggregate([     {"$sample": {"size": 20}},     {"$match": {"$or": [{"status": "ACTIVE"}, {"status": "$exists"}]}},     {"$project": {"postcodeId": 1, "postTown": 1, "description": 1}},     {"$sample": {"size": 10}} ],{explain:true})
{
	"waitedMS" : NumberLong(0),
	"stages" : [
		{
			"$cursor" : {
				"query" : {
 
				},
				"queryPlanner" : {
					"plannerVersion" : 1,
					"namespace" : "test.companies",
					"indexFilterSet" : false,
					"parsedQuery" : {
						"$and" : [ ]
					},
					"winningPlan" : {
						"stage" : "EOF"
					},
					"rejectedPlans" : [ ]
				}
			}
		},
		{
			"$sample" : {
				"size" : NumberLong(20)
			}
		},
		{
			"$match" : {
				"$or" : [
					{
						"status" : "ACTIVE"
					},
					{
						"status" : "$exists"
					}
				]
			}
		},
		{
			"$project" : {
				"postcodeId" : true,
				"postTown" : true,
				"description" : true
			}
		},
		{
			"$sample" : {
				"size" : NumberLong(10)
			}
		}
	],
	"ok" : 1
}

Here is the same for 3.5 (same in 3.4)

db.companies.aggregate([     {"$sample": {"size": 20}},     {"$match": {"$or": [{"status": "ACTIVE"}, {"status": "$exists"}]}},     {"$project": {"postcodeId": 1, "postTown": 1, "description": 1}},     {"$sample": {"size": 10}} ],{explain:true})
{
	"stages" : [
		{
			"$cursor" : {
				"query" : {
 
				},
				"fields" : {
					"description" : 1,
					"postTown" : 1,
					"postcodeId" : 1,
					"status" : 1,
					"_id" : 1
				},
				"queryPlanner" : {
					"plannerVersion" : 1,
					"namespace" : "test.companies",
					"indexFilterSet" : false,
					"parsedQuery" : {
 
					},
					"winningPlan" : {
						"stage" : "EOF"
					},
					"rejectedPlans" : [ ]
				}
			}
		},
		{
			"$sample" : {
				"size" : NumberLong(20)
			}
		},
		{
			"$match" : {
				"$or" : [
					{
						"status" : "ACTIVE"
					},
					{
						"status" : "$exists"
					}
				]
			}
		},
		{
			"$project" : {
				"_id" : true,
				"description" : true,
				"postTown" : true,
				"postcodeId" : true
			}
		},
		{
			"$sample" : {
				"size" : NumberLong(10)
			}
		}
	],
	"ok" : 1
}

Note that we pushed the projection into the query layer in 3.4 where we didn't used to do that in 3.2. I can't find the ticket the work might have been done under.

Comment by S?awomir Nowak [ 26/Jul/17 ]

Yes, I've executed it several times, on two separate replicasets and it has always been slower in 3.4 when included projection different than "exclude these specific fields" (this was introduced in 3.4 afaik). My impression is that the less data I want to fetch, the slower the query, witch seems irrational.

Do you want any specific cases?

Comment by Asya Kamsky [ 26/Jul/17 ]

Is the aggregation in 3.4 slow consistently? Can you include a log file or more examples of aggregation with different $project options and each one run several times?

Generated at Thu Feb 08 04:23:29 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.