[SERVER-33526] Count performance when using a join Created: 27/Feb/18  Updated: 23/Apr/18  Resolved: 09/Mar/18

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: 3.6.3
Fix Version/s: None

Type: Improvement Priority: Minor - P4
Reporter: henry clifford Assignee: Kelsey Schubert
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Participants:

 Description   

Please note this isn't a duplicate of https://jira.mongodb.org/browse/SERVER-31760.

Performing a `count` that involves a join on a largish collection doesn't complete in what would be considered a reasonable time. The query below does not finish (we gave up after more than 10 minutes).
The venues collection has ~500k documents and tasks has ~15 million. Indexes are in place to support this query.

db.venues.aggregate(
	[
		// intentionally empty match: consider all venues
		{ $match: {} },
		{
			$lookup: {
				from: "tasks",
				as: "foreign",
				let: { localField: "$_id" },
				pipeline: [
					// does at least one task reference this venue?
					{ $match: { $expr: { $eq: ["$v", "$$localField"] } } },
					{ $limit: 1 },
					{ $count: "count" }
				]
			}
		},
		// keep only venues with at least one matching task
		{ $match: { "foreign.count": 1 } },
		// count those venues
		{ $count: "count" }
	],

	// Options
	{
		cursor: {
			batchSize: 50
		}
	}
);
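
For comparison, the same "venues with at least one task" count can often be computed from the tasks side without a $lookup. This is only a sketch and assumes every tasks.v value references an existing venue:

// Sketch: count distinct venue ids referenced by tasks.
// Assumes every tasks.v points at an existing venue, so no join back to venues is needed.
db.tasks.aggregate(
	[
		{ $group: { _id: "$v" } },  // one document per distinct venue id
		{ $count: "count" }         // number of distinct venue ids
	],
	{ allowDiskUse: true }
);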



 Comments   
Comment by Kelsey Schubert [ 22/Mar/18 ]

Hi henry@live.xyz,

From Atlas, you should be able to check the cache activity and disk usage graphs. If you know when these queries are executed and the system is not otherwise busy, you may be able to infer whether the data is being served from memory.
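
If it's useful, a rough shell-side spot check is also possible. This is only a sketch using the standard WiredTiger counters reported by serverStatus (not Atlas-specific); comparing the values before and after the run gives a hint of how much is coming from disk:

// Sketch: sample WiredTiger cache counters before and after the aggregation.
// A large jump in "pages read into cache" during the run suggests disk reads.
var cache = db.serverStatus().wiredTiger.cache;
print("bytes currently in the cache:", cache["bytes currently in the cache"]);
print("maximum bytes configured:    ", cache["maximum bytes configured"]);
print("pages read into cache:       ", cache["pages read into cache"]);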

Kind regards,
Kelsey

Comment by henry clifford [ 15/Mar/18 ]

Just following up on this, thanks!

Comment by henry clifford [ 09/Mar/18 ]

Given that this is a "covered" query (I'm only returning information that can be derived from the index), there are no COLLSCANs, and the index is in memory, this shouldn't actually require any disk access. Is there any way we can confirm whether it's being served from memory? Thanks!

Running this against Atlas still takes 40s. Is this working as intended?

Comment by Kelsey Schubert [ 06/Mar/18 ]

Hi henry@live.xyz,

Thanks for uploading the diagnostic.data and indexStats. From this additional information, I suspect that the disk on your MacBook is the bottleneck for this operation and that you're seeing improved behavior after the cache warms up. We can see that an index is being used to support the aggregation, so it appears the correct query plan is being selected. For best performance, please note that we do not recommend OS X for production use.
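
If you'd like to confirm the plan yourself, one option is to run the aggregation with the explain option (a sketch below; substitute the full pipeline from the description):

// Sketch: request the query planner output for the aggregation.
// The pipeline is abbreviated here; use the full pipeline from the description.
db.venues.aggregate(
	[
		{ $match: {} }
		// ... remaining $lookup / $match / $count stages ...
	],
	{ explain: true }
);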

Kind regards,
Kelsey

Comment by henry clifford [ 06/Mar/18 ]

No worries. Yup!

I'm expecting `329927` venues to match (drawn from `db.tasks.distinct("v").length`) in a non-limited query; for this run I added a limit of 10,000 on the venues.
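
For reference, the limited re-run placed the $limit directly after the initial (empty) $match, roughly like this (a sketch; only the changed stages are shown):

// Sketch of the limited re-run: cap the venues fed into the $lookup at 10,000.
db.venues.aggregate([
	{ $match: {} },
	{ $limit: 10000 },  // limit venues before the $lookup
	// ... same $lookup / $match / $count stages as in the description ...
]);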

Before

{ "name" : "c_1", "key" : { "c" : 1 }, "host" : "henrys-MacBook-Pro.local:27017", "accesses" : { "ops" : NumberLong(0), "since" : ISODate("2018-02-27T18:25:41.597Z") } }
{ "name" : "co.f_1", "key" : { "co.f" : 1 }, "host" : "henrys-MacBook-Pro.local:27017", "accesses" : { "ops" : NumberLong(0), "since" : ISODate("2018-02-27T18:25:41.597Z") } }
{ "name" : "n_1", "key" : { "n" : 1 }, "host" : "henrys-MacBook-Pro.local:27017", "accesses" : { "ops" : NumberLong(0), "since" : ISODate("2018-02-27T18:25:41.597Z") } }
{ "name" : "d_1", "key" : { "d" : 1 }, "host" : "henrys-MacBook-Pro.local:27017", "accesses" : { "ops" : NumberLong(8476), "since" : ISODate("2018-02-27T18:25:41.597Z") } }
{ "name" : "_id_", "key" : { "_id" : 1 }, "host" : "henrys-MacBook-Pro.local:27017", "accesses" : { "ops" : NumberLong(0), "since" : ISODate("2018-02-27T18:25:41.597Z") } }
{ "name" : "v_1", "key" : { "v" : 1 }, "host" : "henrys-MacBook-Pro.local:27017", "accesses" : { "ops" : NumberLong(705412), "since" : ISODate("2018-02-27T18:25:41.597Z") } }

After

{ "name" : "c_1", "key" : { "c" : 1 }, "host" : "henrys-MacBook-Pro.local:27017", "accesses" : { "ops" : NumberLong(0), "since" : ISODate("2018-02-27T18:25:41.597Z") } }
{ "name" : "co.f_1", "key" : { "co.f" : 1 }, "host" : "henrys-MacBook-Pro.local:27017", "accesses" : { "ops" : NumberLong(0), "since" : ISODate("2018-02-27T18:25:41.597Z") } }
{ "name" : "n_1", "key" : { "n" : 1 }, "host" : "henrys-MacBook-Pro.local:27017", "accesses" : { "ops" : NumberLong(0), "since" : ISODate("2018-02-27T18:25:41.597Z") } }
{ "name" : "d_1", "key" : { "d" : 1 }, "host" : "henrys-MacBook-Pro.local:27017", "accesses" : { "ops" : NumberLong(8477), "since" : ISODate("2018-02-27T18:25:41.597Z") } }
{ "name" : "_id_", "key" : { "_id" : 1 }, "host" : "henrys-MacBook-Pro.local:27017", "accesses" : { "ops" : NumberLong(0), "since" : ISODate("2018-02-27T18:25:41.597Z") } }
{ "name" : "v_1", "key" : { "v" : 1 }, "host" : "henrys-MacBook-Pro.local:27017", "accesses" : { "ops" : NumberLong(715412), "since" : ISODate("2018-02-27T18:25:41.597Z") } }

Diagnostics uploaded.

Edit: reruns without a limit are giving me more reasonable results (100 seconds on the last run). Still a bit slow for a trivial lookup, but it does complete.

Comment by Kelsey Schubert [ 05/Mar/18 ]

Hi henry@live.xyz,

Sorry for the delay getting back to you. Are you able to reproduce this issue in isolation? If so, I have a request for diagnostic information which will help us track down the problem:

Prior to running the aggregation, please save the output of:

db.tasks.aggregate( [ { $indexStats: { } } ] )

After letting the aggregation lookup command complete (feel free to add a limit to the initial match on venues, but let us know how many documents from venues you expect to match), please provide the same output again:

db.tasks.aggregate( [ { $indexStats: { } } ] )

In addition, would you please upload an archive of the diagnostic.data directory after running this aggregation command? I've created a secure portal for you to use to provide the files. When you upload the diagnostic.data, please specify the exact time that the aggregation was executed.

Thank you,
Kelsey
