[SERVER-50857] Improve count() performance in sharded clusters Created: 10/Sep/20  Updated: 02/Nov/20  Resolved: 02/Nov/20

Status: Closed
Project: Core Server
Component/s: Querying, Sharding
Affects Version/s: 4.4.0, 4.4.1
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Henri-Maxime Ducoulombier Assignee: Ian Boros
Resolution: Won't Fix Votes: 2
Labels: count, performance, qexec-team, sharded
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

MongoDB community 4.4 on AWS EC2.


Attachments: File explain-3.4-sharded-1.json     File explain-3.4-unsharded-1.json     File explain-4.2.7-sharded.json     File explain-4.4.1-sharded-1.json     File explain-4.4.1-sharded-multikey.json    
Issue Links:
Related
related to SERVER-3645 Sharded collection counts (on primary... Closed
related to SERVER-39191 Performance regression for counts pos... Closed
Sprint: Query 2020-10-05, Query 2020-10-19, Query 2020-11-02
Participants:

 Description   

There is a huge performance regression when doing a simple count on an array-of-strings field in 4.4 (also in 4.4.1).

I don't know whether this is a 4.4 issue or whether it appeared earlier, because we are in the process of migrating from 3.4 to 4.4. Still, here is the complete description of the issue.

We have a "test" collection of 2M documents, which almost all have a field "labels" like that (among a lot of other fields, these are quite large documents) :

{"labels": ["aaaaa", "bbbbb", "ccccc"]}
{"labels": ["ddddd"]}
{"labels": []}
{"labels": ["aaaaa", "ccccc"]}

  • There are 120+ different values in the "labels" field.
  • Some values are present only once, some hundreds of thousands of times.
  • The field "labels" has a simple index (neither sparse nor partial); see the sketch below.
  • In the sharded environment, the collection is correctly balanced according to sh.balancerCollectionStatus.
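
For context, a minimal sketch of the setup described above (the index name "labels-idx" comes from the attached explain output; the database name "mydb" is only a placeholder):

// Simple single-field index on the (multikey) "labels" field, neither sparse nor partial.
db.test.createIndex({labels: 1}, {name: "labels-idx"});

// Check that the sharded collection is balanced (run from a mongos).
sh.balancerCollectionStatus("mydb.test");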

Running a "distinct" is very fast and seems to be using the index, whichever the test environment is (3.4, 4.4, sharded or not).

db.test.distinct("labels");
[
	"aaaaa",
	"bbbbb",
	"ccccc",
	...
]

But when it comes to counting....

Mongod 3.4, unsharded collection:

db.test.count({"labels": "aaaaa"}); --- 1 occurrence, 0,1s
db.test.count({"labels": "ccccc"}); --- 430000 occurrences, 0,3s
db.test.count({"labels": "NON EXISTANT VALUE"}); --- 0 occurrence, 0,1s

Mongod 3.4, sharded collection:

db.test.count({"labels": "aaaaa"}); --- 1 occurrence, 0,1s
db.test.count({"labels": "ccccc"}); --- 430000 occurrences, 0,2s
db.test.count({"labels": "NON EXISTANT VALUE"}); --- 0 occurrence, 0,1s

Mongod 4.4.1, sharded collection:

db.test.count({"labels": "aaaaa"}); --- 1 occurence, 0,1s
db.test.count({"labels": "ccccc"}); --- 430000 occurrences, 10+s << PROBLEM IS HERE
db.test.count({"labels": "NON EXISTANT VALUE"}); --- 0 occurence, 0,1s

Mongod 4.4.1, unsharded collection:

db.test.count({"labels": "aaaaa"}); --- 1 occurrence, 0,1s
db.test.count({"labels": "ccccc"}); --- 430000 occurrences, 0,3s
db.test.count({"labels": "NON EXISTANT VALUE"}); --- 0 occurrence, 0,1s

I am attaching the complete explain output so you can see the behavior in the different environments, but as you can see, in the 4.4 sharded collection each shard does an IXSCAN (60ms), then a FETCH (10s), then a SHARDING_FILTER, etc., and what takes time is the FETCH and the later stages. It does not take advantage of the index at all for counting, whereas the single-shard / unsharded version does.
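
For anyone reproducing this, a minimal way to capture the same per-shard plan information (the value "ccccc" is just one of the frequent labels from the examples above):

// Explain the slow count with execution statistics; on the 4.4 sharded collection
// this shows IXSCAN -> FETCH -> SHARDING_FILTER -> COUNT on each shard.
db.test.explain("executionStats").count({"labels": "ccccc"});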

This has a HUGE impact on performance, and doing it with an aggregation is slow as well (because of the $unwind-then-$group pattern, sketched below). In our application, because that aggregation is slow (36s), it is WAY faster in 3.4 to use distinct and then count each value (it takes less than a second even in the sharded environment with the 2M documents and 120 different values).
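
For reference, a sketch of the aggregation pattern mentioned above (field names follow the earlier examples):

// Count documents per label via $unwind + $group; this is the slow (~36s) pattern.
db.test.aggregate([
    {$unwind: "$labels"},
    {$group: {_id: "$labels", count: {$sum: 1}}}
]);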

This also happens with non-multikey fields and indexes (as seen in the attached files).



 Comments   
Comment by Ian Boros [ 02/Nov/20 ]

Looks like I missed your response. I'm glad this worked!

Closing as "Won't Fix."

Comment by Henri-Maxime Ducoulombier [ 22/Oct/20 ]

Hi Ian Boros,

Thank you for your answer.

I had a good RTFM session on the readConcern setting (which I wasn't aware of yet), tested the workaround immediately, and it worked!

I reproduced the application behavior (listing then counting distinct values) on a 5-node sharded cluster, and the run time goes down from 35 seconds to about 6 seconds.

Regarding the orphans, this is not an issue in our use case because of our balancing window, presplitting, and hashed shard key strategies, but we have to keep that in mind when using the "available" read concern.

Here is a sample test script, just in case.

// List the distinct labels, then count each one with the chosen read concern.
let labels = db.test.distinct('labels');
 
let rc = {level: "available"}; // "local" or "available"
for (let label of labels) {
    // Run the count command for this label and read the "n" field from the reply.
    let count = db.runCommand(
       {
         count: "test",
         query: {"labels": label},
         readConcern: rc
       }
    )['n'];
    print(`${label}: ${count}`);
}

I believe this ticket can be closed; the workaround is a fine solution.

Comment by Ian Boros [ 21/Oct/20 ]

Hi hmducoulombier@marketing1by1.com,

I believe you can work around this issue by using readConcern "available." Under this readConcern, the shard filtering stage will not be added, so the query should use the COUNT_SCAN plan. I must emphasize that without shard filtering, the results you get from the server may be inaccurate. At the end of this comment I am providing some example code on how to do this.

Unfortunately, after some discussion with other members of the team, we see no way to get the same performance as 3.4 while also providing accurate results. With that in mind, I intend to close this ticket as "Won't Fix." I'll leave it open for now in case you have follow up questions.

Here's the example code. The explain command does not support readConcern "available", so to get the complete plan information I had to turn on profiling and read it out of system.profile. I'm looking into this more.

(function() {
 
// Deliberately inserts orphans outside of migration.
TestData.skipCheckOrphans = true;
 
const st = new ShardingTest({shards: 2});
const shard0Coll = st.shard0.getCollection("test.coll");
const numDocs = 10;
const middle = numDocs / 2;
const numOrphans = 2;
 
function getNthDocument(n) {
    return {_id: n, indexedField: 1, x: n};
}
 
// Shard the collection. Shard 0 will get keys from [0, middle) and shard 1 will get everything
// from [middle, numDocs).
assert.commandWorked(st.s.getDB("admin").runCommand({enableSharding: "test"}));
st.ensurePrimaryShard("test", st.shard0.name);
st.shardColl(shard0Coll.getName(), {x: 1}, {x: middle}, {x: middle + 1}, "test", true);
 
// Insert some docs.
for (let i = 0; i < numDocs; i++) {
    assert.commandWorked(st.getDB("test").coll.insert(getNthDocument(i)));
}
 
// Insert some orphan documents to shard 0. These are just documents outside the range
// which shard 0 owns.
for (let i = middle + 1; i < middle + 1 + numOrphans; i++) {
    assert.commandWorked(shard0Coll.insert(getNthDocument(i)));
}
 
st.getDB("test").coll.createIndex({indexedField: 1});
    
// Enable profiling on the shards.
st.shard0.getDB("test").setProfilingLevel(2, {slowms: 1, sampleRate: 1.0});
st.shard1.getDB("test").setProfilingLevel(2, {slowms: 1, sampleRate: 1.0});
 
// Run a count command with readConcern "available."
const countRes = assert.commandWorked(st.getDB("test").runCommand(
    {count: "coll", query: {indexedField: 1}, readConcern: {level: "available"}}));
assert.eq(countRes.n, numDocs + numOrphans);
 
// Dump the contents of the profiler. The planSummary field should indicate that a COUNT_SCAN is used.
print("Profiler contents: " + tojson(st.shard0.getDB("test").system.profile.find().toArray()));
print("Profiler contents: " + tojson(st.shard1.getDB("test").system.profile.find().toArray()));
 
st.stop();
})();

And the plan used:

"execStats" : {
	"stage" : "COUNT",
	"nReturned" : 0,
	"executionTimeMillisEstimate" : 0,
	"works" : 6,
	"advanced" : 0,
	"needTime" : 5,
	"needYield" : 0,
	"saveState" : 0,
	"restoreState" : 0,
	"isEOF" : 1,
	"nCounted" : 5,
	"nSkipped" : 0,
	"inputStage" : {
		"stage" : "COUNT_SCAN",
		"nReturned" : 5,
		"executionTimeMillisEstimate" : 0,
		"works" : 6,
		"advanced" : 5,
		"needTime" : 0,
		"needYield" : 0,
		"saveState" : 0,
		"restoreState" : 0,
		"isEOF" : 1,
		"keysExamined" : 6,
		"keyPattern" : {
			"indexedField" : 1
		},
		"indexName" : "indexedField_1",
		"isMultiKey" : false,
		"multiKeyPaths" : {
			"indexedField" : [ ]
		},
		"isUnique" : false,
		"isSparse" : false,
		"isPartial" : false,
		"indexVersion" : 2,
		"indexBounds" : {
			"startKey" : {
				"indexedField" : 1
			},
			"startKeyInclusive" : true,
			"endKey" : {
				"indexedField" : 1
			},
			"endKeyInclusive" : true
		}
	}
}

Also, to answer your question:

Following your comment, this raises a question in my mind: do indexes include orphan documents or not?

Yes, indexes do include orphan documents.

Comment by Henri-Maxime Ducoulombier [ 16/Oct/20 ]

In my case, the predicate provided does not include the shard key.

In the examples, the "labels" field is indexed but the shard key is "contact_id", for instance, and the index is on labels alone rather than a compound index of labels + contact_id, meaning the fix in SERVER-39191 wouldn't help.

Following your comment, this raises a question in my mind: do indexes include orphan documents or not?

 

Comment by Ian Boros [ 29/Sep/20 ]

This is also related to the regression reported in SERVER-39191. In SERVER-39191, however, the predicate which is provided to the count() command includes the entire shard key, so the query planner could (but does not as of 4.7) omit the shard filter and still return correct results.

Comment by Henri-Maxime Ducoulombier [ 25/Sep/20 ]

Thank you for your answer and for pointing out the other issue, which is absolutely related.

The solution of creating a compound index seems viable, at the cost of a much larger index, and we will test it out soon enough, even though having the query planner use the simple index would be much nicer.
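
For reference, a minimal sketch of that workaround, assuming the shard key is "contact_id" as mentioned in the comment above (whether this actually lets the planner skip the FETCH still needs to be verified):

// Compound index that also contains the shard key, so the shard filter could be
// applied from the index keys without fetching each document (to be confirmed).
db.test.createIndex({labels: 1, contact_id: 1});
 
// The same count as before:
db.test.count({"labels": "ccccc"});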

Comment by Eric Sedor [ 24/Sep/20 ]

Thanks for your patience, hmducoulombier@marketing1by1.com, and for your thorough report.

I believe this is because of SERVER-3645. A SHARDING_FILTER is required for accurate counts, and must FETCH to obtain the shard key value for documents being returned out of the index.

I am going to pass this ticket on for confirmation and consideration, as an improvement request.

Comment by Henri-Maxime Ducoulombier [ 11/Sep/20 ]

I just built a 4.2 sharded cluster and it has the same behavior. Here is a sample of the explain output with executionStats.

{
	"shardName" : "rs29802",
	"executionSuccess" : true,
	"executionStages" : {
		"stage" : "COUNT",
		"nReturned" : 0,
		"executionTimeMillisEstimate" : 10905,
		"works" : 72516,
		"advanced" : 0,
		"needTime" : 72515,
		"needYield" : 0,
		"saveState" : 833,
		"restoreState" : 833,
		"isEOF" : 1,
		"nCounted" : 72515,
		"nSkipped" : 0,
		"inputStage" : {
			"stage" : "SHARDING_FILTER",
			"nReturned" : 72515,
			"executionTimeMillisEstimate" : 10900,
			"works" : 72516,
			"advanced" : 72515,
			"needTime" : 0,
			"needYield" : 0,
			"saveState" : 833,
			"restoreState" : 833,
			"isEOF" : 1,
			"chunkSkips" : 0,
			"inputStage" : {
				"stage" : "FETCH",
				"nReturned" : 72515,
				"executionTimeMillisEstimate" : 10767,
				"works" : 72516,
				"advanced" : 72515,
				"needTime" : 0,
				"needYield" : 0,
				"saveState" : 833,
				"restoreState" : 833,
				"isEOF" : 1,
				"docsExamined" : 72515,
				"alreadyHasObj" : 0,
				"inputStage" : {
					"stage" : "IXSCAN",
					"nReturned" : 72515,
					"executionTimeMillisEstimate" : 32,
					"works" : 72516,
					"advanced" : 72515,
					"needTime" : 0,
					"needYield" : 0,
					"saveState" : 833,
					"restoreState" : 833,
					"isEOF" : 1,
					"keyPattern" : {
						"labels" : 1
					},
					"indexName" : "labels-idx",
					"isMultiKey" : true,
					"multiKeyPaths" : {
						"labels" : [
							"labels"
						]
					},
					"isUnique" : false,
					"isSparse" : false,
					"isPartial" : false,
					"indexVersion" : 2,
					"direction" : "forward",
					"indexBounds" : {
						"labels" : [
							"[\"aaaaa\", \"aaaaa\"]"
						]
					},
					"keysExamined" : 72515,
					"seeks" : 1,
					"dupsTested" : 72515,
					"dupsDropped" : 0
				}
			}
		}
	}
}

 

Comment by Henri-Maxime Ducoulombier [ 10/Sep/20 ]

As far as I understand the docs, the count() method wraps the two methods countDocuments() and estimatedDocumentCount().

Still, the aggregation pipeline should use the index on the sharded collection, and not fetch + merge (IMHO).
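
For what it's worth, a minimal sketch of the two call styles being compared here (the value is taken from the earlier examples):

// Legacy helper used by the application:
db.test.count({"labels": "ccccc"});
 
// Newer helper, which runs an aggregation ($match + $group) under the hood:
db.test.countDocuments({"labels": "ccccc"});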

Comment by Henri-Maxime Ducoulombier [ 10/Sep/20 ]

Edit: this is not limited to array fields.
