Loading...

XML

Word

Printable

JSON

Type: Improvement
Resolution: Won't Fix
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 4.4.0, 4.4.1
Component/s: Querying, Sharding
Labels:
- count
- performance
- qexec-team
- sharded
Environment:
MongoDB community 4.4 on AWS EC2.

Sprint:
Query 2020-10-05, Query 2020-10-19, Query 2020-11-02
Confidence Status:
None
Work Order:
3

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

There is a huge performance regression when doing a simple count on an array of string in 4.4 (also in 4.4.1).

I don't know if this is a 4.4 issue or if it appeared earlier because we are in the process of migration from 3.4 to 4.4. Still, here the complete description of the issue.

We have a "test" collection of 2M documents, which almost all have a field "labels" like that (among a lot of other fields, these are quite large documents) :

{"labels": ["aaaaa", "bbbbb", "ccccc"]}
{"labels": ["ddddd"]}
{"labels": []}
{"labels": ["aaaaa", "ccccc"]}

* There are about 120+ different values in the "labels" field

Some values are present only once, some hundreds of thousands of times.
The field "labels" has a simple index (not sparse nor partial)
In the sharded environment, the collection is correctly balanced according to sh.balancerCollectionStatus

Running a "distinct" is very fast and seems to be using the index, whichever the test environment is (3.4, 4.4, sharded or not).

db.test.distinct("labels");
[
	"aaaaa",
	"bbbbb",
	"ccccc",
	...
]

But when it comes to counting....

Mongod 3.4, unsharded collection:

db.test.count({"labels": "aaaaa"}); --- 1 occurrence, 0,1s
db.test.count({"labels": "ccccc"}); --- 430000 occurrences, 0,3s
db.test.count({"labels": "NON EXISTANT VALUE"}); --- 0 occurrence, 0,1s

Mongod 3.4, sharded collection:

db.test.count({"labels": "aaaaa"}); --- 1 occurrence, 0,1s
db.test.count({"labels": "ccccc"}); --- 430000 occurrences, 0,2s
db.test.count({"labels": "NON EXISTANT VALUE"}); --- 0 occurrence, 0,1s

Mongod 4.4.1, sharded collection:

db.test.count({"labels": "aaaaa"}); --- 1 occurence, 0,1s
db.test.count({"labels": "ccccc"}); --- 430000 occurrences, 10+s << PROBLEM IS HERE
db.test.count({"labels": "NON EXISTANT VALUE"}); --- 0 occurence, 0,1s

Mongod 4.4.1, unsharded collection:

db.test.count({"labels": "aaaaa"}); --- 1 occurrence, 0,1s
db.test.count({"labels": "ccccc"}); --- 430000 occurrences, 0,3s
db.test.count({"labels": "NON EXISTANT VALUE"}); --- 0 occurrence, 0,1s

I am attaching the complete explains for you to see the behavior in the different environments, but as you can see, in the 4.4 sharded collection, each shard does an IXSCAN (60ms), then a FETCH (10s) then a SHARDING FILTER, etc. and what takes time is the fetching and later stages. It does take advantages of the index at all, whereas the single shard / not sharded version does.

This has a HUGE impact on performance, and doing that with an aggregate is slow as well (because of the $unwind then $group pattern). In our application, because the aggregate is slow to do that operation (36s), it's WAY faster in 3.4 to use distinct then a count of each value (takes less than a second even in the sharded env with the 2M documents and 120 differents values).

This also happens with non-multikey fields / index (as seen in the attached files).

- - Sort By Name
  - Sort By Date
  - Ascending
  - Descending
  - Thumbnails
  - List
  - Download All

explain-4.4.1-sharded-multikey.json
21 kB
Sep 10 2020 02:22:44 PM UTC
explain-4.4.1-sharded-1.json
21 kB
Sep 10 2020 02:22:44 PM UTC
explain-4.2.7-sharded.json
2 kB
Sep 11 2020 08:27:00 AM UTC
explain-3.4-unsharded-1.json
5 kB
Sep 10 2020 02:22:44 PM UTC
explain-3.4-sharded-1.json
18 kB
Sep 10 2020 02:22:44 PM UTC

related to

SERVER-3645 Sharded collection counts (on primary) can report too many results

Closed

SERVER-39191 Performance regression for counts post-sharding

Closed

Assignee:: Ian Boros
Reporter:: Henri-Maxime Ducoulombier
Participants:: Eric Sedor, Henri-Maxime Ducoulombier, Ian Boros
Votes:: 2 Vote for this issue
Watchers:: 12 Start watching this issue

Created:: Sep 10 2020 02:24:16 PM UTC
Updated:: Nov 02 2020 03:17:31 PM UTC
Resolved:: Nov 02 2020 03:17:30 PM UTC

Details

Description

Attachments

Attachments

Issue Links

Activity

People

Dates