Details
- Type: Bug
- Status: Closed
- Priority: Major - P3
- Resolution: Duplicate
- Affects Version/s: 3.4.6
- Fix Version/s: None
- Component/s: None
- Operating System: ALL
Description
We have the following collection, with 1773732 documents:

COMPANIES:
{
    "_id" : "",
    "companyId" : "aCompanyId",
    "name" : "aCompany",
    "status" : "ACTIVE",
    "description" : "a description",
    "departments" : [
        {
            "departmentId" : "dep1"
        }
    ],
    "createdOn" : NumberLong(1475667096236),
    "modifiedOn" : NumberLong(1490715174421),
    "version" : NumberLong(2)
}
Now we have a query that samples 20 random companies, filters out inactive ones, performs a basic projection, and then samples 10 random elements once again (let's assume more than 50% of the companies are active):
db.companies.aggregate([
    {"$sample": {"size": 20}},
    {"$match": {"$or": [{"status": "ACTIVE"}, {"status": "$exists"}]}},
    {"$project": {"postcodeId": 1, "postTown": 1, "description": 1}},
    {"$sample": {"size": 10}}
])
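As an aside, the same pipeline can be sketched as a plain JavaScript array. One thing worth flagging in the $match stage: as written, {"status": "$exists"} matches documents whose status field equals the literal string "$exists"; testing for a missing field would use the operator form {"$exists": false} (shown below for contrast, as a likely-intended variant, not what the original report ran):

```javascript
// The pipeline from the report, built as a plain array of stages.
const pipeline = [
  { $sample: { size: 20 } },                                    // take 20 random docs
  { $match: { $or: [ { status: "ACTIVE" },
                     { status: "$exists" } ] } },               // as written: equals the literal string "$exists"
  { $project: { postcodeId: 1, postTown: 1, description: 1 } }, // inclusion-style projection
  { $sample: { size: 10 } }                                     // resample 10 of the survivors
];

// Operator form that actually tests for a missing status field
// (probably the original intent; shown only for contrast):
const statusMissing = { status: { $exists: false } };
```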
This query works fine on MongoDB 3.2 (I don't have detailed data, but I'd say p95 < 200ms).
However, we've noticed serious performance problems after upgrading to 3.4.6: execution times are an order of magnitude higher than on 3.2 (it takes ~1.5-2s to complete).
This is what I saw in mongodb logs on 3.4:
2017-07-26T14:36:36.256+0000 I COMMAND [conn542] command Whatever.companies command: aggregate { aggregate: "companies", pipeline: [ { $sample: { size: 20.0 } }, { $match: { $or: [ { status: "ACTIVE" }, { status: "$exists" } ] } }, { $project: { companyId: 1.0, name: 1.0 } }, { $sample: { size: 10.0 } } ], explain: false, cursor: {} } planSummary: MULTI_ITERATOR keysExamined:0 docsExamined:0 cursorExhausted:1 numYields:104 nreturned:10 reslen:1181 locks:{ Global: { acquireCount: { r: 218 } }, Database: { acquireCount: { r: 109 } }, Collection: { acquireCount: { r: 108 } } } protocol:op_command 1190ms
As you can see, there is a lot of locking going on. On 3.2, the same query produces only about ~15 global read locks. What is also strange: if I get rid of $project and fetch all the data, latencies are fine.
However, when I reverse the logic of the projection, so it becomes:
db.companies.aggregate([
    {"$sample": {"size": 20}},
    {"$match": {"$or": [{"status": "ACTIVE"}, {"status": "$exists"}]}},
    {"$project": {"departments": 0, "createdOn": 0 ... (exclude every single field)}},
    {"$sample": {"size": 10}}
])
it works as fast as on 3.2, and the number of global locks also decreases.
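Since the exclusion form avoids the slowdown reported here, one mechanical workaround is to derive the exclusion spec from the list of fields to keep. This is a hypothetical helper (not from the original report), assuming the top-level field set shown in the sample document above:

```javascript
// All top-level fields in the collection, per the sample document in this report.
const allFields = ["companyId", "name", "status", "description",
                   "departments", "createdOn", "modifiedOn", "version"];

// Hypothetical helper: build an exclusion-style $project equivalent to an
// inclusion list (both forms return _id by default).
function exclusionProject(keep) {
  const spec = {};
  for (const f of allFields) {
    if (!keep.includes(f)) spec[f] = 0; // 0 = exclude this field
  }
  return { $project: spec };
}

const stage = exclusionProject(["companyId", "name"]);
// stage.$project excludes status, description, departments,
// createdOn, modifiedOn and version, keeping companyId and name.
```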
Additional information:
- cluster configuration: 3-node replica set
- indexes: companyId, departments.departmentId
Should I consider this a bug? Or am I missing something?
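To compare plans between 3.2 and 3.4, the same command shown in the log line above can be re-sent with explain flipped to true, so the server returns the plan instead of executing the pipeline. A sketch of that command document (the shape is taken from the log; in the shell it would be passed to db.runCommand):

```javascript
// The aggregate command from the 3.4 log line, with explain: true so the
// server returns plan information rather than running the pipeline.
const explainCmd = {
  aggregate: "companies",
  pipeline: [
    { $sample: { size: 20 } },
    { $match: { $or: [ { status: "ACTIVE" }, { status: "$exists" } ] } },
    { $project: { companyId: 1, name: 1 } },
    { $sample: { size: 10 } }
  ],
  explain: true,
  cursor: {}
};
```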
Issue Links
- duplicates: SERVER-23408 Use a smarter batch size for small samples (Backlog)
- is related to: SERVER-22402 Add dependency tracking to the $match stage (Closed)