[SERVER-19610] Query skips duplicate predicate Created: 27/Jul/15  Updated: 19/Aug/15  Resolved: 30/Jul/15

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 3.0.3
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Steve Ardis Assignee: Ramon Fernandez Marina
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Steps To Reproduce:

Insert a document similar to this and run the query / explains outlined in the "Description".

{
    "_id" : ObjectId("55a9adb08716e441387fbec6"),
    "batch_id" : "1437183404770",
    "create_date" : ISODate("2015-07-18T01:36:48.059Z"),
    "csv_fields" : {
        "phone1" : "1234567890",
        "deleted" : "0"
    },
    "status_records" : [ 
        {
            "status" : "SKIPPED",
            "date" : ISODate("2015-07-18T01:42:59.919Z"),
            "notes" : [ 
                "Marked as inactive", 
                "Duplicate email", 
                "\"clientpk\" not exported"
            ]
        }, 
        {
            "status" : "PENDING_PREPROCESSING",
            "date" : ISODate("2015-07-18T01:36:48.059Z")
        }
    ]
}

Participants:

 Description   

$size criteria is being skipped if an additional criteria with $eq is included.

Given the following query:

db.contacts.find({ "status_records.0.status": "SKIPPED",
                     "status_records.0.notes": { $size: 1 },
                     "status_records.0.notes": { $eq: "\"clientpk\" not exported" } })

$size:1 is being excluded from the query, as can be seen by the generated plan:

{
        "queryPlanner" : {
                "plannerVersion" : 1,
                "namespace" : "test.contacts",
                "indexFilterSet" : false,
                "parsedQuery" : {
                        "$and" : [
                                {
                                        "status_records.0.notes" : {
                                                "$eq" : "\"clientpk\" not exported"
                                        }
                                },
                                {
                                        "status_records.0.status" : {
                                                "$eq" : "SKIPPED"
                                        }
                                }
                        ]
                },
                "winningPlan" : {
                        "stage" : "KEEP_MUTATIONS",
                        "inputStage" : {
                                "stage" : "FETCH",
                                "filter" : {
                                        "status_records.0.notes" : {
                                                "$eq" : "\"clientpk\" not exported"
                                        }
                                },
                                "inputStage" : {
                                        "stage" : "IXSCAN",
                                        "keyPattern" : {
                                                "status_records.0.status" : 1
                                        },
                                        "indexName" : "status_records.0.status_1",
                                        "isMultiKey" : false,
                                        "direction" : "forward",
                                        "indexBounds" : {
                                                "status_records.0.status" : [
                                                        "[\"SKIPPED\", \"SKIPPED\"]"
                                                ]
                                        }
                                }
                        }
                },
                "rejectedPlans" : [ ]
        },
        "serverInfo" : {
                "host" : "localhost.localdomain",
                "port" : 27017,
                "version" : "3.0.3",
                "gitVersion" : "b40106b36eecd1b4407eb1ad1af6bc60593c6105"
        },
        "ok" : 1
}

If I comment out the $eq criteria, there are no results (as would be expected) - here is it's plan:

{
        "queryPlanner" : {
                "plannerVersion" : 1,
                "namespace" : "test.contacts",
                "indexFilterSet" : false,
                "parsedQuery" : {
                        "$and" : [
                                {
                                        "status_records.0.notes" : {
                                                "$size" : 1
                                        }
                                },
                                {
                                        "status_records.0.status" : {
                                                "$eq" : "SKIPPED"
                                        }
                                }
                        ]
                },
                "winningPlan" : {
                        "stage" : "KEEP_MUTATIONS",
                        "inputStage" : {
                                "stage" : "FETCH",
                                "filter" : {
                                        "status_records.0.notes" : {
                                                "$size" : 1
                                        }
                                },
                                "inputStage" : {
                                        "stage" : "IXSCAN",
                                        "keyPattern" : {
                                                "status_records.0.status" : 1
                                        },
                                        "indexName" : "status_records.0.status_1",
                                        "isMultiKey" : false,
                                        "direction" : "forward",
                                        "indexBounds" : {
                                                "status_records.0.status" : [
                                                        "[\"SKIPPED\", \"SKIPPED\"]"
                                                ]
                                        }
                                }
                        }
                },
                "rejectedPlans" : [ ]
        },
        "serverInfo" : {
                "host" : "localhost.localdomain",
                "port" : 27017,
                "version" : "3.0.3",
                "gitVersion" : "b40106b36eecd1b4407eb1ad1af6bc60593c6105"
        },
        "ok" : 1
}



 Comments   
Comment by Ramon Fernandez Marina [ 19/Aug/15 ]

Thanks for posting a solution based on $and scardis, this is also works – and it's simpler than using aggregation

Cheers,
Ramón.

Comment by Steve Ardis [ 17/Aug/15 ]

Thank you for your response.

After giving some thought to the fact that the query is actually a JSON document, your response makes complete sense. However, I believe using the "$and" operator is the correct solution (as opposed to using the aggregate function workaround).

For example:

db.contacts.find(
{
$and: [ 
    {
        "status_records.0.notes": { $size: 1 }
    },
    {
        "status_records.0.notes": { $eq: "\"clientpk\" not exported" }
    },
    {
        "status_records.0.status": "SKIPPED"
    }
]
})

Comment by Ramon Fernandez Marina [ 30/Jul/15 ]

scardis, a colleague points out that this is expected behavior, as your query predicate contains a duplicate field (status_records.0.notes) that gets removed by the shell:

> q = { "status_records.0.status": "SKIPPED", "status_records.0.notes": { $size: 1 }, "status_records.0.notes": { $eq: "\"clientpk\" not exported" } }
{
        "status_records.0.status" : "SKIPPED",
        "status_records.0.notes" : {
                "$eq" : "\"clientpk\" not exported"
        }
}

The $size predicate is indeed gone. To illustrate further, let's swap the second and third predicates:

> q = { "status_records.0.status": "SKIPPED", "status_records.0.notes": { $eq: "\"clientpk\" not exported" }, "status_records.0.notes": { $size: 1 } }
{
        "status_records.0.status" : "SKIPPED",
        "status_records.0.notes" : {
                "$size" : 1
        }
}

As you can see the last predicate on a duplicate field is the one that prevails and the only one sent to the server. You may want to consider altering your schema, although if you choose to preserve it a simple alternative is to use aggregation:

> db.foo.aggregate([{$match : { $and : [ {"status_records.0.status": "SKIPPED"}, {"status_records.0.notes": { $size: 1 }}, {"status_records.0.notes": { $eq: "\"clientpk\" not exported"}} ] } } ] )
> db.foo.aggregate([{$match : { $and : [ {"status_records.0.status": "SKIPPED"}, {"status_records.0.notes": { $size: 3 }}, {"status_records.0.notes": { $eq: "\"clientpk\" not exported"}} ] } } ] ).pretty()
{
        "_id" : ObjectId("55a9adb08716e441387fbec6"),
        "batch_id" : "1437183404770",
        "create_date" : ISODate("2015-07-18T01:36:48.059Z"),
        "csv_fields" : {
                "phone1" : "1234567890",
                "deleted" : "0"
        },
        "status_records" : [
                {
                        "status" : "SKIPPED",
                        "date" : ISODate("2015-07-18T01:42:59.919Z"),
                        "notes" : [
                                "Marked as inactive",
                                "Duplicate email",
                                "\"clientpk\" not exported"
                        ]
                },
                {
                        "status" : "PENDING_PREPROCESSING",
                        "date" : ISODate("2015-07-18T01:36:48.059Z")
                }
        ]
}

Regards,
Ramón.

Comment by Ramon Fernandez Marina [ 30/Jul/15 ]

Thanks for your report scardis. We're able to reproduce the behavior and we're investigating.

Generated at Thu Feb 08 03:51:32 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.