[SERVER-21628] mongos sort query using $in creating inefficient query plan compared to $or Created: 23/Nov/15  Updated: 14/Apr/16  Resolved: 22/Mar/16

Status: Closed
Project: Core Server
Component/s: Querying, Sharding
Affects Version/s: None
Fix Version/s: 3.3.4

Type: Bug Priority: Major - P3
Reporter: Brian Riley Assignee: David Storch
Resolution: Done Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

Setup mongos with 2 shards and run the following in "test" database.

for (var i = 1; i <= 1000; i++) { 
    db.items.insert({item: "a", i_type: "x", i_id: i, price: i * 50});
    db.items.insert({item: "a", i_type: "y", i_id: i, price: i * 50});
    db.items.insert({item: "b", i_type: "x", i_id: i, price: i * 50});
    db.items.insert({item: "b", i_type: "y", i_id: i, price: i * 50});
}
db.items.createIndex({item: 1, i_type: 1, i_id: 1}, {unique: true})
db.items.createIndex({item: 1, i_type: 1, price: 1})
sh.enableSharding("test")
sh.shardCollection("test.items", {item: 1, i_type: 1})
 
db.items.find(
    {item: "a", i_type: {$in: ["x", "y"]}},
    {_id: 0, item: 1, i_type: 1}
).sort({price: 1}).limit(10).explain("executionStats")
 
db.items.find(
    {$or: [{item: "a", i_type: "x"}, {item: "a", i_type: "y"}]},
    {_id: 0, item: 1, i_type: 1}
).sort({price: 1}).limit(10).explain("executionStats")

Sprint: QuInt E (01/11/16), Query F (02/01/16), Query 10 (02/22/16), Query 12 (04/04/16)
Participants:

 Description   

$in queries are performing an unnecessary/inefficient SORT stage compared to $or using a SORT_MERGE for a logically equivalent query in mongos.

With $in examines 2000 keys.

db.items.find(
    {item: "a", i_type: {$in: ["x", "y"]}},
    {_id: 0, item: 1, i_type: 1, price: 1}
).sort({price: 1}).limit(10).explain("executionStats")

With $or examines 11 keys.

db.items.find(
    {$or: [{item: "a", i_type: "x"}, {item: "a", i_type: "y"}]},
    {_id: 0, item: 1, i_type: 1, price: 1}
).sort({price: 1}).limit(10).explain("executionStats")

Additional queries I've tested:

Without sort examines 10 keys.

db.items.find(
    {item: "a", i_type: {$in: ["x", "y"]}},
    {_id: 0, item: 1, i_type: 1, price: 1}
).limit(10).explain("executionStats")

With only one i_type examines 10 keys.

db.items.find(
    {item: "a", i_type: {$in: ["x"]}},
    {_id: 0, item: 1, i_type: 1, price: 1}
).sort({price: 1}).limit(10).explain("executionStats")

Run directly on replica set examines 11 keys.

db.items.find(
    {item: "a", i_type: {$in: ["x", "y"]}},
    {_id: 0, item: 1, i_type: 1, price: 1}
).sort({price: 1}).limit(10).explain("executionStats")



 Comments   
Comment by Githook User [ 22/Mar/16 ]

Author:

{u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'}

Message: SERVER-21628 make plans with the SHARDING_FILTER stage eligible for explodeForSort

This allows certain queries routed through mongos to use a
non-blocking SORT_MERGE plan.
Branch: master
https://github.com/mongodb/mongo/commit/0a58f33863f7660ceeb8523bb6af9df944065e65

Comment by J Rassi [ 23/Nov/15 ]

Thanks for the detailed report. I can confirm that the "explode for sort" logic that is run as part of query plan analysis is getting tripped up by the presence of a ShardingFilterNode (which is added for queries through mongos against sharded collections) in the query solution tree. As a result, many query shapes that normally generate merge sort plans when run through mongod are generating full in-memory sort plans instead when run through mongos. QueryPlannerAnalysis::explodeForSort() and its helpers (structureOKForExplode(), explodeScan()) need to be updated to properly handle ShardingFilterNode.

The rooted $or query happens to be unaffected by this problem, as it doesn't rely on the "explode for sort" logic in order to generate a merge sort plan (planning for rooted $or queries result are special in that they always result in separate index access nodes for each branch, though index access nodes may be combined during the access planner phase of query planning under certain circumstances).

Setting "Needs Triage" for this ticket.

Generated at Thu Feb 08 03:57:55 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.