[SERVER-14557] querying with hashed borders broke in 2.6 Created: 15/Jul/14  Updated: 10/Dec/14  Resolved: 15/Jul/14

Status: Closed
Project: Core Server
Component/s: Querying, Sharding
Affects Version/s: 2.6.0, 2.6.1, 2.6.2, 2.6.3
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Daniel Breitlauch Assignee: Unassigned
Resolution: Done Votes: 3
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Operating System: ALL
Steps To Reproduce:

1. Create a sharded collection with a hashed _id as the shard key.
2. Fill it with arbitrary documents.
3. Look up the borders of one chunk.
4. Query for all documents in that chunk like this:

db.collection.find(
    { "_id" : { "$gte" : -9219144072535768301, "$lt" : -9214747938866076750 } }
).hint({ "_id" : "hashed" })

Behavior in 2.4: Query to one shard and use of index there.
Behavior in 2.6: Very slow table scan on all shards

Participants:

 Description   

In 2.4 it was possible to query a sharded collection by its hashed key, using hashed chunk borders as the range, like this:

db.collection.find(
    { "_id" : { "$gte" : -9219144072535768301, "$lt" : -9214747938866076750 } }
).hint({ "_id" : "hashed" })

The range values are chunk borders.
In 2.4 this query returned all documents from that chunk while asking only one shard.
After moving to MongoDB 2.6, this stopped working.

An explain returns BtreeCursor, but nscannedObjects indicates a full table scan on all shards.
Additionally, it shows the wrong indexBounds:

"_id" : [
    [
        { "$minElement" : 1 },
        { "$maxElement" : 1 }
    ]
]

The same problem arises in MongoDB's own Hadoop connector when it is used with hashed sharded collections (https://github.com/mongodb/mongo-hadoop).



 Comments   
Comment by J Rassi [ 16/Jul/14 ]

That is a known issue. See SERVER-14400.

Comment by Daniel Breitlauch [ 16/Jul/14 ]

Thanks a lot!
It works.

While testing, though, I encountered some strange behavior.
I queried with the chunk borders like this:
db.test.find()
    .min({ _id : -6148914691236517204 })
    .max({ _id : -3074457345618258602 })
    .hint({ _id : "hashed" })
    .explain()

The explain correctly shows all documents being fetched from one shard, but:
"n" : 9,
"nscanned" : 12,
"nscannedAllPlans" : 12,
"nscannedObjects" : 9,
"nscannedObjectsAllPlans" : 9,
"numQueries" : 3,
"numShards" : 3,

shows that all 3 shards were queried. The 2 other shards, which returned no documents, report:
"nscannedObjects" : 0,
"nscanned" : 1,

Why does the switch issue queries to all shards?

Comment by Greg Studer [ 15/Jul/14 ]

This was broken behavior in v2.4 that unfortunately happened to work in this case - query operators apply to the document data itself, not the index keys.

In order to do chunk range queries over shard keys (which are themselves index keys), you'll want to use the .min()/.max() cursor methods with .hint(). These methods accept the (potentially hashed) index keys, not document values, and will do what you want (including doing the right thing for compound shard keys, which won't work with $gt/$lt).

http://docs.mongodb.org/manual/reference/method/cursor.min/#cursor.min
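
As a rough illustration of the .min()/.max()/.hint() approach described above, here is a minimal sketch. The helper names chunkRangeArgs and findInChunk are hypothetical and not part of MongoDB. The sketch assumes a chunk document that carries its shard-key bounds in `min` and `max` fields, in the way entries in the config database's chunks collection do. The example bounds are the ones from the earlier comment, written as strings because plain JavaScript numbers cannot represent every 64-bit integer exactly. In the shell they would normally be NumberLong values.

```javascript
// Hypothetical helper (not part of MongoDB): collect the arguments for
// cursor.min(), cursor.max() and cursor.hint() from a chunk document.
// Assumes `chunk` has `min` and `max` fields holding index-key bounds.
function chunkRangeArgs(chunk, indexSpec) {
  if (!chunk || !chunk.min || !chunk.max) {
    throw new Error("chunk needs both min and max bounds");
  }
  return { min: chunk.min, max: chunk.max, hint: indexSpec };
}

// Hypothetical helper: build a cursor restricted to one chunk's key range.
// `collection` is anything with a shell-style find() whose cursor supports
// min(), max() and hint(), for example db.test in the mongo shell.
function findInChunk(collection, chunk, indexSpec) {
  var args = chunkRangeArgs(chunk, indexSpec);
  return collection.find().min(args.min).max(args.max).hint(args.hint);
}

// Example bounds from the comment above, kept as strings in this sketch;
// in the shell these would be NumberLong("...") values.
var exampleChunk = {
  min: { _id: "-6148914691236517204" },
  max: { _id: "-3074457345618258602" }
};
var exampleArgs = chunkRangeArgs(exampleChunk, { _id: "hashed" });
```

In the shell, a call such as findInChunk(db.test, chunkDoc, { _id : "hashed" }).explain() would then correspond to the query form recommended above, where chunkDoc is assumed to be a chunk document read from the config database.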
