[SERVER-37279] suboptimal read isolation when using $in with shard keys Created: 24/Sep/18  Updated: 03/Oct/18  Resolved: 03/Oct/18

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 4.0.2
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Richard Smith Assignee: Nick Brewer
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File both.log     File fill_data.py     File just_rs1.log     File just_rs2.log     File setup.log     File shard_distribution.log    
Issue Links:
Duplicate
duplicates SERVER-1007 rewrite $in queries on shard key to o... Backlog
Participants:

 Description   

Say I have a collection with a hashed shard key on field "S".

If I send a query such as: db.coll.find({"S": {$in: [...]}}) with[...] being a relative lengthy list (couple of hundred entries). It seems like all entries in there are send to all shards instead of mongos splitting the query appropriately. As a result, each shard has to check for each of the values.



 Comments   
Comment by Nick Brewer [ 03/Oct/18 ]

rmsmith Thanks for the thorough clarification and steps to reproduce. The behavior you're seeing is in fact expected - we currently have a ticket open to track improvements that would allow $in queries to split more efficiently: SERVER-1007

You can vote for that ticket, and follow along with it for updates. For now I'll close this ticket as a duplicate - feel free to let us know if you have any questions.

-Nick

Comment by Richard Smith [ 02/Oct/18 ]

I uploaded a couple of files.

  • there is fill_data.py, which I used to fill a demo collection with data which reproduces the issue.
  • there is setup.log, which shows show I sharded the collection afterwards
  • there is shard_distribution.log which shows the output of getShardDistribution for the collection in question
  • there are just_rs1.log and just_rs2.log with execution stats for db.coll.find({"xy": {$in: ["1:1"]}}) and db.coll.find({"xy": {$in: ["999:999"]}}), respectively. The execution stats indicate that each of the queries hits just one rs.
  • there is both.log with the execution stats for db.coll.find({"xy": {$in: ["1:1", "999:999"]}}). the stats indicate that both rs search for both values ("1:1" and "999:999") even though the query router should be aware that sending "1:1" to rs2 and "999:999" to rs1 is unnecessary.
Comment by Nick Brewer [ 24/Sep/18 ]

rmsmith With a hashed shard key, it is expected that documents that are "close" based on the key value will not be on the same shard. The behavior of checking with each shard most likely indicates that your hashed key is working correctly, as the hashed field is distributed across the shards in your cluster. If you're seeing something in your logs that leads you to believe otherwise, it would be useful if you provide the relevant log lines here.

I believe what you're seeing here is by design; however if you'd like us to take a closer look, please upload:

As this ticket is public, you can upload this information to our secure portal if you'd prefer - information shared there is only available to MongoDB employees, and is automatically removed after a period of time.

Thanks,
-Nick

Generated at Thu Feb 08 04:45:31 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.