[SERVER-32657] Sharding GridFS has write bottleneck Created: 11/Jan/18  Updated: 05/Feb/18  Resolved: 11/Jan/18

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Roben Assignee: Mark Agarunov
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-10220 Support hashed fields in compound ind... Closed
Operating System: ALL
Participants:

 Description   

According to https://docs.mongodb.com/manual/core/gridfs/#sharding-gridfs
the shard key for the chunks collection should be files_id, but files_id is an ObjectId and increases monotonically.

And https://docs.mongodb.com/manual/reference/limits/#Monotonically-Increasing-Shard-Keys-Can-Limit-Insert-Throughput
says:
```
For clusters with high insert volumes, a shard keys with monotonically increasing and decreasing keys can affect insert throughput. If your shard key is the _id field, be aware that the default values of the _id fields are ObjectIds which have generally increasing values.
```
So choosing files_id as the shard key means that new GridFS chunks are always written to a single shard.
This is a serious problem, because anyone using GridFS usually has a large volume of file data to store, which is exactly the case that needs sharding.
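
The effect can be illustrated with a small simulation. This is only a sketch under stated assumptions: `fake_object_id` imitates the ObjectId layout (4-byte timestamp first) without using a driver, `hashed_key` uses MD5 as a stand-in for the server's hashing, and the shard count, split points, and timestamps are made up for the example.

```python
import hashlib
import os
import struct
from bisect import bisect_right
from collections import Counter

NUM_SHARDS = 4  # hypothetical cluster size


def fake_object_id(timestamp: int, counter: int) -> bytes:
    """12 bytes laid out like an ObjectId: 4-byte big-endian timestamp,
    5 random bytes, 3-byte counter (illustration only, not a real driver call)."""
    return struct.pack(">I", timestamp) + os.urandom(5) + counter.to_bytes(3, "big")


def hashed_key(oid: bytes) -> int:
    """Stand-in for a hashed shard key: first 8 bytes of MD5 as an unsigned int."""
    return int.from_bytes(hashlib.md5(oid).digest()[:8], "big")


def shard_for(value, split_points) -> int:
    """Range-based placement: index of the key range that contains value."""
    return bisect_right(split_points, value)


# Existing files_id values define the range boundaries between shards.
old_ids = sorted(fake_object_id(1_500_000_000 + i * 1000, i) for i in range(400))
range_splits = [old_ids[len(old_ids) * k // NUM_SHARDS] for k in range(1, NUM_SHARDS)]
# Hashed placement splits the 64-bit hash space evenly instead.
hash_splits = [(2**64) * k // NUM_SHARDS for k in range(1, NUM_SHARDS)]

# New uploads get later timestamps, so they sort after every existing id.
new_ids = [fake_object_id(1_515_000_000 + i, i) for i in range(1000)]

ranged = Counter(shard_for(oid, range_splits) for oid in new_ids)
hashed = Counter(shard_for(hashed_key(oid), hash_splits) for oid in new_ids)

print("range on files_id:", dict(ranged))
print("hash of files_id: ", dict(hashed))
```

With the range key, every new id lands in the last range, so one shard takes all 1000 inserts. With the hashed stand-in, the inserts spread over all the ranges.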



 Comments   
Comment by Mark Agarunov [ 11/Jan/18 ]

Hello narychen,

Thank you for the report. Looking over this, it appears to be a request for the same behavior as detailed in SERVER-10220: to enable sharding on a hash of the files_id instead of the files_id itself. I've therefore closed this ticket as a duplicate. Please follow SERVER-10220 for updates on this issue.

Thanks,
Mark
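
For reference, the request above could be sketched as a `shardCollection` command document. This is a hedged illustration, not a confirmed fix: the helper `shard_chunks_command` and the namespace `mydb.fs.chunks` are hypothetical, and whether the server accepts the compound form with a hashed field (the subject of SERVER-10220) depends on the server version.

```python
def shard_chunks_command(namespace: str, compound: bool = False) -> dict:
    """Build a shardCollection command document for a GridFS chunks collection.

    Hypothetical helper for illustration; it only builds the document
    and does not contact a server.
    """
    key = {"files_id": "hashed"}  # shard on a hash of files_id
    if compound:
        # Compound key with a hashed field: the behavior requested in
        # SERVER-10220 (assumed to be unavailable on older servers).
        key["n"] = 1
    # The command name must be the first key; dicts keep insertion order.
    return {"shardCollection": namespace, "key": key}


hashed_only = shard_chunks_command("mydb.fs.chunks")
hashed_compound = shard_chunks_command("mydb.fs.chunks", compound=True)
print(hashed_only)
print(hashed_compound)
```

Such a document would normally be run against the admin database through a driver or the shell.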

Generated at Thu Feb 08 04:30:53 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.