[SERVER-28066] Sharding on GridFS files, all the files end up on the same shard Created: 21/Feb/17  Updated: 31/May/17  Resolved: 23/Mar/17

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.2.10
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: Stephane Marquis Assignee: Kelsey Schubert
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

Hi!

We have a collection that contains a lot of GridFS files, with sharding enabled on the following key:

fs.chunks :

{files_id : 1, n : 1}
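
For context, setting up sharding on a GridFS chunks collection this way typically looks like the following (a minimal sketch run from a mongos; the database name telemetry-fs is taken from the log message further down):

sh.enableSharding("telemetry-fs")
sh.shardCollection("telemetry-fs.fs.chunks", { files_id : 1, n : 1 })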

The files in it are ~15.27MB each, and when I run fs.chunks.getShardDistribution() I'm getting:

Shard shard0000 at server1:27018
data : 11.84GiB docs : 48997 chunks : 97634
estimated data per chunk : 127KiB
estimated docs per chunk : 0

Shard shard0001 at server2:27018
data : 12.51GiB docs : 51791 chunks : 97633
estimated data per chunk : 134KiB
estimated docs per chunk : 0

Shard shard0002 at server3:27018
data : 5647.85GiB docs : 23407469 chunks : 97633
estimated data per chunk : 59.23MiB
estimated docs per chunk : 239

Totals
data : 5672.22GiB docs : 23508257 chunks : 292900
Shard shard0000 contains 0.2% data, 0.2% docs in cluster, avg obj size on shard : 253KiB
Shard shard0001 contains 0.22% data, 0.22% docs in cluster, avg obj size on shard : 253KiB
Shard shard0002 contains 99.57% data, 99.57% docs in cluster, avg obj size on shard : 253KiB

We're starting to run out of space on the server hosting shard0002 and can't figure out why the shards aren't balancing out. :S
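
For reference, the balancer state, and any chunks the balancer has flagged as too large to migrate (jumbo), can be checked from a mongos; a minimal sketch, assuming the telemetry-fs.fs.chunks namespace from the log message below:

sh.getBalancerState()    // should return true if balancing is enabled
sh.isBalancerRunning()   // true while a balancing round is in progress
use config
db.chunks.find({ ns : "telemetry-fs.fs.chunks", jumbo : true }).count()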

In the logs we're seeing errors like:
2017-02-21T18:50:39.563+0000 W SHARDING [conn1338] could not autosplit collection telemetry-fs.fs.chunks :: caused by :: 13333 can't split a chunk in that many parts

and googling it doesn't turn up much information.

Is there anything we're missing here?



 Comments   
Comment by Kelsey Schubert [ 08/Mar/17 ]

Hi smarquis,

The log error you've shared indicates that the maximum number of split points in a single chunk (8192) has been reached. This isn't a limit on the number of chunks you can end up with, just on the number of pieces a single chunk can be split into at a time.

To work around this, you can increase the chunk size from the default 64MB to something higher, say 128MB or 256MB. This reduces the number of pieces that a particular chunk needs to be split into. At a later stage you can then lower the chunk size back to the default 64MB, so that you don't end up with very large chunks.
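
In a 3.2 cluster the chunk size is a cluster-wide setting stored in the config database, so the adjustment would look roughly like this from a mongos (a minimal sketch; 256MB is just an example value):

use config
db.settings.save({ _id : "chunksize", value : 256 })   // raise the max chunk size to 256MB
// ... later, once the oversized chunks have been split and migrated ...
db.settings.save({ _id : "chunksize", value : 64 })    // restore the 64MB default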

Would you please follow these steps, and let us know if it resolves the issue?

Thank you,
Thomas
