[SERVER-25652] Slow chunk migrations when there are large chunk counts. 3.0, 3.2, 3.3.11 Created: 17/Aug/16  Updated: 31/Oct/16  Resolved: 27/Oct/16

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.0.8, 3.3.11
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Akira Kurogane
Assignee: Kaloian Manassiev
Resolution: Duplicate
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File chunksOnWrongTier_v2.js     File createChunks_v3.js    
Issue Links:
Related
is related to SERVER-26770 Sharding balancer moves chunks from s... Closed
is related to SERVER-26791 move/split/mergeChunk commands do a f... Closed
is related to SERVER-26778 Improve the speed of incremental chun... Closed
is related to SERVER-26684 Chunk diffing code roundtrips between... Closed
is related to SERVER-26777 Improve logging around chunk metadata... Closed
Operating System: ALL
Steps To Reproduce:
  • Create two-shard cluster;
  • Stop the balancer;
  • Create an empty sharded collection and split it into 500,000 chunks.
    (100k chunks may be enough to make noticeable differences, but it has been at 500k that it becomes very easy to observe.)
  • Start the balancer and observe the time taken to move each chunk.
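
The reproduction steps above could be scripted roughly as follows. This is a minimal sketch, not the attached createChunks_v3.js: the namespace "test.manychunks", the shard key field "x", the integer split points, and both helper function names are assumptions made for illustration. It uses ES5 syntax so that it should also load in the older 3.0 shell.

```javascript
// Sketch only. Helper names, namespace and shard key are hypothetical.

// Build one `split` admin command per split point.
// N chunks need N - 1 split points; the integers 1..N-1 are used here.
function buildSplitCommands(ns, keyField, chunkCount) {
  var commands = [];
  for (var i = 1; i < chunkCount; i++) {
    var middle = {};
    middle[keyField] = i;
    commands.push({ split: ns, middle: middle });
  }
  return commands;
}

// Run the splits through a caller-supplied function so the logic can be
// exercised without a cluster. Returns the number of splits issued.
function createManyChunks(runAdminCommand, ns, keyField, chunkCount) {
  var commands = buildSplitCommands(ns, keyField, chunkCount);
  for (var i = 0; i < commands.length; i++) {
    var res = runAdminCommand(commands[i]);
    if (!res || res.ok !== 1) {
      throw new Error("split failed at " + JSON.stringify(commands[i].middle));
    }
  }
  return commands.length;
}

// Intended use from a mongo shell connected to a mongos (assumed setup):
//   sh.stopBalancer();
//   sh.enableSharding("test");
//   sh.shardCollection("test.manychunks", { x: 1 });
//   createManyChunks(function (c) { return db.adminCommand(c); },
//                    "test.manychunks", "x", 500000);
//   sh.startBalancer();
```

Issuing the splits one at a time is slow at 500,000 chunks, but it keeps the collection empty, which matches the empty-chunk condition described in the ticket.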
Sprint: Sharding 2016-09-19, Sharding 2016-10-10, Sharding 2016-10-31
Participants:

 Description   

I've been testing the speed of chunk migrations in an all-on-one-server test cluster. Even when the chunks being migrated are empty (i.e. the chunk move takes only ~0.1 secs) the entire cycle run by the balancer takes a lot longer.

Version               Balance round time
3.0.8                 ~4.5 secs
v3.3.11-30-gc96009e   ~9 secs

From someone else's case with v3.2, on different servers and a different network from my test, I heard of a ~6 second cycle. I'm not sure whether that was with a replica set config db or the older SCCC-style one.

Can the balancer be changed so that a balance round moves multiple chunks of each collection, so long as each move finishes quickly? E.g. the balance round identifies candidate chunks for migration and keeps moving them serially until a window of, say, 10 secs has elapsed.
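
To make the proposal concrete, here is a minimal sketch of a time-windowed round. It is not the balancer's actual code: runBalanceRound, the moveChunk callback, and the injectable clock are all hypothetical names introduced for illustration.

```javascript
// Sketch only: every name here is hypothetical, not part of the server.

// Move candidate chunks one after another until the time window is used up.
//   candidates - array of chunk descriptors picked for this round
//   moveChunk  - function(chunk) returning true on success
//   windowMs   - time budget for the round, e.g. 10000 for 10 secs
//   now        - optional clock function, defaults to Date.now
// Returns the chunks that were moved in this round.
function runBalanceRound(candidates, moveChunk, windowMs, now) {
  now = now || Date.now;
  var deadline = now() + windowMs;
  var moved = [];
  for (var i = 0; i < candidates.length; i++) {
    // Always attempt the first move; after that, stop starting new
    // moves once the window has elapsed.
    if (i > 0 && now() >= deadline) {
      break;
    }
    if (!moveChunk(candidates[i])) {
      break; // end the round on the first failed move
    }
    moved.push(candidates[i]);
  }
  return moved;
}
```

With empty chunks that each take ~0.1 secs to move, a 10 sec window would allow on the order of 100 moves per round instead of one, while a single slow move of a populated chunk would still end the round after that one move.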

At any rate, if the data has been completely deleted from a large fraction of the chunk ranges before a new shard is added, it would be good if those chunk moves happened a lot more quickly.



 Comments   
Comment by Kaloian Manassiev [ 27/Oct/16 ]

Multiple aspects of the chunk migration process contribute to slow migrations when the chunks contain no documents, and these have been isolated in the related tickets. This ticket is being closed in favor of those more specific tickets.

Comment by Dianna Hohensee (Inactive) [ 19/Aug/16 ]

Okay, no problem. I'm not sure exactly when we'll start testing, so if you get a chance, great, if we get there first, that's fine too.

Thanks again for the JS you did attach.
