[SERVER-14394] Create initial chunks directly on shards Created: 30/Jun/14  Updated: 24/Jul/20  Resolved: 31/Jul/18

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.4.10
Fix Version/s: 4.0.2, 4.1.1

Type: Improvement
Priority: Major - P3
Reporter: Alexander Komyagin
Assignee: Cheahuychou Mao
Resolution: Done
Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-13340 parallel creation of hashed shard key... Closed
is duplicated by SERVER-31425 Retrying initial shardCollection with... Closed
is duplicated by SERVER-35915 Blacklist hash_skey_split.js from the... Closed
is duplicated by SERVER-10430 Improve distribution of new chunks wh... Closed
Related
related to SERVER-17474 Mongos doesn't see new sharded collec... Closed
is related to SERVER-14298 Add support to create/define chunks i... Closed
Backwards Compatibility: Fully Compatible
Backport Requested: v4.0
Sprint: Sharding 2018-07-16, Sharding 2018-07-30, Sharding 2018-08-13
Linked BF Score: 26

 Description   

It would be useful to be able to create initial hashed shard key chunks directly on shards, instead of creating them on one shard and then distributing.

This matters most when the balancer is busy with a long balancing round and cannot distribute those empty hashed shard key chunks right away.



 Comments   
Comment by Githook User [ 21/Aug/18 ]

Author:

{'name': 'Cheahuychou Mao', 'email': 'cheahuychou.mao@mongodb.com', 'username': 'cheahuychou'}

Message: SERVER-14394 Create initial hashed shard key chunks directly on shards

(cherry picked from commit d83b73ea2db96ccbcf5f2a0710f360f88896ab9c)
Branch: v4.0
https://github.com/mongodb/mongo/commit/6e85d023feb6e87ba476108f4f6f149e4bd2449d

Comment by Gregory McKeon (Inactive) [ 21/Aug/18 ]

janna.golden this seems to be partially committed. Can you take a look and finish it off?

Comment by Githook User [ 17/Aug/18 ]

Author:

{'name': 'Cheahuychou Mao', 'email': 'cheahuychou.mao@mongodb.com', 'username': 'cheahuychou'}

Message: SERVER-14394 Move the intial split code to a separate module and add unit tests

(cherry picked from commit 4dd46fb7bdc6d3ef62888b01f585d6fed54a081f)
Branch: v4.0
https://github.com/mongodb/mongo/commit/9fc1c55c9648c9c9a49ddbcd7914cf2d56a9bcd4

Comment by Githook User [ 16/Aug/18 ]

Author:

{'name': 'Kaloian Manassiev', 'email': 'kaloian.manassiev@mongodb.com', 'username': 'kaloianm'}

Message: SERVER-14394 Add UUID checking to the hashed sharding with initial split test

(cherry picked from commit fc372cdf9a070eecaf600b75649fd2690c3d927d)
Branch: v4.0
https://github.com/mongodb/mongo/commit/d94a1c5debcbb0d2e23f38c9552bb1c281978bb9

Comment by Githook User [ 31/Jul/18 ]

Author:

{'username': 'cheahuychou', 'name': 'Cheahuychou Mao', 'email': 'cheahuychou.mao@mongodb.com'}

Message: SERVER-14394 Create initial hashed shard key chunks directly on shards
Branch: master
https://github.com/mongodb/mongo/commit/d83b73ea2db96ccbcf5f2a0710f360f88896ab9c

Comment by Githook User [ 29/Jul/18 ]

Author:

{'name': 'Sara Golemon', 'email': 'sara.golemon@mongodb.com', 'username': 'sgolemon'}

Message: SERVER-14394 Fix initial_split_policy_test on macOS
Branch: master
https://github.com/mongodb/mongo/commit/afca554216dfa32cbf92374a828451cdb1e04b8a

Comment by Githook User [ 27/Jul/18 ]

Author:

{'name': 'Cheahuychou Mao', 'email': 'cheahuychou.mao@mongodb.com', 'username': 'cheahuychou'}

Message: SERVER-14394 Move the intial split code to a separate module and add unit tests
Branch: master
https://github.com/mongodb/mongo/commit/4dd46fb7bdc6d3ef62888b01f585d6fed54a081f

Comment by Githook User [ 27/Jul/18 ]

Author:

{'username': 'cheahuychou', 'name': 'Cheahuychou Mao', 'email': 'cheahuychou.mao@mongodb.com'}

Message: Revert "SERVER-14394 Move the intial split code to a separate module and add unit tests"

This reverts commit 37c95d3ee26b99817fdda6fdfc0b3f867e04aa84.
Branch: master
https://github.com/mongodb/mongo/commit/4dc7dc996cb05bc11f4c474899dfb32810baa8bb

Comment by Githook User [ 26/Jul/18 ]

Author:

{'name': 'Cheahuychou Mao', 'email': 'cheahuychou.mao@mongodb.com', 'username': 'cheahuychou'}

Message: SERVER-14394 Move the intial split code to a separate module and add unit tests
Branch: master
https://github.com/mongodb/mongo/commit/37c95d3ee26b99817fdda6fdfc0b3f867e04aa84

Comment by Githook User [ 23/Jul/18 ]

Author:

{'name': 'Kaloian Manassiev', 'email': 'kaloian.manassiev@mongodb.com', 'username': 'kaloianm'}

Message: SERVER-14394 Add UUID checking to the hashed sharding with initial split test
Branch: master
https://github.com/mongodb/mongo/commit/fc372cdf9a070eecaf600b75649fd2690c3d927d

Comment by Kaloian Manassiev [ 14/Dec/17 ]

Moving to the "unsharded collections" Epic, because I think this would make it simpler to directly write the initial chunks on the config server instead of using the moveChunk machinery.

Comment by sam flint [ 02/Jul/14 ]

I believe my use case is before any information is actually written to the collection. I am pre-creating collections 6 days ahead of time. I understand the race condition if records were being inserted. I would assume you can detect insertion at creation and decide how to proceed?

I am looking forward to this. Thanks

Comment by Greg Studer [ 30/Jun/14 ]

> Why is this necessary? Could they not be distributed at creation?
No, the chunks are created by some mongos (call it mongos A). If another mongos (mongos B) or mongoses are actively inserting into the collection in question on the primary shard while the collection is getting sharded, there's a race where the mongos B inserts may make it to the shard before mongos A tells the shard that it no longer owns the whole collection (this is the race migrations are designed to avoid). It's okay so long as the new chunks created are all on the primary shard to begin with.

The change is scheduled, note the fixVersion, but it's unfortunately not simple.
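
The race described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not MongoDB's implementation: the `Shard` class, the `owns_all` flag, the two-chunk layout, and the `run` helper are all hypothetical names invented for the illustration.

```python
class Shard:
    """Toy shard that accepts writes only while it believes it owns the whole collection."""

    def __init__(self, name, owns_all=False):
        self.name = name
        self.owns_all = owns_all
        self.docs = []

    def insert(self, key):
        if not self.owns_all:
            return False  # the shard rejects writes for ranges it does not own
        self.docs.append(key)
        return True


def misplaced(shards, owner_of):
    """Return (shard, key) pairs stored on a shard other than the one the metadata names."""
    return [(s.name, k) for s in shards for k in s.docs if owner_of(k) != s.name]


def run(create_on_target_shards):
    primary, other = Shard("primary", owns_all=True), Shard("other")

    def owner_of(key):
        # Routing metadata as written by "mongos A" (hypothetical two-chunk layout).
        if create_on_target_shards and key >= 0:
            return "other"
        return "primary"

    # "mongos B" still sees an unsharded collection, so its insert goes to the
    # primary shard and arrives before the ownership update from "mongos A".
    primary.insert(5)
    # Only now does the primary learn whether it still owns everything.
    primary.owns_all = not create_on_target_shards
    return misplaced([primary, other], owner_of)


print(run(create_on_target_shards=True))   # the insert landed on the wrong shard
print(run(create_on_target_shards=False))  # all chunks start on the primary: nothing misplaced
```

In this model the late insert is only harmless when every new chunk starts out on the primary shard, which matches the point made in the comment.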

Comment by sam flint [ 30/Jun/14 ]

I am not sure because I haven't looked at the code, but I know that at the time of sharding a collection mongos knows it will create twice as many chunks as there are shards, so 4 shards means 8 chunks. But they are created on the primary shard and then moved? Why is this necessary? Could they not be distributed at creation? I ran into this issue because the balancer was doing a round, so I had 8 chunks on the primary shard and it would never have balanced them in time for the application to start writing to the collection.

I would like these to be separate threads for the balancer. If the balancer is busy doing a balancing round chunks should still be evenly distributed in a new hashed collection that is created.

In the current case I would have to stop the balancer (which is not easy) to create the collection and have 8 chunks be distributed into 2,2,2,2.
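
For illustration, the 2,2,2,2 layout requested above could be computed up front along these lines. This is a minimal sketch, assuming evenly spaced split points over the signed 64-bit hash range and a round-robin assignment; the function name `initial_chunks` and the shard names are hypothetical, and the server's actual split-point calculation may differ.

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def initial_chunks(shards, chunks_per_shard=2):
    """Return (lower, upper, shard) tuples covering the signed 64-bit hash range."""
    num_chunks = len(shards) * chunks_per_shard
    width = 2**64 // num_chunks
    # Interior split points; the outer bounds stand in for MinKey and MaxKey.
    bounds = (
        [INT64_MIN]
        + [INT64_MIN + i * width for i in range(1, num_chunks)]
        + [INT64_MAX]
    )
    # Round-robin: chunk i is assigned to shard i mod N.
    return [
        (bounds[i], bounds[i + 1], shards[i % len(shards)])
        for i in range(num_chunks)
    ]


chunks = initial_chunks(["shard0", "shard1", "shard2", "shard3"])
per_shard = {}
for _, _, shard in chunks:
    per_shard[shard] = per_shard.get(shard, 0) + 1
print(per_shard)
```

With 4 shards this yields 8 contiguous chunks, 2 per shard, without any chunk having to be moved afterwards.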

Comment by Greg Studer [ 30/Jun/14 ]

This requires moving the shardCollection command to the primary database shard - it's unsafe to do this from mongos as the shardCollection command works now.

Comment by Thomas Rueckstiess [ 30/Jun/14 ]

FYI: A similar request (maybe more general) was made in SERVER-14298. If the new collection is empty (no data needs to be migrated), it would be good if the chunk distribution can be generated in a way that the chunks live directly on the target shards, rather than using the moveChunk mechanism to move them one by one.

I'm keeping both tickets open. This one is to track the feature request to create the chunk distribution directly on the target shards for empty hashed sharded collections. SERVER-14298 tracks, in a more general way, a user-facing feature that allows specifying a custom chunk distribution manually for empty collections.
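
To make the cost difference concrete, the sketch below counts how many chunks would have to be moved one by one when every chunk starts on the primary shard, compared with placing them on their target shards at creation. The helper `moves_needed`, the shard names, and the round-robin target layout are assumptions made for the example.

```python
def moves_needed(current, target):
    """Count chunks whose current shard differs from their target shard."""
    return sum(1 for cur, tgt in zip(current, target) if cur != tgt)


shards = ["shard0", "shard1", "shard2", "shard3"]
num_chunks = 2 * len(shards)

# Assumed target layout: chunk i lives on shard i mod N.
target = [shards[i % len(shards)] for i in range(num_chunks)]
# Behaviour described in this ticket: every chunk starts on the primary shard.
all_on_primary = ["shard0"] * num_chunks

print(moves_needed(all_on_primary, target))  # chunks that must be migrated
print(moves_needed(target, target))          # created in place: no migrations
```

For 4 shards and 8 chunks, 6 chunks need a migration in the first case and none in the second.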

Comment by sam flint [ 30/Jun/14 ]

This is critical if you are adding in a new collection and need it to be distributed immediately.

Thanks

Generated at Thu Feb 08 03:34:43 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.