[SERVER-31586] Recreate PeriodicBalancerSettingsRefresher on the ServiceContext Created: 16/Oct/17 Updated: 30/Oct/23 Resolved: 25/Jul/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 4.1.2 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Dianna Hohensee (Inactive) | Assignee: | Matthew Saltz (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible |
| Sprint: | Sharding 2018-06-04, Sharding 2018-06-18, Sharding 2018-07-02, Sharding 2018-07-16, Sharding 2018-07-30 |
| Participants: | |
| Description |
|
Originally created by |
| Comments |
| Comment by Githook User [ 25/Jul/18 ] |
|
Author: Matthew Saltz <matthew.saltz@mongodb.com>
Message: |
| Comment by Githook User [ 09/Jul/18 ] |
|
Author: Matthew Saltz <matthew.saltz@mongodb.com>
Message: |
| Comment by Matthew Saltz (Inactive) [ 07/Jun/18 ] |
|
Talked with Randolph online, and we agreed on:
This will fix test issues, allow for periodic refreshes, handle step-up and step-down, and handle shutdown and restart. |
| Comment by Matthew Saltz (Inactive) [ 07/Jun/18 ] |
|
"during addShard, also store the balancer config in the new shard" So you'd send the balancer configuration inside the document that addShard sends? If that's the suggestion, that would actually fix test issues, and after that I think it would be sufficient to use the onStepUp and onStepDown functions in the ChunkSplitter to trigger start and stop of a periodic refresher, where the first refresh would happen synchronously. |
| Comment by Randolph Tan [ 07/Jun/18 ] |
|
What if we treat the balancer config like other sharding metadata (chunk/coll/db)? We persist the info in storage and update it when we detect a mismatch. So the idea would be:
This has the benefit of also working across server restarts. |
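As a rough illustration of the "treat it like other sharding metadata" idea above, the following sketch keeps a locally cached and persisted copy of the balancer config and refreshes it only when a version mismatch is detected on the request path. Every type and name below is hypothetical; the fetch and persist callbacks merely stand in for calls to the config servers and to local storage.

```cpp
// Illustrative sketch only: cache the balancer config locally, persist it, and
// refresh from the config servers when a version mismatch is detected.
#include <functional>
#include <iostream>
#include <mutex>

struct BalancerConfig {
    long long version = 0;   // monotonically increasing config version
    bool balancerEnabled = true;
    int maxChunkSizeMB = 64;
};

class CachedBalancerConfig {
public:
    // fetchFn stands in for a call to the config servers; persistFn stands in
    // for writing the config to local storage so it survives restarts.
    CachedBalancerConfig(std::function<BalancerConfig()> fetchFn,
                         std::function<void(const BalancerConfig&)> persistFn)
        : _fetchFn(std::move(fetchFn)), _persistFn(std::move(persistFn)) {}

    // Called on the request path with the version the sender believes is
    // current (analogous to how shard versions are checked). Refreshes only
    // when a mismatch is detected.
    BalancerConfig getUpToDate(long long expectedVersion) {
        std::lock_guard<std::mutex> lk(_mutex);
        if (_cached.version != expectedVersion) {
            _cached = _fetchFn();  // refresh from the config servers
            _persistFn(_cached);   // persist so restarts start from it
        }
        return _cached;
    }

private:
    std::function<BalancerConfig()> _fetchFn;
    std::function<void(const BalancerConfig&)> _persistFn;
    std::mutex _mutex;
    BalancerConfig _cached;
};

int main() {
    CachedBalancerConfig cache(
        [] {
            BalancerConfig fetched;
            fetched.version = 3;
            fetched.maxChunkSizeMB = 128;
            return fetched;
        },
        [](const BalancerConfig& c) { std::cout << "persisted version " << c.version << "\n"; });
    auto cfg = cache.getUpToDate(/*expectedVersion=*/3);
    std::cout << "maxChunkSizeMB=" << cfg.maxChunkSizeMB << "\n";
}
```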
| Comment by Matthew Saltz (Inactive) [ 07/Jun/18 ] |
|
esha.maharishi had suggested potentially doing the refresh on shard version mismatch, but I think that wouldn't work until a query for a sharded collection came in. However, we could consider throwing a StaleConfigException (or some new error, say ShardInitializationInProgress) on inserts while a flag indicates that the shard hasn't yet finished 'initializing' after addShard; we'd then catch the error and do the refresh. Having a place outside of any lock that is allowed to make blocking calls to finish shard initialization seems like it could be useful in general, but it might be a bit extreme to do just for this small change we're making. |
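A toy sketch of that idea follows: inserts are rejected while an "initializing" flag is still unset, the caller catches the error, performs the blocking refresh outside of any lock, and retries. The error name, the global flag, and both functions are hypothetical stand-ins, not how the server actually wires up its error handling.

```cpp
// Illustrative sketch only: reject writes while the shard is still
// "initializing" after addShard, then catch the error, do the blocking
// refresh outside of any lock, and retry. Names are hypothetical.
#include <atomic>
#include <iostream>
#include <stdexcept>
#include <string>

struct ShardInitializationInProgress : std::runtime_error {
    ShardInitializationInProgress()
        : std::runtime_error("shard initialization in progress") {}
};

std::atomic<bool> g_shardInitialized{false};

void refreshBalancerConfigBlocking() {
    // Stand-in for a blocking call to the config servers; allowed to block
    // here because we are not under any lock.
    g_shardInitialized.store(true);
}

void insertDocument(const std::string& doc) {
    if (!g_shardInitialized.load()) {
        // Analogous to a StaleConfig-style retriable error.
        throw ShardInitializationInProgress();
    }
    std::cout << "inserted: " << doc << "\n";
}

int main() {
    try {
        insertDocument("{x: 1}");
    } catch (const ShardInitializationInProgress&) {
        refreshBalancerConfigBlocking();  // finish initialization, then retry
        insertDocument("{x: 1}");
    }
}
```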
| Comment by Matthew Saltz (Inactive) [ 06/Jun/18 ] |
|
There are two reasons something like this ticket needs to be done:
There are two things that need to be considered:
For the purposes of tests, all we need is for the first refresh to happen when the shard is added to the cluster with the addShard command. My first thought was to piggy-back on the addShard path in the shard, but no blocking/networking calls can be made in ShardingState::initializeFromShardIdentity because it runs under a lock.
My second thought was this code review, which uses ChunkSplitter::setReplicaSetMode (called from initializeFromShardIdentity) to launch a thread that does the refresh; the refresh would also happen on step-up. The problem is that this is asynchronous, so it can fail tests non-deterministically and cause undesirable behavior for customers if, for whatever reason, the refresh stalls while writes are happening. That can be mitigated within the same approach by having the ChunkSplitter not perform any splits until the refresh is done. This would not, however, block inserts, which means it wouldn't necessarily fix our tests (though it would fix the case where a customer increases the chunk size and doesn't want splits to keep happening until the configuration is refreshed).
Another option would be a blocking, test-only command refreshBalancerConfigNow that ShardingTest would run after adding shards, to fix the testing issues, combined with the previous idea for setReplicaSetMode (which is currently only called on addShard but could also be triggered on onStepUp, though there's no reason onStepUp couldn't just block and do the refresh).
My favorite idea is the test-only command that blocks and does a refresh, to fix tests. Then the ChunkSplitter would start a thread on the first call to setReplicaSetMode to do periodic refreshes, stop that thread on onStepDown, and restart it if onStepUp is ever called. Until the very first refresh completes, we'd ignore split requests as described above, gated by some flag like "balancerConfigIsInitialized".
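A small sketch of the split-gating part of that plan, again illustrative only and not the real ChunkSplitter: split requests are simply ignored until the first balancer-config refresh has completed and a flag like balancerConfigIsInitialized is set. The class and method names are made up for the example.

```cpp
// Illustrative sketch only: ignore split requests until the first balancer
// config refresh has completed. The class and flag names are hypothetical.
#include <atomic>
#include <iostream>
#include <string>

class ChunkSplitterSketch {
public:
    // Called once the first refresh (synchronous or background) finishes.
    void markBalancerConfigInitialized() {
        _balancerConfigIsInitialized.store(true);
    }

    // Split requests arriving before the first refresh are ignored so we never
    // split against a stale (default) max chunk size.
    void trySplitChunk(const std::string& ns) {
        if (!_balancerConfigIsInitialized.load()) {
            std::cout << "ignoring split request for " << ns
                      << " until balancer config is initialized\n";
            return;
        }
        std::cout << "splitting a chunk in " << ns << "\n";
    }

private:
    std::atomic<bool> _balancerConfigIsInitialized{false};
};

int main() {
    ChunkSplitterSketch splitter;
    splitter.trySplitChunk("test.coll");       // ignored: config not yet refreshed
    splitter.markBalancerConfigInitialized();  // first refresh has completed
    splitter.trySplitChunk("test.coll");       // now allowed
}
```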