[SERVER-23666] ShardRegistry lookup dictionaries should provide consistent erase Created: 12/Apr/16 Updated: 05/Apr/17 Resolved: 17/Jun/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Misha Tyulenev | Assignee: | Misha Tyulenev |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Sprint: | Sharding 16 (06/24/16) |
| Participants: |
| Description |
|
Currently shardRegisty contain several dictionaries and it results in the inconsistent update and erase operations. |
| Comments |
| Comment by Misha Tyulenev [ 17/Jun/16 ] | |||||||||||||
|
The issues with connectionStirng 1) and 2) are resolved. The maps reorg is not needed. | |||||||||||||
| Comment by Misha Tyulenev [ 13/Apr/16 ] | |||||||||||||
|
The current design makes it easy to forget to clean up something from the map when shard need to be updated leaving shardRegistry in the inconsistent state. A better design can use RAII to make an update consistent - i.e. erasing a shard should be only allowed by Id and this should clean up host->shard and rsname->shard mapping. | |||||||||||||
| Comment by Spencer Brody (Inactive) [ 13/Apr/16 ] | |||||||||||||
|
#1 and #3 make sense to me. I'm still not sure what you mean by #2 - cleaning up the maps is "inconsistent". | |||||||||||||
| Comment by Misha Tyulenev [ 13/Apr/16 ] | |||||||||||||
|
spencer if you look at how _lookup, _hostLookup, and _rsLookup maps are maintained in https://github.com/mongodb/mongo/blob/master/src/mongo/s/client/shard_registry.cpp#L451 then it will be more clear what I mean.
i.e. it cleans up 2 maps that have dependencies on shardId. Later those maps are repopulated with the new data from the connection string. There are three problems here. solving those requires acquiring locks with more control e.g. using lock ordering techniques. Current getConnString can not cause locking conflicts because its stored in the shard object - but the change which is currently in review changes it to call RSMonitor that requires to take a lock. | |||||||||||||
| Comment by Spencer Brody (Inactive) [ 13/Apr/16 ] | |||||||||||||
|
Also, how does calling getConnString cause locking conflicts? | |||||||||||||
| Comment by Spencer Brody (Inactive) [ 13/Apr/16 ] | |||||||||||||
|
misha.tyulenev, this is news to me - do you have a case of those maps getting out of sync? |