[SERVER-27000] mongos running shardCollection distlock acquisition attempt timed out after 20 seconds due to config primary stepdowns in the stepdown suite Created: 11/Nov/16 Updated: 06/Dec/22 Resolved: 04/Oct/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Dianna Hohensee (Inactive) | Assignee: | [DO NOT USE] Backlog - Sharding Team |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: | Sharding |
| Backwards Compatibility: | Fully Compatible | ||||||||
| Participants: | |||||||||
| Linked BF Score: | 0 | ||||||||
| Description |
|
shardCollection runs ReplSetDistLockManager::lockWithSessionID on the mongos with a 20-second time limit. This function makes many network calls to the config servers, so it is far more likely to time out when it runs remotely on the mongos than when it runs on the config server itself. |
| Comments |
| Comment by Dianna Hohensee (Inactive) [ 04/Oct/17 ] |
|
This issue was resolved by moving the shardCollection command to the config server in |
| Comment by Dianna Hohensee (Inactive) [ 11/Jan/17 ] |
|
The problem is that taking the distlock on the mongos takes too long, so the attempt times out and fails. The DistLockManager runs on the mongos and may need to make several network calls. The solution is to move shardCollection to the config server, where the 'distlock' can be taken quickly and reliably without network calls. Technically, we could move the DistLockManager logic to run on the config server rather than the mongos, and that would also help a little, but not as much. Plus, we want to get rid of distributed locks, not rework them. |
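
To illustrate why the number of network calls matters here, the following is a toy simulation, not MongoDB's actual DistLockManager code. It assumes a hypothetical config primary that is unavailable for the first 2 seconds of every 5-second period (standing in for the stepdown suite), a lock attempt that fails if any of its calls lands in a down window, and a 0.5-second retry interval. Only the 20-second limit comes from this ticket; every name and the other numbers are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class FakeClock:
    """Simulated clock so the example runs instantly and deterministically."""
    now: float = 0.0

    def advance(self, seconds: float) -> None:
        self.now += seconds


# Hypothetical stepdown pattern: primary is down for the first DOWN_S
# seconds of every PERIOD_S-second period.
PERIOD_S = 5.0
DOWN_S = 2.0


def primary_is_up(t: float) -> bool:
    return (t % PERIOD_S) >= DOWN_S


def make_attempt(clock: FakeClock, round_trips: int, call_cost_s: float):
    """Build one lock attempt that needs `round_trips` sequential calls."""
    def attempt() -> bool:
        for _ in range(round_trips):
            clock.advance(call_cost_s)      # time spent on this call
            if not primary_is_up(clock.now):
                return False                # one failed call fails the attempt
        return True
    return attempt


def acquire(clock: FakeClock, attempt, timeout_s: float = 20.0,
            retry_s: float = 0.5) -> bool:
    """Retry `attempt` until it succeeds or the time limit passes."""
    deadline = clock.now + timeout_s
    while clock.now < deadline:
        if attempt():
            return True
        clock.advance(retry_s)
    return False


def simulate(round_trips: int, call_cost_s: float):
    """Return (acquired, simulated seconds elapsed)."""
    clock = FakeClock()
    ok = acquire(clock, make_attempt(clock, round_trips, call_cost_s))
    return ok, clock.now


if __name__ == "__main__":
    print("local, no network calls:", simulate(1, 0.0))
    print("remote, 4 calls of 1 s: ", simulate(4, 1.0))
```

In this model the local attempt succeeds as soon as the primary is up, while the four-call remote attempt always spans at least one down window and exhausts the 20-second limit. Real latencies and stepdown timing will differ, but the sketch shows why an attempt made of several sequential remote calls is more exposed to repeated stepdowns than a single local one.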