[SERVER-27000] mongos running shardCollection distlock acquisition attempt timed out after 20 seconds due to config primary stepdowns in the stepdown suite Created: 11/Nov/16  Updated: 06/Dec/22  Resolved: 04/Oct/17

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Dianna Hohensee (Inactive) Assignee: [DO NOT USE] Backlog - Sharding Team
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-29107 move shardCollection logic into new _... Closed
Assigned Teams:
Sharding
Backwards Compatibility: Fully Compatible
Participants:
Linked BF Score: 0

 Description   

shardCollection runs ReplSetDistLockManager::lockWithSessionID on the mongos with a 20-second time limit. This function makes many network calls to the config servers, so running it remotely from the mongos makes it far more likely to time out, especially while the config server primary is stepping down.
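The time-budget problem can be illustrated with a minimal Python simulation. This is not MongoDB's ReplSetDistLockManager code. It assumes a hypothetical retry loop in which every lock attempt makes some number of network calls to the config server primary, and any call made while the primary is unavailable (for example, mid-stepdown) hangs until a per-call timeout and then fails. The names `VirtualClock`, `remote_call`, and `acquire_lock`, together with the 5-second call timeout, 0.5-second retry interval, and 10 ms round trip, are illustrative assumptions. Only the 20-second limit comes from this ticket.

```python
from dataclasses import dataclass


@dataclass
class VirtualClock:
    """Hypothetical simulated clock so the sketch runs instantly and deterministically."""
    now: float = 0.0

    def sleep(self, seconds: float) -> None:
        self.now += seconds


def remote_call(clock: VirtualClock, unavailable_until: float,
                rtt_s: float, call_timeout_s: float) -> bool:
    """One hypothetical network call to the config server primary.

    While the primary is unavailable, the call blocks for call_timeout_s and
    fails. Afterwards it costs one round trip (rtt_s) and succeeds.
    """
    if clock.now < unavailable_until:
        clock.sleep(call_timeout_s)
        return False
    clock.sleep(rtt_s)
    return True


def acquire_lock(clock: VirtualClock, calls_per_attempt: int,
                 unavailable_until: float, rtt_s: float = 0.01,
                 call_timeout_s: float = 5.0, time_limit_s: float = 20.0,
                 retry_interval_s: float = 0.5) -> bool:
    """Retry until every call in one attempt succeeds or the time limit expires."""
    deadline = clock.now + time_limit_s
    while clock.now < deadline:
        ok = True
        for _ in range(calls_per_attempt):
            if not remote_call(clock, unavailable_until, rtt_s, call_timeout_s):
                ok = False
                break  # abandon this attempt on the first failed call
        # An attempt that only completes after the deadline still counts as a timeout.
        if ok and clock.now <= deadline:
            return True
        clock.sleep(retry_interval_s)
    return False
```

With the primary unavailable until t=16.6 s, `acquire_lock(VirtualClock(), 4, 16.6)` returns `False`. Four consecutive hung calls, the last ending at t=21.5 s, exhaust the 20-second budget. If the primary instead returns by t=10 s, the same call returns `True` at roughly t=11.04 s. Each blocked network call burns a large, fixed share of the budget. This is why running the lock logic where it needs no remote calls is the more robust fix.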



 Comments   
Comment by Dianna Hohensee (Inactive) [ 04/Oct/17 ]

This issue was resolved by moving the shardCollection command to the config server in SERVER-29107. Closing.

Comment by Dianna Hohensee (Inactive) [ 11/Jan/17 ]

The problem is that acquiring the distlock on the mongos takes too long, so the attempt times out and fails. The DistLockManager runs on the mongos and may need to make several network calls.

The solution is to move shardCollection to the config server, where the 'distlock' can be taken very quickly and reliably without network calls.

Technically, we could move the DistLockManager logic to run on the config server rather than the mongos. That would also help somewhat, though not as much. Plus, we want to get rid of distributed locks, not rework them.

Generated at Thu Feb 08 04:13:51 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.