[SERVER-10304] Don't hold mutex while trying to establish connection to replica sets Created: 23/Jul/13  Updated: 10/Dec/14  Resolved: 30/Jan/14

Status: Closed
Project: Core Server
Component/s: Internal Client
Affects Version/s: 2.5.1
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Randolph Tan Assignee: Mathias Stearn
Resolution: Duplicate Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File test.patch    
Issue Links:
Depends
depends on SERVER-12284 ReplicaSetMonitor is broken Closed
Related
is related to SERVER-12284 ReplicaSetMonitor is broken Closed
Backwards Compatibility: Fully Compatible
Participants:

 Description   

There are a couple of places inside ReplicaSetMonitor that hold the _setsLock mutex while creating a new connection to the seed nodes of the set. This is problematic when the monitor decides to stop monitoring a set after getting continuous errors (basically, it assumes that the shard has been removed) and another request then tries to talk to the removed set. That request prompts the monitor to recreate the set from the cached seed list, and the recreation is done while holding the mutex. If it takes time for the set to error out, all the other threads that want to use the monitor to talk to other sets are blocked as well.
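
One possible way to avoid this is to copy the cached seed list under the lock, release the lock for the slow connection attempt, and re-acquire it only to publish the result. The following is a minimal sketch of that pattern, not the actual ReplicaSetMonitor code: SetMonitor, MonitorRegistry, cacheSeeds, get, and the Connector callback are hypothetical names introduced for illustration, and only _setsLock is borrowed from the description above.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a per-set monitor; not the real ReplicaSetMonitor.
struct SetMonitor {
    std::string name;
    std::vector<std::string> seeds;
};

// Hypothetical registry that illustrates the locking pattern only.
class MonitorRegistry {
public:
    // Assumed callback that performs the (possibly slow) connection attempt.
    using Connector = std::function<std::shared_ptr<SetMonitor>(
        const std::string&, const std::vector<std::string>&)>;

    void cacheSeeds(const std::string& name, std::vector<std::string> seeds) {
        std::lock_guard<std::mutex> lk(_setsLock);
        _seedCache[name] = std::move(seeds);
    }

    // Returns the monitor for `name`, recreating it from the cached seeds if
    // it is no longer tracked. `connect` runs with _setsLock released.
    std::shared_ptr<SetMonitor> get(const std::string& name,
                                    const Connector& connect) {
        std::vector<std::string> seeds;
        {
            std::lock_guard<std::mutex> lk(_setsLock);
            auto it = _sets.find(name);
            if (it != _sets.end()) return it->second;
            auto cached = _seedCache.find(name);
            if (cached == _seedCache.end()) return nullptr;
            seeds = cached->second;  // copy so the lock can be dropped
        }

        // Slow part: no lock is held, so threads using other sets proceed.
        std::shared_ptr<SetMonitor> created = connect(name, seeds);
        if (!created) return nullptr;

        std::lock_guard<std::mutex> lk(_setsLock);
        // Another thread may have recreated the set in the meantime;
        // emplace keeps the existing entry and our copy is discarded.
        auto result = _sets.emplace(name, created);
        assert(result.first->second);
        return result.first->second;
    }

private:
    std::mutex _setsLock;
    std::map<std::string, std::shared_ptr<SetMonitor>> _sets;
    std::map<std::string, std::vector<std::string>> _seedCache;
};
```

The trade-off in this sketch is that two threads may both attempt to connect to the same removed set and one result is thrown away. In exchange, a set that is slow to error out no longer blocks requests for unrelated sets.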



 Comments   
Comment by Randolph Tan [ 26/Jul/13 ]

Attached test patch to demonstrate the problem.

Generated at Thu Feb 08 03:22:49 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.