[SERVER-15097] distributed lock can get stuck at state 1 after bad connections Created: 29/Aug/14 Updated: 26/Sep/17 Resolved: 01/Jul/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 2.7.5 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Randolph Tan | Assignee: | Randolph Tan |
| Resolution: | Done | Votes: | 0 |
| Labels: | DistLock |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Sprint: | Sharding 2 04/24/15, Sharding 3 05/15/15, Sharding 4 06/05/15, Sharding 5 06/26/16, Sharding 8 08/28/15, Sharding 9 (09/18/15), Sharding 16 (06/24/16), Sharding 17 (07/15/16) |
| Participants: | |
| Case: | (copied to CRM) |
| Description |
|
If a thread sets the state to 1* but aborts (with socket exceptions) before it fully claims the lock, it can block all other threads from acquiring the lock for as long as it keeps updating the lock ping.
* For this issue to happen, the other threads that were also able to modify the state to 1 must have a lower ts than the thread that aborted. |
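The failure mode described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the server's code: it collapses the three config servers into a single lock document, and the names `LockDoc`, `mark`, `claim`, `can_force`, and `TAKEOVER_AFTER` are all hypothetical. It assumes that the contender with the highest ts wins the tie-break at state 1, and that a lock can only be forced once its ping has gone stale.

```python
from dataclasses import dataclass

UNLOCKED, CONTENDING, LOCKED = 0, 1, 2  # state 1 is the state this ticket is about
TAKEOVER_AFTER = 3  # hypothetical: ticks without a ping before a takeover is allowed


@dataclass
class LockDoc:
    state: int = UNLOCKED
    ts: int = 0          # stand-in for the contender's ts
    last_ping: int = 0   # last tick at which the owning process pinged


def mark(doc: LockDoc, ts: int) -> None:
    """Phase 1: move the lock to state 1; the highest ts wins the tie-break."""
    if doc.state == UNLOCKED or (doc.state == CONTENDING and ts > doc.ts):
        doc.state, doc.ts = CONTENDING, ts


def claim(doc: LockDoc, ts: int) -> bool:
    """Phase 2: only the contender whose ts won may finish the claim."""
    if doc.state == CONTENDING and doc.ts == ts:
        doc.state = LOCKED
        return True
    return False


def can_force(doc: LockDoc, now: int) -> bool:
    """A held or half-claimed lock may be taken over only if its ping is stale."""
    return doc.state != UNLOCKED and now - doc.last_ping >= TAKEOVER_AFTER


doc = LockDoc()
mark(doc, ts=10)            # thread A reaches state 1
mark(doc, ts=20)            # thread B reaches state 1 with the higher ts
# Thread B now hits a socket exception and never calls claim().
a_won = claim(doc, ts=10)   # A lost the tie-break, so it backs off

# B's process is still alive, so the ping keeps being refreshed.
forced = []
for now in range(1, 10):
    doc.last_ping = now
    forced.append(can_force(doc, now))

stuck = doc.state == CONTENDING and not a_won and not any(forced)
```

In this model `stuck` ends up true: nobody promotes the lock to state 2, and the fresh ping prevents any other thread from forcing it.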
| Comments |
| Comment by Randolph Tan [ 01/Jul/16 ] |
|
This issue is exclusive to SCCC, which is gone as of v3.4. |
| Comment by Randolph Tan [ 17/Sep/15 ] |
|
Maybe, and it will require additional manual work as the code has undergone some refactoring in the master branch. |
| Comment by Andy Schwerin [ 17/Sep/15 ] |
|
Would this warrant a backport to 3.0? |
| Comment by Randolph Tan [ 15/Jul/15 ] |
|
Note that this only affects the distributed lock when it is used with the legacy 3 config servers. |