[SERVER-28427] GlobalLock with timeout can still block indefinitely
Created: 22/Mar/17  Updated: 27/Aug/18  Resolved: 01/May/17
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Storage |
| Affects Version/s: | 3.4.2, 3.5.4 |
| Fix Version/s: | 3.4.5, 3.5.7 |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Judah Schvimer | Assignee: | Geert Bosch |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v3.4 |
| Sprint: | Storage 2017-04-17, Storage 2017-05-08 |
| Case: | (copied to CRM) |
| Description |
There is a potential deadlock between the stepdown command and the noop writer. The stepdown command takes the global lock in S mode and then blocks while destroying the noop writer. The noop writer takes the global lock in IX mode when it performs writes, and IX conflicts with S, so that write cannot proceed while stepdown holds its lock. Meanwhile, the noop writer's destructor calls join(), which does not return until the noop writer finishes its write. To fix this we can:
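To make the cycle concrete, here is a minimal C++ sketch built on a hypothetical `ToyGlobalLock` (not MongoDB's lock manager) that supports only the two modes involved, S and IX, which conflict with each other. The "stepdown" thread holds S and then joins the "noop writer" thread, which needs IX. The writer's IX wait is bounded to 1 ms so that the sketch terminates; if that wait were unbounded, `join()` would never return, which is exactly the deadlock described above. The names `ToyGlobalLock` and `simulateStepdownWithNoopWriter` are illustrative assumptions.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical two-mode lock, NOT MongoDB's lock manager.
// S is compatible with S and IX with IX, but S and IX conflict.
class ToyGlobalLock {
public:
    enum class Mode { S, IX };

    // Waits until `mode` can be granted or `timeout` elapses; returns false on timeout.
    bool lock(Mode mode, std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lk(mu_);
        if (!cv_.wait_for(lk, timeout, [&] { return compatible(mode); }))
            return false;
        ++holders(mode);
        return true;
    }

    void unlock(Mode mode) {
        {
            std::lock_guard<std::mutex> lk(mu_);
            --holders(mode);
        }
        cv_.notify_all();  // wake waiters whose mode may now be compatible
    }

private:
    bool compatible(Mode mode) const {
        return mode == Mode::S ? ixHolders_ == 0 : sHolders_ == 0;
    }
    int& holders(Mode mode) { return mode == Mode::S ? sHolders_ : ixHolders_; }

    std::mutex mu_;
    std::condition_variable cv_;
    int sHolders_ = 0;
    int ixHolders_ = 0;
};

// "Stepdown" holds S and joins the "noop writer", which needs IX.
// Returns true if the writer gave up on IX because its wait was bounded.
bool simulateStepdownWithNoopWriter() {
    using std::chrono::milliseconds;
    ToyGlobalLock global;
    if (!global.lock(ToyGlobalLock::Mode::S, milliseconds(1000)))
        return false;  // cannot happen here: nothing else holds the lock

    bool writerTimedOut = false;
    std::thread noopWriter([&] {
        // Asks for IX for at most 1 ms, matching the noop writer's timeout in this ticket.
        if (global.lock(ToyGlobalLock::Mode::IX, milliseconds(1)))
            global.unlock(ToyGlobalLock::Mode::IX);
        else
            writerTimedOut = true;
    });

    // Still holding S: join() returns only because the IX wait is bounded.
    // If that wait ignored its timeout, this join would block forever.
    noopWriter.join();
    global.unlock(ToyGlobalLock::Mode::S);
    return writerTimedOut;
}
```

Because stepdown holds S for the whole call, the writer's IX request can never be granted, so `simulateStepdownWithNoopWriter()` always returns `true`; the only thing that breaks the cycle is the writer honoring its timeout.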
| Comments |
| Comment by Githook User [ 06/May/17 ] |
Author: Geert Bosch (GeertBosch) <geert@mongodb.com>
Message: (cherry picked from commit 498df9ab853bb03514b8803b9b1f6c2b6900b533) Conflicts:
| Comment by Githook User [ 01/May/17 ] |
Author: Geert Bosch (GeertBosch) <geert@mongodb.com>
Message:
| Comment by Andy Schwerin [ 22/Mar/17 ] |
I don't believe the bug here is in the noop writer or stepdown. Rather, the problem is that the noop writer asks to wait up to 1 ms to acquire the IX lock, but LockerImpl::lockGlobalBegin is not told about that timeout, and its call to waitForTicket does not receive a timeout either. The offending wait is here. Prior to
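One way to address the problem this comment identifies is to carry the caller's timeout into every wait inside global-lock acquisition, including the ticket wait. The C++ sketch below assumes a hypothetical `ToyTicketHolder` (a counting semaphore with a deadline-aware wait) and a hypothetical `lockGlobalWithTimeout`; these are illustrative names only and do not reproduce the actual signatures of `LockerImpl::lockGlobalBegin` or `waitForTicket`.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

using Clock = std::chrono::steady_clock;

// Hypothetical ticket pool, modeled as a counting semaphore.
class ToyTicketHolder {
public:
    explicit ToyTicketHolder(int tickets) : available_(tickets) {}

    // Waits for a free ticket until `deadline`; returns false if none freed up in time.
    bool waitForTicketUntil(Clock::time_point deadline) {
        std::unique_lock<std::mutex> lk(mu_);
        if (!cv_.wait_until(lk, deadline, [&] { return available_ > 0; }))
            return false;
        --available_;
        return true;
    }

    void release() {
        {
            std::lock_guard<std::mutex> lk(mu_);
            ++available_;
        }
        cv_.notify_one();
    }

private:
    std::mutex mu_;
    std::condition_variable cv_;
    int available_;
};

// Hypothetical global-lock entry point. The caller's timeout becomes one deadline
// that is passed to every wait, so the ticket wait cannot block indefinitely.
// On success the caller owns a ticket and must call tickets.release() later.
bool lockGlobalWithTimeout(ToyTicketHolder& tickets, std::chrono::milliseconds timeout) {
    const Clock::time_point deadline = Clock::now() + timeout;
    if (!tickets.waitForTicketUntil(deadline))
        return false;  // give up instead of waiting for a ticket forever
    // A subsequent lock-manager wait would reuse the same `deadline` here.
    return true;
}
```

With a single ticket, a first `lockGlobalWithTimeout(tickets, std::chrono::milliseconds(1))` succeeds immediately, while a second call returns `false` after roughly 1 ms instead of waiting for the first holder to release. Computing one deadline up front means the ticket wait and any later lock-manager wait together stay within the caller's budget, rather than each wait restarting the full timeout.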