[SERVER-27048] Fix recursive lock issue leading to deadlock or crash in LegacySession Created: 15/Nov/16 Updated: 19/Nov/16 Resolved: 16/Nov/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Networking |
| Affects Version/s: | None |
| Fix Version/s: | 3.4.0-rc4, 3.5.1 |
| Type: | Bug | Priority: | Blocker - P1 |
| Reporter: | ADAM Martin (Inactive) | Assignee: | ADAM Martin (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | platforms-hocr | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||
| Backwards Compatibility: | Fully Compatible | ||||
| Operating System: | ALL | ||||
| Backport Completed: | |||||
| Steps To Reproduce: | Race conditions are hard to reproduce. |
||||
| Sprint: | Platforms 2016-11-21 | ||||
| Participants: | |||||
| Linked BF Score: | 0 | ||||
| Description |
|
The `TransportLayerLegacy::endAllSessions` function takes a lock `_sessionsMutex`. This lock is also taken by the `TransportLayerLegacy::_destroy` method, which is called indirectly by the `TransportLayerLegacy::LegacySession::~LegacySession()` destructor. Within `endAllSessions`, the `_sessions` list of weak pointers is one-by-one promoted by taking a shared pointer to it, then processed, then the shared pointer is discarded. This leads to a pair of difficult to reproduce race conditions on `endAllSessions`: 1: The failing case in BF-4102, where the shared pointer is created, making the ref-count of its object at least 2. Other threads dispose of their shared pointers to this object, leaving only the shared pointer which was promoted from the weak pointer behind. That shared pointer will go out of scope at the end of the loop iteration processing it, thus invoking the destructor. That destructor will indirectly call `TransportLayerLegacy::_destroy`, which will attempt to take the lock. Recursively taking a lock in C++'s mutex class is undefined behavior. Typical implementations will either deadlock, or throw `std::system_error` (as encouraged by the standard, but this is non-normative behavior.) Thus, the attempt to take the lock in the `_destroy` function will throw an exception, which is the precise observed behavior in BF-4102. 2. An entry in the weak pointer list has expired, due to the last true pointers to it being destroyed. Promoting the weak pointer will fail, giving a nullptr value for the shared_ptr. In this case, the code skips over empty promoted pointers, thus having no failing actions. |
| Comments |
| Comment by Githook User [ 16/Nov/16 ] |
|
Author: {u'username': u'adamlsd', u'name': u'ADAM David Alan Martin', u'email': u'adam.martin@10gen.com'}Message: The LegacySession teardown code has a race where promoted weak (cherry picked from commit 2fae4242b9e8256da203639895d1ecd3fe8e2794) |
| Comment by Githook User [ 16/Nov/16 ] |
|
Author: {u'username': u'adamlsd', u'name': u'ADAM David Alan Martin', u'email': u'adam.martin@10gen.com'}Message: The LegacySession teardown code has a race where promoted weak |