[SERVER-16396] Replication stall, then one secondary would not shut down (mmapv1) Created: 02/Dec/14 Updated: 21/Jan/15 Resolved: 21/Jan/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 2.8.0-rc1 |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Cailin Nelson | Assignee: | Andy Schwerin |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: |
| Issue Links: |
| Participants: |
| Description |
|
Please see the attached graphs showing behavior on our 2.8.0-rc1 (mmapv1) replica set. We experienced the following series of events:
Will link to logs for all nodes. |
| Comments |
| Comment by Andy Schwerin [ 21/Jan/15 ] |
|
Duplicate of |
| Comment by Andy Schwerin [ 03/Dec/14 ] |
|
The attached stack trace indicates that the secondary that got stuck shutting down froze while waiting for the bgsync thread to terminate. That thread is waiting on a condition variable, presumably this one, but I'm not 100% certain because the stack trace appears to have been taken from a host that didn't have access to the debugging symbols. Notice that thread 11 is waiting in bgsync, and thread 2 is waiting for replication to shut down a thread (presumably bgsync). Thread 1 is just sleeping forever because it was not the first thread to call exitCleanly. |
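The hang pattern described in the comment above (one thread blocked on a condition variable while the shutdown path joins it) can be illustrated with a minimal sketch. This is not the server's C++ code: the `BackgroundSync` class, its fields, and `demo` are hypothetical names chosen for illustration, and the sketch assumes the hang comes from a waiter that is never woken, which the ticket itself does not confirm.

```python
import threading


class BackgroundSync:
    """Hypothetical stand-in for a background sync worker (not MongoDB code)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._has_work = False
        self._shutting_down = False
        # Daemon thread so an unfinished worker cannot block interpreter exit.
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def _run(self):
        with self._cond:
            # Wait for work OR a shutdown request. If the predicate ignored
            # _shutting_down, or nobody notified, this wait would never return.
            while not self._has_work and not self._shutting_down:
                self._cond.wait()

    def join_without_notify(self, timeout):
        # Simulates the stuck shutdown: join the worker without waking it.
        self._thread.join(timeout)
        return self._thread.is_alive()

    def shutdown(self, timeout=None):
        # Set the flag and notify while holding the lock, then join.
        with self._cond:
            self._shutting_down = True
            self._cond.notify_all()
        self._thread.join(timeout)
        return not self._thread.is_alive()


def demo():
    worker = BackgroundSync()
    worker.start()
    hung = worker.join_without_notify(timeout=0.2)  # worker still waiting
    clean = worker.shutdown(timeout=2.0)            # worker wakes and exits
    return hung, clean


if __name__ == "__main__":
    print(demo())
```

In this sketch the first join times out because the worker is never woken, which mirrors a shutdown thread waiting indefinitely on a background thread. The second call succeeds because the shutdown flag is part of the wait predicate and is set under the same lock that guards the wait.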