[SERVER-15972] Replication state sometimes transitions from PRIMARY to SECONDARY while writes are in progress Created: 22/Oct/14 Updated: 19/Nov/14 Resolved: 07/Nov/14 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 2.7.8 |
| Fix Version/s: | 2.8.0-rc0 |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Ian Whalen (Inactive) | Assignee: | Spencer Brody (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Operating System: | ALL | ||||||||||||||||
| Participants: | |||||||||||||||||
| Linked BF Score: | 0 | ||||||||||||||||
| Description |
|
When the topology coordinator chooses to step down in response to a heartbeat, it changes its own state immediately. The replication coordinator then schedules a callback to change its own state under the exclusive lock. However, it is possible that before this callback executes, another callback runs that updates the replication executor's state, but without the exclusive lock. This is an error: the node's visible state can then change from PRIMARY to SECONDARY while writes are still in progress. A reasonable solution would be for the topology coordinator to enter a "going to step down" state, but defer the actual stepdown until the replication coordinator can be updated synchronously. Original description of symptoms follows: this fassert appears to have happened before.
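
The proposed "going to step down" approach can be illustrated with a minimal Python sketch. It is not the server's C++ implementation: every class, method, and field name below is hypothetical, and `threading.Lock` merely stands in for the exclusive lock. The topology coordinator only records the intent to step down; the visible state changes later, in the callback that holds the lock.

```python
import threading
from enum import Enum


class MemberState(Enum):
    PRIMARY = "PRIMARY"
    SECONDARY = "SECONDARY"


class TopologyCoordinator:
    """Hypothetical stand-in: decides on stepdown but defers the transition."""

    def __init__(self):
        self.state = MemberState.PRIMARY
        self.stepdown_pending = False  # the "going to step down" flag

    def process_heartbeat(self, should_step_down):
        # Record the intent only; the visible state stays PRIMARY for now.
        if should_step_down and self.state is MemberState.PRIMARY:
            self.stepdown_pending = True
        return self.stepdown_pending

    def complete_stepdown(self):
        # Intended to be called only while the exclusive lock is held.
        if self.stepdown_pending:
            self.state = MemberState.SECONDARY
            self.stepdown_pending = False


class ReplicationCoordinator:
    """Hypothetical stand-in for the replication coordinator."""

    def __init__(self, topology):
        self.topology = topology
        self.exclusive_lock = threading.Lock()  # stand-in for the exclusive lock
        self.state = topology.state
        self.callbacks = []  # stand-in for the executor's callback queue

    def handle_heartbeat(self, should_step_down):
        # Schedule the real transition instead of performing it immediately.
        if self.topology.process_heartbeat(should_step_down):
            self.callbacks.append(self._finish_stepdown)

    def refresh_state_without_lock(self):
        # Models the other callback that copies topology state without the lock.
        self.state = self.topology.state

    def _finish_stepdown(self):
        # Both coordinators change state together, under the lock.
        with self.exclusive_lock:
            self.topology.complete_stepdown()
            self.state = self.topology.state

    def run_callbacks(self):
        while self.callbacks:
            self.callbacks.pop(0)()
```

In this sketch, a call to `refresh_state_without_lock()` that runs between `handle_heartbeat(True)` and `run_callbacks()` still observes PRIMARY, because the topology coordinator has not yet changed its own state; the transition to SECONDARY becomes visible only once the locked callback has run.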
|
| Comments |
| Comment by Githook User [ 07/Nov/14 ] |
|
Author: Spencer T Brody (stbrody) <spencer@mongodb.com> Message: |
| Comment by Scott Hernandez (Inactive) [ 05/Nov/14 ] |
|
This seems to be one of two things: either the insert releases the lock and a stepdown (triggered by a heartbeat) happens in that window, or the stepdown does not hold the lock while transitioning out of primary. I am investigating which one. |
| Comment by Ian Whalen (Inactive) [ 03/Nov/14 ] |
|
The problem seems to be ongoing as of 11/3. |
| Comment by Daniel Pasette (Inactive) [ 28/Oct/14 ] |
|
crystal, please assign to repl team |