[SERVER-19375] choosing syncsource should compare against last fetched optime rather than last applied Created: 13/Jul/15 Updated: 19/Sep/15 Resolved: 16/Jul/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 3.0.4, 3.1.5 |
| Fix Version/s: | 3.0.5, 3.1.6 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Matt Dannenberg | Assignee: | Matt Dannenberg |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||
| Backwards Compatibility: | Fully Compatible | ||||
| Backport Completed: | |||||
| Sprint: | RPL 6 07/17/15 | ||||
| Participants: | |||||
| Linked BF Score: | 0 | ||||
| Description |
|
This was not a problem until recently, because the producer thread would wait for all ops to be applied before looking for a new sync source (so last fetched == last applied). However, that wait was removed to resolve a potential deadlock between the producer thread and the applier thread. As a result, a node could still have ops in its buffer when it chose a new sync source, apply those ops, and then receive those recently applied ops from its sync source, which would cause the node to roll back and fassert. |
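The difference between the two comparisons can be illustrated with a small model. This is only a sketch, not the server's code: `Member`, `choose_sync_source`, and the integer optimes are hypothetical stand-ins chosen for illustration, and the selection rule (pick the most advanced candidate strictly ahead of a reference optime) is an assumption that simplifies the real logic.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Member:
    """Hypothetical replica set member; last_optime is a simplified integer optime."""
    name: str
    last_optime: int


def choose_sync_source(candidates: Sequence[Member], reference: int) -> Optional[Member]:
    """Return the most advanced candidate strictly ahead of `reference`, or None."""
    ahead = [m for m in candidates if m.last_optime > reference]
    return max(ahead, key=lambda m: m.last_optime, default=None)


# The node has fetched ops up to optime 10 but has only applied up to optime 5,
# so ops 6..10 are still sitting in its buffer.
last_applied = 5
last_fetched = 10

# The only candidate is ahead of what we applied but behind what we fetched.
candidates = [Member("A", 8)]

# Comparing against last applied accepts a source that is behind our buffer.
stale_choice = choose_sync_source(candidates, last_applied)

# Comparing against last fetched rejects it, since it has nothing new for us.
safe_choice = choose_sync_source(candidates, last_fetched)

print(stale_choice, safe_choice)
```

In this model, comparing against the last applied optime selects candidate A even though the node already holds newer ops in its buffer, while comparing against the last fetched optime selects no source until a candidate is actually ahead of everything the node has fetched.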
| Comments |
| Comment by Igor Canadi [ 19/Aug/15 ] |
|
Could this be related to |
| Comment by Githook User [ 16/Jul/15 ] |
|
Author: matt dannenberg (dannenberg) <matt.dannenberg@10gen.com>
Message: also change chooseNewSyncSource to take a Timestamp rather than an OpTime
| Comment by Githook User [ 14/Jul/15 ] |
|
Author: matt dannenberg (dannenberg) <matt.dannenberg@10gen.com>
Message: