[SERVER-26124] onDrainComplete should be called when a node is definitely going to become primary Created: 15/Sep/16 Updated: 05/Apr/17 Resolved: 16/Sep/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Siyuan Zhou | Assignee: | Siyuan Zhou |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Sprint: | Repl 2016-09-19 |
| Participants: |
| Description |
|
Because other subsystems, e.g. sharding, assume that onDrainComplete is called when a node is becoming primary, onDrainComplete should be called after setting _canAcceptNonLocalWrites. |
| Comments |
| Comment by Siyuan Zhou [ 16/Sep/16 ] |
|
Closing this ticket as per Kal's comment. |
| Comment by Kaloian Manassiev [ 15/Sep/16 ] |
|
This is fine, but it must remain a callback that runs before leaving drain mode, so that it can be used for work that needs to acquire locks or to block waiting on threads that may be holding locks. |
| Comment by Siyuan Zhou [ 15/Sep/16 ] |
|
If a node steps down during drain mode, there's still a chance that onDrainComplete is called in secondary mode. That's actually the reason for this check. So onDrainComplete doesn't always mean the node is entering primary mode. The moment _canAcceptNonLocalWrites is set is the moment the node is ready to serve as a primary. |
| Comment by Kaloian Manassiev [ 15/Sep/16 ] |
|
The purpose of onDrainComplete is that it is called when the node is entering primary mode, but is not yet under the global X lock. Nothing in the code in there currently requires or assumes that non-local writes are allowed. If we move it to be called after setting _canAcceptNonLocalWrites, we risk either that it is called under the global X lock or that there is a race condition where the node could step down while onDrainComplete is running. |
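
The ordering discussed in this thread can be sketched with a toy model. This is only one reading of the comments, not the server's actual code: `ToyReplCoordinator`, its methods, the `events` list, and the use of a plain mutex as a stand-in for the global X lock are all hypothetical names and simplifications. The sketch runs the drain-complete callback without holding the lock (so the callback may itself take locks or block), then rechecks the node's state under the lock before setting the write flag, because a stepdown may have happened while the callback was running.

```python
import threading


class ToyReplCoordinator:
    """Hypothetical toy model of the drain-mode handoff; not MongoDB code."""

    def __init__(self, on_drain_complete):
        self._mutex = threading.Lock()  # stand-in for the global X lock
        self._state = "SECONDARY"
        self._draining = False
        self._can_accept_non_local_writes = False
        self._on_drain_complete = on_drain_complete
        self.events = []  # records the order of steps for inspection

    def win_election(self):
        with self._mutex:
            self._state = "PRIMARY"
            self._draining = True
            self.events.append("enter drain mode")

    def step_down(self):
        with self._mutex:
            self._state = "SECONDARY"
            self._draining = False
            self._can_accept_non_local_writes = False
            self.events.append("step down")

    def signal_drain_complete(self):
        with self._mutex:
            if not self._draining:
                return False
        # The callback runs outside the lock, so it may acquire locks or
        # block on other threads. The node can step down while it runs.
        self.events.append("onDrainComplete")
        self._on_drain_complete()
        # Recheck under the lock: only a node that is still primary and
        # still draining may start accepting non-local writes.
        with self._mutex:
            if self._state != "PRIMARY" or not self._draining:
                self.events.append("drain abandoned")
                return False
            self._draining = False
            self._can_accept_non_local_writes = True
            self.events.append("accept writes")
            return True


# Normal case: the callback finishes and the node becomes writable.
normal = ToyReplCoordinator(on_drain_complete=lambda: None)
normal.win_election()
became_writable = normal.signal_drain_complete()

# Race case: the node steps down while the callback is running.
racy = ToyReplCoordinator(on_drain_complete=lambda: racy.step_down())
racy.win_election()
racy_writable = racy.signal_drain_complete()
```

In the race case the callback has already run by the time the stepdown is noticed, which matches the observation above that onDrainComplete does not by itself guarantee the node ends up primary; in this sketch only the write flag marks that point.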