[SERVER-22502] Replication Protocol 1 rollbacks are more likely during priority takeover Created: 07/Feb/16 Updated: 21/Sep/16 Resolved: 21/Sep/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Yoni Douek | Assignee: | Eric Milkie |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | RF | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Participants: | |
| Description |
|
General note: I know the title is too general, but this is the third bug I'm opening this week, and we have another one coming for 3.2.1 related to sharding that we will publish soon. We are thinking of moving off MongoDB; the reliability of 3.2 has been horrible for us.

There are two bugs in this ticket:

1. We removed a member using rs.remove(). After that, the removed member (whose log is attached) got into a versioning mess and killed itself.
2. The second time we got the following behavior: a member elects itself, although it doesn't need to, and causes a rollback on the other member.
and after 9 seconds, suddenly:
All members are on protocol version 1; they were on version 0 but were upgraded according to your docs about a week ago. |
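For context, both operations the reporter describes go through the replica set configuration document in the mongo shell; a minimal sketch, run against the primary (the host string is a placeholder):

```javascript
// Remove a member from the replica set (what the reporter did for bug 1).
rs.remove("hostname:27017")  // placeholder host

// Inspect and, if needed, raise the replication protocol version
// (the upgrade the reporter performed about a week earlier).
var cfg = rs.conf()
print(cfg.protocolVersion)   // 0 or 1
cfg.protocolVersion = 1
rs.reconfig(cfg)
```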
| Comments |
| Comment by Andy Schwerin [ 12/Apr/16 ] |
|
Re-opening to mark this as a duplicate of a feature request. |
| Comment by Eric Milkie [ 18/Feb/16 ] |
|
I am closing this out due to lack of activity, but it can be immediately reopened if there are further questions. |
| Comment by Eric Milkie [ 10/Feb/16 ] |
|
If you're not using w:majority write concern for your writes, it is possible to lose such writes even when they appear to be successful, regardless of MongoDB version or replication protocol version. |
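For reference, a minimal sketch of such a write in the mongo shell (the collection name and document are placeholders; wtimeout bounds how long the client waits for majority acknowledgment):

```javascript
// The write is acknowledged only once a majority of voting, data-bearing
// members have applied it; otherwise the client gets an error after 5s.
db.orders.insertOne(
  { _id: 1, status: "new" },
  { writeConcern: { w: "majority", wtimeout: 5000 } }
)
```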
| Comment by Yoni Douek [ 09/Feb/16 ] |
|
Writes with w:majority are irrelevant in our case (primary, secondary, and arbiter), and we don't want to impact our write speeds, so that's not an option. Bottom line: we're getting rollbacks, and in previous versions we didn't. We have had rollbacks of 7 ms, but also of 11 seconds and even more than 50 seconds. Are we bound to suffer data loss just because we upgraded the protocolVersion? |
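For readers following along: in a primary-secondary-arbiter set, the arbiter votes but holds no data, so w:"majority" (2 of 3) can only be satisfied by both data-bearing members, which is why it adds write latency here. A sketch of such a configuration (host names are placeholders):

```javascript
rs.initiate({
  _id: "rs0",
  protocolVersion: 1,
  members: [
    { _id: 0, host: "mongo1:27017" },                    // data-bearing
    { _id: 1, host: "mongo2:27017" },                    // data-bearing
    { _id: 2, host: "mongo3:27017", arbiterOnly: true }  // votes only
  ]
})
```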
| Comment by Eric Milkie [ 08/Feb/16 ] |
|
Hi Yoni. Regarding the behavior of a 3-node replica set with one arbiter, I do not see anything in your description that is unexpected for that configuration. There was one update issued to the lower-priority primary at about the same moment that the higher-priority secondary won an election for priority takeover; this update was rolled back after the election. This behavior can happen in both protocol versions 0 and 1, but it may be more likely in protocol version 1 due to the new way that priorities are enforced. At no time, however, are any committed writes rolled back. Writes that reach a majority of nodes are considered committed and will never roll back, and you can issue reads against a view of the data that contains only committed writes by using read concern level "majority". |
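A minimal sketch of such a read in the mongo shell (the collection and filter are placeholders; in 3.2 this also requires starting mongod with --enableMajorityReadConcern and the WiredTiger storage engine):

```javascript
// Returns only data acknowledged by a majority of replica set members,
// so the results cannot be rolled back.
db.runCommand({
  find: "orders",
  filter: { status: "new" },
  readConcern: { level: "majority" }
})
```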
| Comment by Yoni Douek [ 07/Feb/16 ] |
|
Correction: "but then, it receives only 1 'no' vote". |