[SERVER-24630] Mongos erroneously advances config optime for writes that fail write concern Created: 16/Jun/16 Updated: 25/Jan/17 Resolved: 21/Jun/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication, Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 3.2.9, 3.3.9 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Siyuan Zhou | Assignee: | Siyuan Zhou |
| Resolution: | Done | Votes: | 0 |
| Labels: | code-and-test | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||
| Operating System: | ALL | ||||||||||||
| Backport Completed: | |||||||||||||
| Sprint: | Repl 16 (06/24/16) | ||||||||||||
| Participants: | |||||||||||||
| Description |
|
Mongos advances its config optime using the last visible optime returned by the config servers. If a write command fails due to a timeout or NotMaster, the last visible optime is ahead of the last committed optime and may be rolled back. With the following patch, a majority read fails due to timeout, which should never happen.
This was found while implementing |
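The failure mode described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the server's code: `OpTime`, `WriteResponse`, `ConfigOpTimeTracker`, and both `advance_from_*` methods are hypothetical names invented for this example, and optimes are reduced to a pair of integers.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class OpTime:
    # Compared as (term, timestamp); both fields are simplified to integers.
    term: int = 0
    timestamp: int = 0


@dataclass
class WriteResponse:
    # Hypothetical shape of a config server reply, for illustration only.
    ok: bool
    last_visible: OpTime
    last_committed: OpTime


class ConfigOpTimeTracker:
    """Toy model of the config optime cached by a router."""

    def __init__(self) -> None:
        self.config_optime = OpTime()

    def advance_from_visible(self, resp: WriteResponse) -> None:
        # Behavior described in the report: always trust the visible optime,
        # even when the write failed its write concern.
        self.config_optime = max(self.config_optime, resp.last_visible)

    def advance_from_committed(self, resp: WriteResponse) -> None:
        # Alternative: only advance to an optime that is majority committed.
        self.config_optime = max(self.config_optime, resp.last_committed)


# A write that timed out waiting for write concern: the visible optime is
# ahead of the committed optime and could still be rolled back.
failed = WriteResponse(
    ok=False,
    last_visible=OpTime(term=1, timestamp=12),
    last_committed=OpTime(term=1, timestamp=10),
)

buggy = ConfigOpTimeTracker()
buggy.advance_from_visible(failed)

safe = ConfigOpTimeTracker()
safe.advance_from_committed(failed)
```

In this model, `buggy` ends up holding an optime that may never become majority committed. A later majority read that waits for that optime could then time out, which matches the symptom in the description.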
| Comments |
| Comment by Githook User [ 29/Jul/16 ] |
|
Author: Siyuan Zhou (visualzhou) <visualzhou@gmail.com>
Message: Use getLastOpCommitted rather than getLastOpVisible in sharding. |
| Comment by Eric Milkie [ 22/Jun/16 ] |
|
Yes, backport of this ticket should commence. |
| Comment by Siyuan Zhou [ 21/Jun/16 ] |
|
milkie, this is a dependency of primary catch-up. Do we want to backport this ticket and the tagging of internal connections? |
| Comment by Githook User [ 21/Jun/16 ] |
|
Author: Siyuan Zhou (visualzhou) <siyuan.zhou@mongodb.com>
Message: Use getLastOpCommitted rather than getLastOpVisible in sharding. |
| Comment by Eric Milkie [ 17/Jun/16 ] |
|
After discussion with kaloian.manassiev we think we should return a lastVisibleOpTime of 0 if the write concern fails. This should be true even when returning NotMaster as an error at the beginning of command processing (in addition to post-write write concern processing). This is option 1 as described above. Using lastOpCommitted for this would be incorrect. |
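The proposal in this comment could be sketched roughly as follows. This is a hedged illustration only: `visible_optime_for_reply`, `ZERO_OPTIME`, and the `REPLICATION_ERRORS` set are hypothetical names chosen for this example, and the error strings merely stand in for the NotMaster and write concern failure cases mentioned in the thread.

```python
from typing import Optional, Tuple

# (term, timestamp), simplified to integers for this sketch.
OpTime = Tuple[int, int]
ZERO_OPTIME: OpTime = (0, 0)

# Illustrative labels for the replication-related failures discussed here.
REPLICATION_ERRORS = {"NotMaster", "WriteConcernFailed"}


def visible_optime_for_reply(error: Optional[str], last_visible: OpTime) -> OpTime:
    """Pick the optime to report back to the caller.

    On a replication-related failure, report a zero optime so that a
    receiver which only ever moves its cached optime forward ignores it.
    """
    if error in REPLICATION_ERRORS:
        return ZERO_OPTIME
    return last_visible


# A receiver that keeps the maximum optime seen so far is unaffected by
# the zero optime reported with a failed write.
cached: OpTime = (1, 10)
cached = max(cached, visible_optime_for_reply("NotMaster", (1, 12)))
```

The design point is that a zero optime compares lower than any real optime, so a caller taking the maximum never advances past what it already knew.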
| Comment by Eric Milkie [ 17/Jun/16 ] |
|
I think we should do (3). But only for writes, and only for replication-related errors (NotMaster, Write Concern Timeout). |
| Comment by Siyuan Zhou [ 16/Jun/16 ] |
|
In a discussion with spencer and scotthernandez, we came up with 4 possible solutions. I'd vote for the last option, as its design is the simplest and follows the current behavior. For successful writes, the lastVisibleOpTime equals lastOpCommitted and may be ahead of snapshots anyway. |