[SERVER-27005] Write error revalidate logic needs to wait for lastVisibleOpTime to be committed Created: 11/Nov/16 Updated: 18/Dec/16 Resolved: 18/Dec/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 3.2.10 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Randolph Tan | Assignee: | Nathan Myers |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Operating System: | ALL |
| Sprint: | Sharding 2017-01-02 |
| Participants: | |
| Linked BF Score: | 26 |
| Description |
|
A concrete example of this is an applyOps failure: when the applyOps fails, for example because the current primary steps down, we don't know how far the write got applied. If the write was in fact replicated, the applyOps will fail when it is retried because the precondition will no longer hold. However, if we then try to inspect the document to revalidate, we may not see the post-write state because it is not yet in the committed snapshot. A fix was attempted before. One proposed solution is a hybrid: instead of advancing the global config optime, have the current request either use the last returned visibleOpTime as the readAfterOpTime on the next query, or make it wait for replication using getLastError with the returned visibleOpTime. |
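For illustration, here is a minimal sketch of the proposed hybrid, assuming hypothetical helpers `documentReflectsWrite` and `waitForMajorityReplication` (simplified stand-ins, not the actual server API). It only shows the two control-flow options described above:

```cpp
// Sketch only: OpTime, Status, and the helpers below are simplified stand-ins,
// not MongoDB's actual types or commands.
#include <string>

struct OpTime {
    long long term = 0;
    long long timestamp = 0;
};

struct Status {
    bool ok = true;
    std::string reason;
};

// Placeholder: re-reads the document touched by the failed applyOps, with the
// read anchored at `readAfterOpTime` so it cannot observe a snapshot older
// than that opTime.
bool documentReflectsWrite(const std::string& ns, const OpTime& readAfterOpTime) {
    (void)ns;
    (void)readAfterOpTime;
    return false;  // placeholder body
}

// Placeholder: blocks until `opTime` is replicated to a majority, in the
// spirit of getLastError with {w: "majority"}.
Status waitForMajorityReplication(const OpTime& opTime) {
    (void)opTime;
    return Status{};  // placeholder body
}

// Option A of the hybrid: revalidate with readAfterOpTime set to the last
// visible opTime returned alongside the failed applyOps response.
bool revalidateViaReadAfterOpTime(const OpTime& lastVisibleOpTime) {
    return documentReflectsWrite("config.chunks", lastVisibleOpTime);
}

// Option B of the hybrid: wait for the last visible opTime to become majority
// committed, then inspect the document with an ordinary read.
bool revalidateViaMajorityWait(const OpTime& lastVisibleOpTime) {
    Status wc = waitForMajorityReplication(lastVisibleOpTime);
    if (!wc.ok) {
        return false;  // cannot decide; surface the original applyOps error
    }
    return documentReflectsWrite("config.chunks", OpTime{});
}
```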
| Comments |
| Comment by Randolph Tan [ 18/Dec/16 ] |
|
I see. Closing this ticket then. |
| Comment by Kaloian Manassiev [ 18/Dec/16 ] |
|
PrepareConfigsFailed is an error that may only be returned from SCCC config servers. A failed precondition returns BadValue, so we can never enter the branch that calls undoDonateChunk. |
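For clarity, the control flow implied by this comment, sketched with placeholder names (not the real catalog-manager code):

```cpp
// Placeholder names only; this sketches the control flow implied by the
// comment above, not the actual catalog-manager code.
enum class ErrorCode { OK, BadValue, PrepareConfigsFailed };

void undoDonateChunk() {
    // Placeholder: roll back the shard-version increment on the donor shard.
}

void onCommitApplyOpsError(ErrorCode code) {
    if (code == ErrorCode::PrepareConfigsFailed) {
        // Only reachable for SCCC config servers.
        undoDonateChunk();
        return;
    }
    // A failed precondition surfaces as BadValue, so it never reaches the
    // undoDonateChunk branch above.
}
```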
| Comment by Randolph Tan [ 18/Dec/16 ] |
|
kaloian.manassiev I haven't tried to repro, but the scenario you explained is different from the one in my previous comment. I saw a similar situation in a v3.2 build failure for split, and it can also happen for moveChunk. This is the scenario in more detail: 1. applyOps is sent to the config server. |
| Comment by Kaloian Manassiev [ 16/Dec/16 ] |
|
I looked at the way migration commit works in 3.2 and I can confirm that this will not cause a problem for it. Specifically, the commit works like this:
For the split/merge cases, as renctan mentioned above, this situation is benign because it does not impact routing or filtering, and subsequent metadata operations will perform a full refresh with the most up-to-date optime. Given that this is not a correctness problem (see above) and that there is only one Evergreen build failure report, I am going to close this as Won't Fix. |
| Comment by Randolph Tan [ 02/Dec/16 ] |
|
schwerin After re-examining this again, I believe this issue is benign in the split/merge case, but it can cause lost writes for moveChunk. In this scenario the applyOps will be applied successfully on the config servers, while the shard will undo the increment on the shard version metadata. This is bad because any mongos that has not yet seen the new config update will continue sending writes to the shard even though it officially doesn't own the chunk anymore (and the shard thinks it still owns it). |
| Comment by Randolph Tan [ 11/Nov/16 ] |
|
Correction: in split and merge, a write to config.changelog happens afterwards on success, so the opTime will have advanced and the top-level waitForWriteConcern will properly wait for w: majority. |
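The reason the later changelog write is sufficient, sketched under the assumption that opTimes are totally ordered and the oplog replicates in order (placeholder types, not server code):

```cpp
// Placeholder type; opTimes are totally ordered and the oplog replicates in
// order, so majority-committing a later opTime implies all earlier ones are
// majority committed too.
struct OpTime {
    long long t = 0;
    bool operator<=(const OpTime& other) const { return t <= other.t; }
};

// If the top-level waitForWriteConcern waits on the client's last opTime
// (advanced by the config.changelog insert), the earlier metadata write is
// covered as soon as that later opTime is majority committed.
bool metadataWriteIsMajorityCommitted(const OpTime& metadataOpTime,
                                      const OpTime& changelogOpTime,
                                      const OpTime& majorityCommittedOpTime) {
    return metadataOpTime <= changelogOpTime &&
           changelogOpTime <= majorityCommittedOpTime;
}
```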
| Comment by Randolph Tan [ 11/Nov/16 ] |
|
Note: v3.4 has a slightly different issue, since the applyOps is now initiated from within the config servers and uses local read concern to revalidate. Because the revalidation does not perform any write, it may not correctly wait for { w: majority } replication. |
| Comment by Kaloian Manassiev [ 11/Nov/16 ] |
|
Alternatively we can always do a majority write after failure in order to get the most up-to-date committed and visible op times. This is what we do during migration commit in 3.4. |
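A minimal sketch of that alternative, assuming hypothetical helpers `performNoopMajorityWrite` and `chunkDocumentReflectsCommit` (not the actual 3.4 migration-commit code):

```cpp
// Sketch only; the helpers are placeholders, not the actual 3.4 code path.
struct Status {
    bool ok = true;
};

// Placeholder: a trivial write acknowledged with {w: "majority"}, e.g. an
// insert into a scratch collection. It cannot complete until everything
// written before it, including the (possibly applied) applyOps, is majority
// committed.
Status performNoopMajorityWrite() {
    return Status{};  // placeholder body
}

// Placeholder: a read from the committed snapshot that checks whether the
// metadata document reflects the commit.
bool chunkDocumentReflectsCommit() {
    return false;  // placeholder body
}

// After a failed applyOps, the no-op majority write pins down the committed
// and visible opTimes, so the follow-up read gives a definitive answer.
bool revalidateAfterFailure() {
    if (!performNoopMajorityWrite().ok) {
        return false;  // still undecided; propagate the original error
    }
    return chunkDocumentReflectsCommit();
}
```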