[SERVER-30217] applyOps doesn't wait for replication on the last op if it's a noop Created: 18/Jul/17 Updated: 06/Dec/22 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Judah Schvimer | Assignee: | Backlog - Replication Team |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | former-quick-wins, gm-ack, neweng, writeconcern | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: |
Replication
|
||||||||
| Operating System: | ALL | ||||||||
| Description |
|
When a write concern is provided to the applyOps command, we normally wait on the OpTime of whichever operation successfully completed last. This is incorrect, however, if the last operation in the array is a no-op write, because a no-op write is not assigned an OpTime. Let A be the second-to-last operation in the applyOps and B be the last operation. Suppose B is a no-op write, and let C be the operation that caused B to be a no-op. If C has an OpTime ahead of A, then we won’t wait for C to be replicated and it could be rolled back, even though B was acknowledged. To fix this, we should wait for replication of the node’s last applied OpTime if the last write operation was a no-op write. |
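The A/B/C scenario above can be sketched as a toy model. This is only an illustration of which OpTime gets waited on, not the server's actual code: the names `OpTime`, `OpResult`, `optime_to_wait_for_current`, and `optime_to_wait_for_proposed` are hypothetical, and an OpTime is simplified to a `(term, timestamp)` pair of integers.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True, order=True)
class OpTime:
    """Simplified stand-in for an OpTime: compared as (term, timestamp)."""
    term: int
    timestamp: int


@dataclass(frozen=True)
class OpResult:
    """Outcome of one applyOps entry; op_time is None for a no-op write."""
    name: str
    op_time: Optional[OpTime]


def optime_to_wait_for_current(results: Sequence[OpResult]) -> Optional[OpTime]:
    """Model of the behavior described in the ticket: wait on the OpTime of
    the last operation that was actually assigned one."""
    assigned = [r.op_time for r in results if r.op_time is not None]
    return assigned[-1] if assigned else None


def optime_to_wait_for_proposed(
    results: Sequence[OpResult], last_applied: OpTime
) -> Optional[OpTime]:
    """Model of the proposed fix: if the last operation was a no-op write,
    wait on the node's last applied OpTime instead."""
    if results and results[-1].op_time is None:
        return last_applied
    return optime_to_wait_for_current(results)


# A is written at (1, 10). Another client then writes C at (1, 12),
# which turns B into a no-op. The node's last applied OpTime is C's.
a = OpResult("A", OpTime(1, 10))
b = OpResult("B", None)
c_optime = OpTime(1, 12)

current = optime_to_wait_for_current([a, b])
proposed = optime_to_wait_for_proposed([a, b], last_applied=c_optime)

print("current waits on: ", current)   # A's OpTime, so C may still roll back
print("proposed waits on:", proposed)  # C's OpTime, so B's acknowledgement is safe
```

In this model the current behavior waits only on A's OpTime, which is behind C, while the proposed behavior waits on the last applied OpTime and therefore covers C.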
| Comments |
| Comment by Gregory McKeon (Inactive) [ 19/Jun/18 ] | |||||||||||||
|
If we fix any applyOps correctness bugs, we want to fix this one. | |||||||||||||
| Comment by Chibuikem Amaechi [ 01/Jan/18 ] | |||||||||||||
|
Still wrapping my head around this, but if this issue is only related to the non-atomic form of applyOps, which I suspect is _applyOps() in src/mongo/db/repl/apply_ops.cpp, then I suppose the first step in resolving this issue would be to prevent _applyOps() from ignoring no-op write operations by removing the following fragment of code:
I would then proceed cautiously by adding the following block to the lambda expression passed to writeConflictRetry():
I believe the first line of code in the above block would suppress replication for non-atomic operations until the last successfully completed operation in the array. In other words, it would wait for replication of the last op, even if it's a no-op write. Not sure if any of this even makes sense, but this is as far as I've gotten. Please share your thoughts! | |||||||||||||
| Comment by Spencer Brody (Inactive) [ 14/Dec/17 ] | |||||||||||||
|
This only applies to the non-atomic form of applyOps. |