[SERVER-27534] All writing operations must fail if the term changes Created: 28/Dec/16 Updated: 16/Aug/22 Resolved: 19/Apr/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication, Write Ops |
| Affects Version/s: | None |
| Fix Version/s: | 3.6.5, 3.7.6 |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Mathias Stearn | Assignee: | Justin Seyster |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v3.6 |
| Sprint: | Repl 2017-10-02, Repl 2017-10-23, Repl 2017-11-13, Repl 2017-12-04, Repl 2017-12-18, Query 2018-03-12, Query 2018-03-26, Query 2018-04-09, Query 2018-04-23 |
| Participants: | |
| Linked BF Score: | 68 |
| Description |
|
Consider the following sequence of events during a batch insert of 1000 documents with ordered:true and w:majority writeConcern.

In this case we've lost 500 writes that were w:majority confirmed, and we've written later ops without the earlier ops even with ordered:true. This is caused by a combination of not killing all ops (at least all writing ops) on all replSet stepdown paths, not closing all connections, and always asking "can I currently write to this namespace" rather than "have I always been able to write to this namespace since starting this op".

This issue also affects any operations that write multiple oplog entries with a release of the global lock in between, and "no-op" ops that get the last optime after releasing the global lock. A non-exhaustive list:
Potential solutions:
|
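As a rough illustration of the check the description asks for ("have I always been able to write to this namespace since starting this op"), here is a minimal, self-contained C++ sketch. `ReplCoordinator`, `OperationCtx`, `recordTermOnOp`, and `assertTermUnchanged` are hypothetical stand-ins for illustration only, not the real server classes.

```cpp
#include <atomic>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Illustrative stand-in for the replication coordinator's view of the
// current election term (the real value lives behind the coordinator's
// mutex, alongside fields like _canAcceptNonLocalWrites).
struct ReplCoordinator {
    std::atomic<int64_t> currentTerm{1};
};

// Stand-in for an operation context: the term observed when the op began.
struct OperationCtx {
    int64_t termAtStart = -1;
};

void recordTermOnOp(OperationCtx& opCtx, const ReplCoordinator& repl) {
    opCtx.termAtStart = repl.currentTerm.load();
}

// "Have I always been able to write since starting this op?" rather than
// "can I currently write?": if the term moved, a stepdown (possibly followed
// by re-election of the same node) happened, so the batch must fail.
void assertTermUnchanged(const OperationCtx& opCtx, const ReplCoordinator& repl) {
    if (repl.currentTerm.load() != opCtx.termAtStart)
        throw std::runtime_error("term changed during operation; failing write");
}

void orderedBatchInsert(OperationCtx& opCtx, ReplCoordinator& repl,
                        const std::vector<int>& docs) {
    recordTermOnOp(opCtx, repl);
    for (const int& doc : docs) {
        // The real server yields and reacquires the global lock between
        // groups of documents; the check must run after every reacquisition.
        assertTermUnchanged(opCtx, repl);
        (void)doc;  // ... write the document and its oplog entry ...
    }
}
```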
| Comments |
| Comment by David Storch [ 20/Mar/19 ] |
|
tess.avitabile, I don't recall if there was a specific reason that we did not backport this to 3.4. It looks like it would theoretically be possible to backport, though I don't recall all of the details. justin.seyster might remember more than me. |
| Comment by Tess Avitabile (Inactive) [ 18/Mar/19 ] |
|
david.storch, justin.seyster, do you know why we did not backport this to 3.4? Did the issue not exist on 3.4? I am wondering because I am considering how far to backport |
| Comment by Githook User [ 08/May/18 ] |
|
Author: {'email': 'justin.seyster@mongodb.com', 'name': 'Justin Seyster', 'username': 'jseyster'}
Message: Note that in our master branch, we committed a more ambitious fix for
On the 3.6 branch, though, we just fix the case described in the Jira |
| Comment by Githook User [ 18/Apr/18 ] |
|
Author: {'email': 'justin.seyster@mongodb.com', 'name': 'Justin Seyster', 'username': 'jseyster'}
Message: This reapplies bc19d43f, which was reverted by ae50776b. It also adds |
| Comment by Justin Seyster [ 14/Apr/18 ] |
|
Reverted, because of a failing test. This should be back in the build early next week. |
| Comment by Githook User [ 14/Apr/18 ] |
|
Author: {'email': 'justin.seyster@mongodb.com', 'name': 'Justin Seyster', 'username': 'jseyster'}
Message: Revert "
This reverts commit bc19d43fdc4aab85264def96f638128c0ddb8483. |
| Comment by Githook User [ 12/Apr/18 ] |
|
Author: {'email': 'justin.seyster@mongodb.com', 'name': 'Justin Seyster', 'username': 'jseyster'}
Message: The description of this SERVER ticket describes a nasty race that can
To fix this, we want to check the operation context for interrupts
A recent change to locking gives us almost exactly this for free: if a
This patch adds that check in. Acquiring a lock now always throws if |
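A hedged sketch of the mechanism this commit message describes: a lock acquisition that re-checks the owning operation's interrupt state, so a write that was killed at stepdown fails the moment it tries to retake the lock. `InterruptibleOpCtx` and `ScopedGlobalLock` are simplified illustrations under that assumption, not the server's actual locking API.

```cpp
#include <atomic>
#include <mutex>
#include <stdexcept>

// Simplified stand-in for an operation context that stepdown can mark as
// killed while the op is between lock acquisitions.
struct InterruptibleOpCtx {
    std::atomic<bool> killed{false};
    void checkForInterrupt() const {
        if (killed.load())
            throw std::runtime_error("operation interrupted (e.g. by stepdown)");
    }
};

// A global-lock stand-in whose acquisition always throws if the operation
// was interrupted, mirroring "acquiring a lock now always throws if ..."
class ScopedGlobalLock {
public:
    ScopedGlobalLock(InterruptibleOpCtx& opCtx, std::mutex& globalLock) {
        opCtx.checkForInterrupt();                      // fail before waiting
        _guard = std::unique_lock<std::mutex>(globalLock);
        opCtx.checkForInterrupt();                      // fail if killed while waiting
    }

private:
    std::unique_lock<std::mutex> _guard;
};
```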
| Comment by Justin Seyster [ 12/Mar/18 ] |
|
geert.bosch I was thinking of the new behavior in
It seems reasonable (though certainly not necessary) to expect parallel behavior when acquiring a lock without waiting. I want to check that we've all agreed on the semantics of this operation before I start adding new checkForInterrupt calls (solution 2a in the description). |
| Comment by Geert Bosch [ 11/Mar/18 ] |
|
Normally the expectation is that any operation does only a bounded amount |
| Comment by Justin Seyster [ 09/Mar/18 ] |
|
It looks like louis.williams
Is that the originally intended behavior for interruptible DBLocks? |
| Comment by Geert Bosch [ 23/Feb/18 ] |
|
I'm arguing that |
| Comment by Spencer Brody (Inactive) [ 21/Feb/18 ] |
|
geert.bosch, are you suggesting that we go with the 2A approach from the description, rather than the 2B approach as we had planned? Or are you suggesting that after |
| Comment by Geert Bosch [ 13/Feb/18 ] |
|
It probably would be better to add explicit checks for interrupts at a few specific places where we can make sure exceptions are handled correctly. |
| Comment by Andy Schwerin [ 12/Jan/18 ] |
|
On master, this ticket can be resolved after |
| Comment by Eric Milkie [ 12/Jan/18 ] |
| Comment by Andy Schwerin [ 12/Jan/18 ] |
|
milkie, is 2b in the proposed solution on the docket for 3.8? |
| Comment by Spencer Brody (Inactive) [ 21/Dec/17 ] |
|
We'll need to check batch command locking for times we drop the lock outside of the query yield system. Any write operation that drops locks will need to call checkForInterrupt after re-acquiring its locks. |
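A rough illustration of this point, using hypothetical names rather than the real batch-write code: a write loop that releases a stand-in global lock between documents and explicitly re-checks for interrupts each time it retakes it.

```cpp
#include <atomic>
#include <cstddef>
#include <mutex>
#include <stdexcept>
#include <vector>

struct OpCtx {
    std::atomic<bool> killed{false};
    void checkForInterrupt() const {
        if (killed.load())
            throw std::runtime_error("interrupted");
    }
};

// A batch write that periodically drops the (stand-in) global lock. Because
// the drop happens outside the query yield system, nothing else re-checks
// the interrupt flag for us, so we do it ourselves on each reacquisition.
void insertBatch(OpCtx& opCtx, std::mutex& globalLock, const std::vector<int>& docs) {
    for (std::size_t i = 0; i < docs.size(); ++i) {
        std::unique_lock<std::mutex> lk(globalLock);
        opCtx.checkForInterrupt();  // re-check every time we retake the lock
        // ... write docs[i] and its oplog entry ...
        // lk is released here, so a stepdown (and a kill) can slip in between
    }
}
```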
| Comment by David Storch [ 30/Oct/17 ] |
|
If I understand correctly, 2a should be fairly straightforward. This work is not planned to be accomplished outside of a fix for this ticket. |
| Comment by Eric Milkie [ 27/Oct/17 ] |
|
I don't believe 2b is planned soon, but I can't really tell because I was actually unable to find a ticket reflecting this work on the storage backlog. Is anyone aware of one? |
| Comment by Spencer Brody (Inactive) [ 27/Oct/17 ] |
|
2a seems like it belongs on the query team, while 2b seems like it belongs to storage. david.storch milkie, can you comment on the relative complexity of 2a and 2b and whether either is currently planned anytime soon? |
| Comment by Spencer Brody (Inactive) [ 27/Oct/17 ] |
|
The first part of solution #2 has been implemented. Now we just need to do either 2a or 2b to be safe from this issue. |
| Comment by Spencer Brody (Inactive) [ 26/Sep/17 ] |
|
This problem may disappear when |
| Comment by Mathias Stearn [ 07/Sep/17 ] |
|
spencer, matthew.russotto, renctan: We just discussed another interesting case that a fix for this should handle. If a single operation does a read followed by a write, we must ensure that there was no rollback between them. It is probably best in general to check for rollback every time the global lock is acquired rather than trying to find a smaller subset of cases. |
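A minimal sketch of the read-then-write hazard described here, using a hypothetical global rollback counter (analogous in spirit to a rollback id, but not the server's actual bookkeeping): the operation snapshots the counter when it reads and refuses to write if a rollback bumped it in between.

```cpp
#include <atomic>
#include <cstdint>
#include <stdexcept>

// Hypothetical counter, incremented whenever a rollback completes.
std::atomic<int64_t> rollbackCounter{0};

struct ReadResult {
    int value;
    int64_t rollbackCounterAtRead;
};

ReadResult doRead(int storedValue) {
    return {storedValue, rollbackCounter.load()};
}

void doDependentWrite(const ReadResult& read) {
    // Check on (re)acquiring the global lock: if a rollback happened since
    // the read, the data the write depends on may no longer exist.
    if (rollbackCounter.load() != read.rollbackCounterAtRead)
        throw std::runtime_error("rollback occurred between read and write");
    // ... perform the write ...
}
```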
| Comment by Spencer Brody (Inactive) [ 17/Apr/17 ] |
|
Note that the naive implementation will add an extra mutex acquisition to every command execution. If we want to avoid that, we may be able to add a cached copy of the current term to ReplicationCoordinator that has the same concurrency rules as _canAcceptNonLocalWrites, and decorate the opCtx with the term at the same place where the command execution code calls canAcceptWritesFor to check that the command is safe to run. |
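A minimal sketch of the cached-copy idea, assuming a lock-free atomic mirror of the mutex-guarded term; `ReplCoordinatorSketch`, `onTermChange`, and `getCachedTerm` are illustrative names, not the actual ReplicationCoordinator interface.

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>

// Keep a lock-free cached copy of the term next to the mutex-guarded
// authoritative value, so stamping the term onto every opCtx does not add a
// mutex acquisition per command.
class ReplCoordinatorSketch {
public:
    void onTermChange(int64_t newTerm) {
        std::lock_guard<std::mutex> lk(_mutex);
        _term = newTerm;             // authoritative value, under the mutex
        _cachedTerm.store(newTerm);  // published for lock-free readers
    }

    // Cheap read used when decorating an operation context with the term.
    int64_t getCachedTerm() const { return _cachedTerm.load(); }

private:
    std::mutex _mutex;
    int64_t _term = 0;
    std::atomic<int64_t> _cachedTerm{0};
};
```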
| Comment by Spencer Brody (Inactive) [ 17/Apr/17 ] |
|
Probably the best solution is to start decorating the operation context at the beginning of command execution with the current term, then checking in logOp and awaitReplication that the term of the optime being written or waited for hasn't changed from the term that was recorded on the opctx. |
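A hedged sketch of the shape of this proposal; `OpCtxSketch`, `OpTimeSketch`, `decorateWithTerm`, and `checkTermMatches` are stand-ins for the opCtx decoration and the checks in logOp and awaitReplication, not the real server types.

```cpp
#include <cstdint>
#include <stdexcept>

// The term stamped onto the operation context at the start of command
// execution (at the same point where canAcceptWritesFor is checked).
struct OpCtxSketch {
    int64_t decoratedTerm = -1;
};

// The optime being written to the oplog or waited on for replication.
struct OpTimeSketch {
    int64_t timestamp;
    int64_t term;
};

void decorateWithTerm(OpCtxSketch& opCtx, int64_t currentTerm) {
    opCtx.decoratedTerm = currentTerm;
}

// In logOp / awaitReplication: refuse to proceed if the optime's term no
// longer matches the term recorded on the operation context.
void checkTermMatches(const OpCtxSketch& opCtx, const OpTimeSketch& optime) {
    if (optime.term != opCtx.decoratedTerm)
        throw std::runtime_error("term changed since command started; failing");
}
```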