[SERVER-11370] applyOps should not allow yields
Created: 25/Oct/13 | Updated: 11/Jul/16 | Resolved: 04/Feb/14

| Status: | Closed |
| Project: | Core Server |
| Component/s: | Concurrency, Replication, Sharding |
| Affects Version/s: | 2.5.3 |
| Fix Version/s: | 2.6.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | J Rassi | Assignee: | J Rassi |
| Resolution: | Done | Votes: | 0 |
| Labels: | 26qa |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Operating System: | ALL |
| Participants: | |
| Description |
|
[ed: original title "Interrupting applyOps can lead to out-of-sync replicas"]

The applyOps command acquires a global lock, applies the provided operations one-by-one, and (if run on a primary member) then logs the command invocation to the oplog. However, it is possible for the command to be interrupted after only some of the operations have been applied, in which case the changes already processed are persisted to disk but not reflected in the oplog, and thus not reflected in the data copies of the other members.

The command can be interrupted, for example, while processing a delete operation: applyOperation_inlock() calls deleteObjects() with god=false, which, even on a simple _id query, can force a yield if the associated index entry is determined to not be resident in physical memory.

See the following shell session. After processing the first operation in applyOps, a normal shutdown was initiated with SIGINT, followed by a restart. Note the change in the collection contents, but the absence of an oplog entry.
Stack trace for above, at interruption:
|
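To make the failure window concrete, here is a small, self-contained C++ sketch of the flow described above: operations are applied one-by-one under the lock, and the oplog entry for the command is written only after the last one. The names applyOps, applyOperation and logOp come from the description; the Op struct, the toy storage, and the simulated SIGINT are hypothetical, so this illustrates the ordering problem rather than the server's actual code.

```cpp
// Simplified model of the control flow in the description. Only the ordering
// (apply ops, then log the command) follows the ticket; the rest is a toy.
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>

struct Op { std::string doc; };

static std::vector<std::string> collection;   // stands in for on-disk data
static std::vector<std::string> oplog;        // stands in for the oplog
static bool interruptRequested = false;       // e.g. a SIGINT arrived

// Applying a single op may "yield"; at a yield point the interrupt is honored.
void applyOperation(const Op& op) {
    if (interruptRequested)                    // models the yield/interrupt point
        throw std::runtime_error("interrupted during yield");
    collection.push_back(op.doc);              // this write sticks even if we bail later
}

void logOp(const std::string& cmd) {           // only reached if no op was interrupted
    oplog.push_back(cmd);
}

// The problem: ops become durable one-by-one, but the oplog entry for the
// whole applyOps command is written only after the last op succeeds.
void applyOps(const std::vector<Op>& ops) {
    for (size_t i = 0; i < ops.size(); ++i) {
        if (i == 1)
            interruptRequested = true;         // simulate SIGINT after the first op
        applyOperation(ops[i]);
    }
    logOp("applyOps");                         // never reached in this scenario
}

int main() {
    try {
        applyOps({{"a"}, {"b"}});
    } catch (const std::exception& e) {
        std::cout << "applyOps failed: " << e.what() << "\n";
    }
    std::cout << "collection entries: " << collection.size()  // 1: the first op stuck
              << ", oplog entries: " << oplog.size() << "\n"; // 0: nothing to replicate
}
```

Running the sketch shows one document applied but no oplog entry, which is the divergence the description reports: the primary has the change, the secondaries never see it.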
| Comments |
| Comment by Githook User [ 04/Feb/14 ] |
|
Author: Jason Rassi (jrassi) <rassi@10gen.com>. Message: |
| Comment by J Rassi [ 29/Oct/13 ] |
|
Creating separate tickets for the above two items. New name for this ticket: "applyOps should not allow yielding after first write". |
| Comment by J Rassi [ 25/Oct/13 ] |
|
Summarizing discussion with schwerin / eliot:
|
| Comment by J Rassi [ 25/Oct/13 ] |
|
Using checkForInterrupt(true) isn't a fix for the underlying issue. The issue is that a yield was allowed at all. In order for the applyOps contract to be satisfied, no group commit can happen between the first write and the logOp("c"). Thus, it is an error for applyOps to ever release the lock after the first write.

To illustrate: I can still repro the issue with the same applyOps+SIGINT scenario above even if the interrupt check in staticYield is changed to checkForInterrupt(true). The execution path changes in the following way: instead of applyOps releasing the lock at the interrupt point, applyOps instead releases the lock moments later during the actual dbtemprelease(). In either case, the newly released lock is immediately handed off to signalProcessingThread which, upon acquisition, assumes it is safe to group commit and exit the process.

So, the way I see it, it should always be safe for a yield point to be an interrupt point, since we allow group commits at yield points. |
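As a sketch of the invariant argued here, assuming a hypothetical YieldPolicy helper (the class, its method names, and maybeYield() are invented for illustration; only the rule that no yield, and hence no group commit, may occur between the first write and logOp("c") comes from the comment):

```cpp
// Invariant: once applyOps has performed its first write, no yield may occur
// until the "c" oplog entry has been written.
#include <cassert>

class YieldPolicy {
public:
    void noteFirstWrite() { _yieldAllowed = false; }   // entered the critical window
    void noteLoggedOp()   { _yieldAllowed = true;  }   // window closed by logOp("c")
    bool canYield() const { return _yieldAllowed; }    // consulted at every yield point
private:
    bool _yieldAllowed = true;
};

// A yield point that respects the policy: inside the window it refuses to
// release the lock, so no group commit (and no clean shutdown via
// signalProcessingThread) can slip in between the data writes and the oplog entry.
bool maybeYield(const YieldPolicy& policy) {
    if (!policy.canYield())
        return false;              // keep the global lock
    // ... release and reacquire the lock here ...
    return true;
}

int main() {
    YieldPolicy policy;
    assert(maybeYield(policy));    // before the first write: yielding is fine
    policy.noteFirstWrite();
    assert(!maybeYield(policy));   // between first write and logOp("c"): never yield
    policy.noteLoggedOp();
    assert(maybeYield(policy));    // after the oplog entry: safe again
}
```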
| Comment by Eliot Horowitz (Inactive) [ 25/Oct/13 ] |
|
Ok, so the real bug is that staticYield should not be called with false. That has some fallout in update and delete: after each op, we need to reset that variable after the actual isolation (i.e. after logOp). |
| Comment by J Rassi [ 25/Oct/13 ] |
|
staticYield() calls killCurrentOp.checkForInterrupt(false), which specifically ignores that note. In the repro, you have to get likelyInPhysicalMemory() to return false to force the yield – I think the simplest way to do that is with a fail point. |
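A self-contained sketch of the repro idea, forcing likelyInPhysicalMemory() to report false so the delete path decides to yield. The real server would do this through its fail point machinery; the global toggle and the shouldYield() helper below are hypothetical stand-ins for it.

```cpp
// Toy version of the yield decision on the delete path, with a switch that
// simulates "this record is not resident in physical memory".
#include <iostream>

static bool failPoint_notInPhysicalMemory = false;  // stand-in for a fail point

bool likelyInPhysicalMemory(const void* /*recordPtr*/) {
    if (failPoint_notInPhysicalMemory)
        return false;              // pretend touching the index entry would page-fault
    return true;                   // normal case: assume it is resident
}

bool shouldYield(const void* recordPtr) {
    // deleteObjects() with god=false yields when the record is likely not resident.
    return !likelyInPhysicalMemory(recordPtr);
}

int main() {
    int dummyRecord = 0;
    std::cout << std::boolalpha
              << "yield without fail point: " << shouldYield(&dummyRecord) << "\n";  // false
    failPoint_notInPhysicalMemory = true;
    std::cout << "yield with fail point:    " << shouldYield(&dummyRecord) << "\n";  // true
}
```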
| Comment by Eliot Horowitz (Inactive) [ 25/Oct/13 ] |
|
Can you double check this? When a write happens, a note is made on the Client object, which blocks all kill attempts until that op is finished. I verified this a few ways and can't see what you're seeing. |
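For illustration, a rough sketch of the mechanism described here, where a note on the Client defers kill attempts for the duration of a write. The Client and WriteInProgressGuard types below are hypothetical, not the server's actual classes.

```cpp
// Kill requests made while a write is in progress are deferred until the
// write finishes, via a marker ("note") set on the Client for that duration.
#include <cassert>

class Client {
public:
    bool killPending() const { return _killRequested && !_inUnkillableWrite; }
    void requestKill() { _killRequested = true; }   // e.g. killOp or shutdown
    friend class WriteInProgressGuard;
private:
    bool _killRequested = false;
    bool _inUnkillableWrite = false;
};

// RAII guard: the note made on the Client while a write is in progress.
class WriteInProgressGuard {
public:
    explicit WriteInProgressGuard(Client& c) : _c(c) { _c._inUnkillableWrite = true; }
    ~WriteInProgressGuard() { _c._inUnkillableWrite = false; }
private:
    Client& _c;
};

int main() {
    Client client;
    {
        WriteInProgressGuard guard(client);   // a write begins
        client.requestKill();                 // kill arrives mid-write...
        assert(!client.killPending());        // ...but is not observed yet
    }
    assert(client.killPending());             // honored once the write finishes
}
```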