[SERVER-19768] Failed applyOps command does not create an oplog entry even with some successful writes Created: 05/Aug/15 Updated: 06/Dec/22 Resolved: 05/Apr/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 3.1.1 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Kamran K. | Assignee: | Backlog - Storage Execution Team |
| Resolution: | Duplicate | Votes: | 3 |
| Labels: | 32qa, RF, fuzzer-blacklist, idempotency | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: | Storage Execution |
| Operating System: | ALL |
| Sprint: | Repl 2016-11-21 |
| Participants: |
| Description |
|
If an applyOps command is run with a mix of valid and invalid ops, the valid ops may actually be applied, but the applyOps command then fails and no oplog entries are generated for the writes those valid ops performed. The manifestation of the bug is a result of the change from
Server output:
Repro script:
|
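For illustration only, here is a minimal mongo-shell-style sketch of the kind of command the description refers to. It is not the reporter's repro script. The helper names (`buildMixedApplyOps`, `runAndInspect`), the namespace argument, and the choice of an unknown `op` type as the "invalid" op are all assumptions. Whether a given server version rejects such an op before or after applying the first one is not something this sketch claims.

```javascript
// Hypothetical helper: one well-formed insert followed by one malformed op.
// The "bogus" op type is an assumed stand-in for "an invalid op".
function buildMixedApplyOps(ns) {
  return {
    applyOps: [
      { op: "i", ns: ns, o: { _id: 1, x: 1 } }, // valid insert
      { op: "bogus", ns: ns, o: {} },           // assumed-invalid op
    ],
  };
}

// Hypothetical helper: run the command against a shell `db` handle, then
// list the oplog entries that mention the namespace, either directly or
// nested inside a logged applyOps command.
function runAndInspect(db, ns) {
  const result = db.adminCommand(buildMixedApplyOps(ns));
  const oplog = db.getSiblingDB("local").getCollection("oplog.rs");
  const entries = oplog
    .find({ $or: [{ ns: ns }, { "o.applyOps.ns": ns }] })
    .toArray();
  return { ok: result.ok, oplogEntries: entries };
}
```

On a primary, comparing the documents in the target collection with `oplogEntries` would show whether a write was applied without being logged.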
| Comments |
| Comment by Spencer Brody (Inactive) [ 18/Nov/16 ] | ||
|
This issue only affects the applyOps command, which is internal-only and not designed for general use. Internal users of this command use it correctly with only valid ops, so there's no need to fix this. | ||
| Comment by Spencer Brody (Inactive) [ 29/Mar/16 ] | ||
|
adq, the issue you encountered is probably not the same as this one, since this ticket only applies to user-issued applyOps commands. However, another bug may exist in this area, as a few other users have reported similar issues. I filed | ||
| Comment by Andrew de Quincey [ 01/Feb/16 ] | ||
|
ok, we will retest once the other 3.2 bug we're waiting on has been fixed. ta! | ||
| Comment by Eric Milkie [ 29/Jan/16 ] | ||
|
Since this ticket concerns the behavior of the applyOps command, I'm afraid it is unrelated to your issue. | ||
| Comment by Andrew de Quincey [ 29/Jan/16 ] | ||
|
Eric, we used a normal update command from pymongo and ran into "Fatal Assertion 16360" on one of our secondary nodes when the update replicated to it. | ||
| Comment by Eric Milkie [ 29/Jan/16 ] | ||
|
Hi Alan. | ||
| Comment by Alan Jackson [ 28/Jan/16 ] | ||
|
Hi. | ||
| Comment by Scott Hernandez (Inactive) [ 05/Aug/15 ] | ||
|
I forgot that an "update" in applyOps implicitly defaults to {upsert:true}, unless overridden, which leads to the update inserting a doc, as you noticed in your example. BTW, applyOps is supposed to copy the whole command to the oplog, more or less, regardless of whether it fails in the middle. | ||
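A short sketch of the upsert default mentioned above. It builds an applyOps command containing a single update op and optionally sets the command-level `alwaysUpsert` flag. The helper name `buildUpdateApplyOps` and the sample filter and modifier are hypothetical. That `alwaysUpsert` is the override meant by "unless overridden" is an assumption based on the applyOps command documentation for servers of this era, so check it against your version.

```javascript
// Hypothetical helper: an applyOps command with one update op.
// Without an override, an update op applied through applyOps behaves as an
// upsert, so it inserts a document when nothing matches the o2 filter.
function buildUpdateApplyOps(ns, alwaysUpsert) {
  const cmd = {
    applyOps: [
      // o2 is the match filter, o is the update to apply
      { op: "u", ns: ns, o2: { _id: 1 }, o: { $set: { x: 1 } } },
    ],
  };
  if (alwaysUpsert !== undefined) {
    cmd.alwaysUpsert = alwaysUpsert; // assumed command-level override
  }
  return cmd;
}

// Default behaviour (implicit upsert) versus an explicit opt-out.
const implicitUpsert = buildUpdateApplyOps("test.t");
const noUpsert = buildUpdateApplyOps("test.t", false);
```

Passing `implicitUpsert` to `db.adminCommand` against an empty namespace is the case that ends up creating the collection and inserting a document.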
| Comment by Scott Hernandez (Inactive) [ 05/Aug/15 ] | ||
|
Yeah, if the applyOps isn't written to the oplog because it doesn't complete successfully, that would lead to work being done but not replicated, such as the collection creation. | ||
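The failure shape described here can be sketched with a toy in-memory model. This is not MongoDB code. `ToyPrimary`, its methods, and the "bogus" op are hypothetical, and the model only assumes what the thread states: ops are applied one at a time, applied writes are not rolled back, and the command is logged only if every op succeeds.

```javascript
// Toy in-memory model of a primary; all names are hypothetical.
class ToyPrimary {
  constructor() {
    this.collections = new Map(); // ns -> array of documents
    this.oplog = [];              // entries a secondary would replay
  }

  // Apply one op directly to storage. This does NOT touch the oplog.
  applyOne(op) {
    if (op.op !== "i") {
      throw new Error("unsupported op type: " + op.op);
    }
    if (!this.collections.has(op.ns)) {
      this.collections.set(op.ns, []); // implicit collection creation
    }
    this.collections.get(op.ns).push(op.o);
  }

  // Buggy shape: apply ops in order, log the command only after all succeed.
  applyOps(ops) {
    for (const op of ops) {
      this.applyOne(op); // a throw here skips the logging below
    }
    this.oplog.push({ op: "c", o: { applyOps: ops } });
  }
}

function demo() {
  const primary = new ToyPrimary();
  let failed = false;
  try {
    primary.applyOps([
      { op: "i", ns: "test.t", o: { _id: 1 } }, // applied
      { op: "bogus", ns: "test.t", o: {} },     // fails
    ]);
  } catch (e) {
    failed = true;
  }
  return {
    failed: failed,
    docsOnPrimary: primary.collections.get("test.t").length,
    oplogEntries: primary.oplog.length,
  };
}
```

In this model `demo()` reports a failed command, one document on the primary, and an empty oplog, so a secondary replaying the oplog would never see the collection or the document.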
| Comment by Kamran K. [ 05/Aug/15 ] | ||
|
Sorry for the confusion. I hadn't noticed that the update op actually creates a collection and inserts a document on the primary despite not writing any entries to the oplog. The insertion of the empty document with t.insert() then leaves the collection with two documents and the oplog with these entries:
The insert then fails to apply on the secondary, leading to the fassert. | ||
| Comment by Scott Hernandez (Inactive) [ 05/Aug/15 ] | ||
|
I think what you are describing is that the update, which is a no-op, doesn't add an oplog entry to create the collection (which is now required), and/or that the update fails because the collection doesn't exist, but the implicitly created collection isn't rolled back in the storage layer, since the update doesn't actually write anything. I think the bug is the latter, not the former. |