[SERVER-29861] Insert should report number of docs inserted or writeConcernError on error code 11602 (InterruptedDueToReplStateChange) Created: 26/Jun/17 Updated: 27/Oct/23 Resolved: 17/Jan/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Write Ops |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Daniel Nichter | Assignee: | Asya Kamsky |
| Resolution: | Works as Designed | Votes: | 6 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: | test1-data.csv, test1.log, test2-data.csv, test2.log |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Participants: | |
| Description |
|
According to the docs and my understanding of the relevant concepts (e.g. write concerns), I take the following to be true: insert should always return the number of documents inserted ("n") that satisfied the write concern. (Where "always" implies that the original MongoDB server is still alive and reachable. If the server, network, etc. dies, then of course there is nothing left to report "n".) This should hold true especially when the repl set state changes, because this is a common and useful event. Apps should expect the primary to go away, but they should also be guaranteed that MongoDB will cleanly terminate pending ops wrt the write concern (total server/network/etc. failure notwithstanding).

If the aforementioned is not true, then perhaps there's no bug and https://docs.mongodb.com/manual/reference/command/insert/ just needs to be updated to reflect the real guarantees and return values. But presuming it's true, I put it to the test and found a problem: insert sometimes does not return the number of docs inserted (or a writeConcernError) on error code 11602 (InterruptedDueToReplStateChange).

Setup:
Experiment:
Expect:
Results: Results vary. First of all, there are two possible errors in the client (mgo): an EOF on the network socket (because Mongo kills connections on rs state change), or error code 11602 (InterruptedDueToReplStateChange). On EOF there can't be any returned write result (i.e. the documented output) because the connection is closed. I'm not sure if this is a problem in mgo (not doing a final read?) or in Mongo (killing the connection before sending a final write result?). Either way, the problem arises on occasion: mgo will report N docs written, but N + some random number of docs were actually written.

The second error condition is my focus. Mongo seems to gracefully kill the write op, but it does not always return the documented output. First, an example of when it works:
That's a bsondump of the raw Go []byte (I hacked this into my local copy of mgo). That is what I'd expect every time; it matches the documented output (plus some other, undocumented fields). But sometimes the output is only:
And in this particular case, mgo reports 14,000 docs written, but:
The critical question: is MongoDB returning a write result with n=320? If yes, then it's a bug in mgo. I don't think this is the case because I'm writing the raw []byte from deep inside mgo before the data is returned up the call stack. Inside the mgo code here https://github.com/go-mgo/mgo/blob/v2-unstable/socket.go#L625, I added:
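A minimal sketch of that kind of instrumentation (illustrative; the helper name and file path are hypothetical): write each raw reply document to its own file so it can be inspected with bsondump.

```go
package example

import (
	"fmt"
	"io/ioutil"
	"time"
)

// dumpReply writes one raw BSON reply document to its own file so the last
// reply received before a failure can be examined later with bsondump.
func dumpReply(raw []byte) error {
	name := fmt.Sprintf("/tmp/mgo-reply-%d.bson", time.Now().UnixNano())
	return ioutil.WriteFile(name, raw, 0644)
}
```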
Then I bsondump the last file written, which is the doc shown above. Furthermore, this works most of the time but sometimes it doesn't, which leads me to believe mgo is not missing a final read on the network connection; rather, Mongo is not sending a write result with n: 320 in the example above. If that's true, this is a show-stopping problem, because it means an app cannot know, even with a write concern, how many docs were actually inserted. Moreover, the app might presume a wrong number (because it trusts the driver, which doesn't know either), as in this case: 14,000 reported vs. 14,320 actual.

Happy to provide more details, tweak my experiment, whatever. Time allowing, I'll automate the whole experiment so I can run it 1,000+ times to accurately measure the failure rate. Right now, doing it by hand, the rough estimate is a 10-40% failure rate. |
| Comments |
| Comment by Ian Whalen (Inactive) [ 26/Apr/19 ] | |||||||||||||||||||||||
|
Switching "Drivers Changes Needed" from "Maybe" to "Not Needed" since this was closed as something other than Fixed. | |||||||||||||||||||||||
| Comment by Asya Kamsky [ 17/Jan/18 ] | |||||||||||||||||||||||
|
Documentation improvements will be tracked in | |||||||||||||||||||||||
| Comment by Daniel Nichter [ 06/Jan/18 ] | |||||||||||||||||||||||
|
Thanks for the in-depth discussion and exploration of this issue. You've convinced me: it's not a bug. The description of BulkWriteResult.nInserted that you quoted: I take that at face value. My misunderstanding would have been averted if the docs said what you said. Then I would know: on write, I'll get >= n docs definitely written (wrt write concern). Then the question becomes how, from the app dev point of view, do I handle the ">" case? That depends on the app, docs, indexes, etc. Wrt the inconsistent return value on error (error code 11602): consistency is great, but inconsistency is not unusual. MySQL can report 1 of 3 different errors when the server goes away. A really well-written app will handle all three. I think with MongoDB, devs will wind up doing the same: learning all possible "lost connection/server went away" errors and handling all of them. | |||||||||||||||||||||||
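A sketch of what that might look like with mgo, assuming the state change surfaces either as io.EOF on the socket or as a *mgo.QueryError/*mgo.LastError carrying code 11602 (the helper name is hypothetical):

```go
package example

import (
	"io"

	mgo "gopkg.in/mgo.v2"
)

// isReplStateChange treats the different "primary went away" signals the same
// way: a connection closed by the server (EOF) or server error code 11602
// (InterruptedDueToReplStateChange).
func isReplStateChange(err error) bool {
	if err == io.EOF {
		return true // server closed the connection during the state change
	}
	switch e := err.(type) {
	case *mgo.QueryError:
		return e.Code == 11602
	case *mgo.LastError:
		return e.Code == 11602
	}
	return false
}
```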
| Comment by Asya Kamsky [ 18/Dec/17 ] | |||||||||||||||||||||||
|
It was pointed out to me that the official MongoDB CRUD specification directs that on a write error the returned fields are the ones describing the error, but not "n" indicating how many documents were actually written; "n" only gets returned when there has not been an error (and an acknowledgement is requested). This suggests that the mgo driver should probably not be reporting "n" at all in the case of a write concern error. Linked | |||||||||||||||||||||||
| Comment by Asya Kamsky [ 11/Dec/17 ] | |||||||||||||||||||||||
In a distributed system it is not possible to guarantee that "n" is correct when "n" depends on hearing from other parts of the system. Imagine a scenario with two nodes, where a write of a document is sent to one but must be acknowledged only when it has replicated to the second node in the system. If the primary successfully writes the document but does not receive the acknowledgement from the second node, should it acknowledge to the client that one document was written, or zero? One was written locally, but zero satisfied the "written to both nodes" request (that it knows of). From the point of view of the primary, how many documents were written on the second node? It has no way of knowing. Since it cannot know that the document was written on the secondary, it must consider zero as the only possible correct response. But it's possible that the document was successfully written on the second node, even though knowledge of that fact never reached the primary.
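To make that reasoning concrete, a toy illustration (not server code, just the decision described above):

```go
package example

// acknowledgedCount mirrors the two-node reasoning above: a document only
// counts toward "n" once the required acknowledgement has actually been
// received. A locally written but unacknowledged document must be reported
// as zero, even though it may in fact have replicated.
func acknowledgedCount(writtenLocally, ackFromSecondary bool) int {
	if writtenLocally && ackFromSecondary {
		return 1 // write concern provably satisfied
	}
	return 0 // outcome unknown: zero is the only safe answer
}
```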
This is consistent with the behavior you observe: only the number of documents that definitely were written and replicated (i.e. satisfied the write concern) is returned, but more documents may have been written that are not included in "n", because there is no proof that they were replicated before the connection was lost. Note that the communication between the server and the driver is analogous to the two-node scenario: if the driver does not receive acknowledgement from the server that a document was written, it must not tell the application that the document was written, even though the document may in fact have been written successfully. This behavior is not a bug.

There are many changes we are working on to improve the situation for the application. For example, automatic re-try of document writes by the driver means the application doesn't need to know how to re-try a failed write in a way that's idempotent. (Not all drivers support this new server feature yet, but that work is tracked in
It wasn't clear from the ticket whether there was potentially a server or a driver bug here. It's probably best to investigate that in a separate ticket. | |||||||||||||||||||||||
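Regarding the idempotent re-try mentioned above, a sketch (using mgo; the helper name is hypothetical) of one way an application can make an insert safe to re-try on its own: assign the _id on the client and re-try as an upsert, so a write that actually succeeded the first time is not duplicated.

```go
package example

import (
	mgo "gopkg.in/mgo.v2"
	"gopkg.in/mgo.v2/bson"
)

// insertIdempotent assigns a client-side _id and re-tries as an upsert, so a
// document that did reach the server before an ambiguous error is not
// inserted a second time.
func insertIdempotent(c *mgo.Collection, doc bson.M, attempts int) error {
	if doc["_id"] == nil {
		doc["_id"] = bson.NewObjectId()
	}
	var err error
	for i := 0; i < attempts; i++ {
		if _, err = c.UpsertId(doc["_id"], doc); err == nil {
			return nil
		}
	}
	return err
}
```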
| Comment by Daniel Nichter [ 08/Dec/17 ] | |||||||||||||||||||||||
|
The current behavior is not a bug? Without knowing the underlying code, I presume the intention is for "n" to be correct. However, if "n" was not intended to be correct, i.e. to be only an estimate or best effort count, then I could see how it's not a bug. Also, the other factor is that error code 11602 is sometimes sent, and sometimes not. For this, I'd argue the same: if that's the original code intention, then it's not a bug (although very strange behavior for a database). But if the code intention is to always return this, then sometimes not sending it is a bug. | |||||||||||||||||||||||
| Comment by Asya Kamsky [ 08/Dec/17 ] | |||||||||||||||||||||||
|
> I think the real issue requires more than doc updates. Of course, that's why this is still open, but as an improvement request. One workaround is to use individual document operations rather than batch operations. Some upcoming features may also help, but they would have to be handled correctly in the driver as well as the server. | |||||||||||||||||||||||
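A sketch of that workaround using mgo: one acknowledged insert per document, so the application always knows exactly how many documents succeeded, at the cost of extra round trips.

```go
package example

import mgo "gopkg.in/mgo.v2"

// insertOneByOne inserts documents individually, so the count of acknowledged
// inserts is exact even if the operation is interrupted partway through.
func insertOneByOne(c *mgo.Collection, docs []interface{}) (inserted int, err error) {
	for _, doc := range docs {
		if err = c.Insert(doc); err != nil {
			return inserted, err // everything before this index was acknowledged
		}
		inserted++
	}
	return inserted, nil
}
```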
| Comment by Daniel Nichter [ 07/Dec/17 ] | |||||||||||||||||||||||
|
Hello! Thanks for the reply and for looking into this! I think the real issue requires more than doc updates. Imho, the goal is this: MongoDB should always report the exact number of documents inserted, satisfying the write concern, under normal operation. Imho again, anything less than that makes the database virtually unusable, because if an application cannot know, with certainty, how many docs were inserted, app development becomes very difficult. A database is often the app's source of truth; it must not lie to the app wrt the data it has or hasn't stored. In the test data we see, for example, that Mongo will report 66,000 docs inserted when in reality 66,448 docs were inserted. Depending on what the app and data are used for, that could be a huge and important difference.

If we accept the current behavior (doc updates notwithstanding), apps are forced not to trust the database, and they're forced to have complicated logic to verify and reconcile what the database reports ("n") vs. reality. Moreover, because data is always in flight, it's not logical to presume the app can successfully reconcile the data short of taking the database offline.

Re documentation for "the server to provide more information about the number of documents successfully written when interrupted in the middle of a batch": there are two things here. First, re "middle of a batch": imho, this is an implementation detail that should not matter to the user/app. The app does not and should not care how MongoDB reliably stores a bunch of docs wrt the write concern. Second, the "number of docs successfully written" is already documented: it's "n". Well, the docs say "The number of documents inserted.", which a user presumes to mean "successfully written wrt the write concern under normal ops". Again, the user/app doesn't care about the details (under normal operation); they should simply be able to trust that if the database says N docs were written, then exactly N docs have been written.

I think the salient question to agree or disagree on is: should "n", wrt the write concern under normal ops, always be the exact number of docs written? If "no", then doc updates are in order, but (imho) such behavior seriously undermines the usability of the database because apps won't be able to trust it. If "yes", then fixing this bug will be a huge win, imho. | |||||||||||||||||||||||
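A sketch of the kind of verify-and-reconcile logic described above (using mgo; the batchId field is hypothetical): tag every document with a batch id and, after an ambiguous failure, count what actually landed rather than trusting the reported "n".

```go
package example

import (
	mgo "gopkg.in/mgo.v2"
	"gopkg.in/mgo.v2/bson"
)

// actuallyInserted counts the documents from one logical batch that really
// made it to the server, for comparison against the "n" reported on error.
func actuallyInserted(c *mgo.Collection, batchID string) (int, error) {
	return c.Find(bson.M{"batchId": batchID}).Count()
}
```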
| Comment by Asya Kamsky [ 23/Nov/17 ] | |||||||||||||||||||||||
|
The behavior you observe is due to the fact that the writeConcern is only checked at the end of every "batch". This is why the driver reports counts in multiples of 1,000 (the size of the batches that larger requests are split into). Your reading of the documentation as implying that "n" will always be the exact number of documents written suggests that documentation improvements are in order, to make it more clear in what scenario the number of documents successfully acknowledged will be returned. Since this is expected behavior, I will convert this issue to an enhancement request for the server to provide more information about the number of documents successfully written when interrupted in the middle of a batch. | |||||||||||||||||||||||
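A driver-side sketch of the batching described above (illustrative; the 1,000-document batch size is taken from this comment): the write concern is only confirmed per completed batch, so an interrupted run acknowledges a multiple of 1,000.

```go
package example

import mgo "gopkg.in/mgo.v2"

// batchSize is the split size described in the comment above.
const batchSize = 1000

// insertOrdered sends documents batch by batch; nothing from a failed batch
// is counted, which is why an interrupted run acknowledges a multiple of the
// batch size.
func insertOrdered(c *mgo.Collection, docs []interface{}) (acknowledged int, err error) {
	for start := 0; start < len(docs); start += batchSize {
		end := start + batchSize
		if end > len(docs) {
			end = len(docs)
		}
		if err = c.Insert(docs[start:end]...); err != nil {
			return acknowledged, err
		}
		acknowledged += end - start
	}
	return acknowledged, nil
}
```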
| Comment by Ian Whalen (Inactive) [ 03/Aug/17 ] | |||||||||||||||||||||||
|
spencer any thoughts from the repl team on the severity/priority of this? | |||||||||||||||||||||||
| Comment by Daniel Nichter [ 18/Jul/17 ] | |||||||||||||||||||||||
|
Excellent, thanks Mark! This will be a big win for app dev and ops. | |||||||||||||||||||||||
| Comment by Mark Agarunov [ 17/Jul/17 ] | |||||||||||||||||||||||
|
Hello dnichter, Thank you for the report, and my apologies for the delay in response. I've been able to reproduce the behavior you've described using the detailed examples and code provided, so I've placed this ticket in "Need Triage" to be scheduled against our currently planned work. Thanks, Mark | |||||||||||||||||||||||
| Comment by Daniel Nichter [ 04/Jul/17 ] | |||||||||||||||||||||||
|
Re attached files: test1-data.csv, test1.log, test2-data.csv, test2.log

TL;DR
Details
Two experiment runs with 100 tests each (1 test = the full cycle: insert, stepDown, recover, remove, count, print data).

Data Fields

The .csv files are the useful data. The line number of the data file = the test number (important to match results with the log).

Results

So the pass/fail rate is easy to calculate from the data:
For test1: 100 tests and 62 pass=true = 62% pass, 38% fail. For test2 the results are 54% pass, 46% fail.

Error code 11602

I haven't checked every single "pass", but it seems pass=true always when Code:11602 is reported and pass=false when it is not. Example: in test1, test 8, the code is reported:
And data for 8 and 9:
Very similar test runs: ~66k docs and stepDown at ~0.7s, but 8 gets the error code and 9 does not. So far, this pattern holds for all the tests I've checked.

Conclusion

Root cause is unknown, but the test data proves that MongoDB is inconsistent wrt reporting the number of docs inserted (or a writeConcernError) when the replica set state changes. It also proves that MongoDB doesn't report all the docs inserted (e.g. 66k vs. 66,448 above). Presuming the write concern guarantees the writes (i.e. docs aren't lost during a routine repl set state change), the next must-have for production apps is reliably knowing how many docs were inserted (for an ordered insert with a write concern). Without knowing this, apps cannot easily or gracefully recover from routine repl set state changes. Lastly, only insert has been tested; update, delete, and all other write ops should be tested, too. | |||||||||||||||||||||||
| Comment by Daniel Nichter [ 26/Jun/17 ] | |||||||||||||||||||||||
|
Description is correct now. Thanks Mark. | |||||||||||||||||||||||
| Comment by Daniel Nichter [ 26/Jun/17 ] | |||||||||||||||||||||||
|
Copy-paste error duplicated the description. It repeats at "According to the docs". I can't edit it. Please delete description from start to second occurrence of "According to the docs ". |