[SERVER-66116] Aborted Read with MongoNotPrimaryException Created: 02/May/22 Updated: 03/Oct/22 |
|
| Status: | Blocked |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 4.4.9 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Kyle Kingsbury | Assignee: | Matthew Russotto |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
||||||||||||||||
| Issue Links: |
|
||||||||||||||||
| Operating System: | ALL | ||||||||||||||||
| Steps To Reproduce: | Grab a Jepsen environment with five nodes and https://github.com/jepsen-io/mongodb at da4a3fcef9298b4658db435991a402afe7497f00, then run (e.g.):
|
||||||||||||||||
| Sprint: | Repl 2022-05-16, Repl 2022-05-30, Repl 2022-06-13, Repl 2022-06-27, Repl 2022-07-11, Repl 2022-08-08, Repl 2022-08-22, Repl 2022-09-05, Repl 2022-09-19, Repl 2022-07-25, Repl 2022-10-03 | ||||||||||||||||
| Participants: | |||||||||||||||||
| Description |
|
It looks like MongoNotPrimaryException (or whatever the protocol response is that triggers this error in the Java driver) might actually be an indefinite error, rather than a definite failure. Consider this pair of operations from a Jepsen list-append test:
In this case both "transactions" are actually single-document operations. The first operation performs a single findAndModify to $push the number 3 onto a list in document 855; that write threw a MongoNotPrimaryException. The second is a read of document 855, which observed that write of 3.

The documentation for MongoNotPrimaryException says that the server "refused to execute... a write operation", which seems fairly plain: the write of 3 must not have happened. Since we go on to read 3, this looks like an aborted read.

This problem occurs with MongoDB 4.4.9 and Java driver 4.6.0, write concern majority, read concern snapshot/majority, and is reproducible using network partitions. MongoWriteConcernWithResponseException with a message containing "InterruptedDueToReplStateChange" may do the same thing, but I'm less sure whether that error should be interpreted as definite or not. |
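The reasoning above can be sketched as a small check. This is a minimal illustration, not the actual Jepsen checker: the `Op` record, the `aborted_reads` function, and the outcome names `"ok"`, `"fail"` (definite failure), and `"info"` (indefinite) are all assumptions made for this sketch. The history is also made up; only the key 855, the appended element 3, and the fact that the read observed 3 come from the report. The sketch assumes each element is appended to a given key at most once.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Op:
    """One single-document operation in a (hypothetical) test history."""
    kind: str                      # "append" or "read"
    key: int                       # document id
    outcome: str                   # "ok", "fail" (definite), or "info" (indefinite)
    value: Optional[int] = None    # element appended, for appends
    observed: List[int] = field(default_factory=list)  # list seen, for reads


def aborted_reads(history: List[Op]) -> List[Tuple[int, int]]:
    """Return (key, element) pairs where a successful read observed an
    element whose append was recorded as a definite failure."""
    failed = {
        (op.key, op.value)
        for op in history
        if op.kind == "append" and op.outcome == "fail"
    }
    found = []
    for op in history:
        if op.kind != "read" or op.outcome != "ok":
            continue
        for element in op.observed:
            if (op.key, element) in failed:
                found.append((op.key, element))
    return found


def history_with(write_outcome: str) -> List[Op]:
    """Illustrative history: append 3 to document 855, then read it back."""
    return [
        Op("append", 855, write_outcome, value=3),
        Op("read", 855, "ok", observed=[3]),
    ]


# Treating MongoNotPrimaryException as a definite failure flags an aborted read.
definite = aborted_reads(history_with("fail"))
# Treating it as indefinite does not, because the write may have taken effect.
indefinite = aborted_reads(history_with("info"))

print(definite)    # [(855, 3)]
print(indefinite)  # []
```

Under these assumptions, the anomaly exists only if the error is read as a definite failure, which is why the definite/indefinite question matters here.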
| Comments |
| Comment by Judah Schvimer [ 27/Jun/22 ] |
|
The next step on this ticket is to define the drivers spec changes needed to address this issue, based on the error label added in |
| Comment by Cristopher Stauffer [ 17/May/22 ] |
|
Linking |
| Comment by Cristopher Stauffer [ 13/May/22 ] |
|
aphyr@jepsen.io, thank you for reporting this issue. We were able to reproduce the issue using your steps. For the scenario you outlined on MongoDB 4.4.9 and Java Driver 4.6.0, we are in fact not providing the correct error with regard to whether it is definite or indefinite. Additionally, we were able to see that in earlier versions of the Java Driver the behavior expected by the Jepsen tests did occur. We are going to schedule an update to our driver specification to return an indefinite error in all cases where an indefinite error could occur, including the list-append scenario you provided. We will link this ticket to the associated Driver work: DRIVERS-2327. We actively test with Jepsen as part of our regression testing, and we will be reviewing our test matrix to capture this combination in the future. |
| Comment by Matthew Russotto [ 06/May/22 ] |
|
We are continuing to actively investigate this issue. |
| Comment by Eric Sedor [ 02/May/22 ] |
|
Unfortunately that's right about editing descriptions; I've made that edit and we'll take a look at this. Thanks, Kyle! |
| Comment by Kyle Kingsbury [ 02/May/22 ] |
|
Argh, is there really no edit button for Jira issues? "single-document reads" should be "single-document operations". |