[SERVER-57494] Race between stepdown and retrying write operation Created: 07/Jun/21 Updated: 29/Oct/23 Resolved: 25/Jun/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 5.0.0-rc5, 5.1.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Samyukta Lanka | Assignee: | Matthew Russotto |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||||||
| Operating System: | ALL | ||||||||||||||||||||
| Backport Requested: |
v5.0
|
||||||||||||||||||||
| Sprint: | Repl 2021-06-28 | ||||||||||||||||||||
| Participants: | |||||||||||||||||||||
| Linked BF Score: | 69 | ||||||||||||||||||||
| Description |
|
When using a non-w:1 write concern, waiting for write concern will fail if the primary steps down. If the write is retried before the former primary updates its state, then the same node will attempt to complete the write. In the case of a collection creation or drop, the node will error because the operation already happened (rather than failing with a NotWriteablePrimary error). Our stepdown tests skip retrying again since they think that the write finally succeeded and I could imagine that users/applications would behave similarly. If the new primary skips primary catchup (like our stepdown tests) or primary catchup times out, then it might not have the write that we assumed succeeded. In our test this can manifest as extra/incorrect/missing documents since the test progresses past the operation even though it did not make it onto the new primary. |
| Comments |
| Comment by Vivian Ge (Inactive) [ 06/Oct/21 ] | ||||||||||||
|
Updating the fixversion since branching activities occurred yesterday. This ticket will be in rc0 when it’s been triggered. For more active release information, please keep an eye on #server-release. Thank you! | ||||||||||||
| Comment by Githook User [ 25/Jun/21 ] | ||||||||||||
|
Author: {'name': 'Matthew Russotto', 'email': 'matthew.russotto@mongodb.com', 'username': 'mtrussotto'}Message: (cherry picked from commit bffbd983cdb1db991c06d46196f9c008d3980411) | ||||||||||||
| Comment by Githook User [ 25/Jun/21 ] | ||||||||||||
|
Author: {'name': 'Matthew Russotto', 'email': 'matthew.russotto@mongodb.com', 'username': 'mtrussotto'}Message: | ||||||||||||
| Comment by Matthew Russotto [ 23/Jun/21 ] | ||||||||||||
|
Looks like this may be the problem. We assume that recently stepped-down nodes will get an interrupted error. But that's probably not actually true. We went through command processing while we were still primary, but after the kill thread had run. (not the use of canAcceptWritesForDatabase_UNSAFE – this is because we don't have any locks. We cannot be assured we will stay primary after this point) We transitioned to secondary, which allowed us to run the drop code as secondary. Normally this would fail when we attempted to write, but because the collection isn't found, we don't actually try to write. So we don't. We error out with NamespaceNotFound, which is fine. Then we assume we're returning the correct error code and don't bother to check the write concern. It looks like this code was introduced in | ||||||||||||
| Comment by Matthew Russotto [ 22/Jun/21 ] | ||||||||||||
|
We already handle the case with write concern error. In the actual BFs, we don't get a write concern error, which is odd. | ||||||||||||
| Comment by Matthew Russotto [ 21/Jun/21 ] | ||||||||||||
|
It looks like what we get in that case is an error message including this:
which should distinguish between the OK case (NamespaceNotFound with no writeConcernError) and the stepdown case. | ||||||||||||
| Comment by Matthew Russotto [ 21/Jun/21 ] | ||||||||||||
|
It appears the issue is that write concern is not respected on failure. That is, if you drop a collection that exists, and that drop fails to be majority committed, then you retry again on the same node, you'll get a namespace error, but you are not assured that your original drop was majority committed. It appears that the driver spec requires us to do retargeting if the topologyVersion of the server has changed, which would prevent this from happening on stepdown (though it would not mitigate against this happening at any other time) | ||||||||||||
| Comment by Samyukta Lanka [ 08/Jun/21 ] | ||||||||||||
|
I'm wondering if this retryability issue is actually because | ||||||||||||
| Comment by Samyukta Lanka [ 07/Jun/21 ] | ||||||||||||
|
max.hirschhorn yes, I think that applies here as well. | ||||||||||||
| Comment by Max Hirschhorn [ 07/Jun/21 ] | ||||||||||||
|
samy.lanka, could the reason the test client didn't retry against the new primary be due to |