[SERVER-57494] Race between stepdown and retrying write operation Created: 07/Jun/21  Updated: 29/Oct/23  Resolved: 25/Jun/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 5.0.0-rc5, 5.1.0-rc0

Type: Bug Priority: Major - P3
Reporter: Samyukta Lanka Assignee: Matthew Russotto
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Duplicate
Problem/Incident
is caused by SERVER-56963 Prevent secondaries from waiting for ... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v5.0
Sprint: Repl 2021-06-28
Participants:
Linked BF Score: 69

 Description   

When using a non-w:1 write concern, waiting for write concern will fail if the primary steps down. If the write is retried before the former primary updates its state, then the same node will attempt to complete the write. In the case of a collection creation or drop, the node will error because the operation already happened (rather than failing with a NotWriteablePrimary error). Our stepdown tests skip retrying again since they think that the write finally succeeded and I could imagine that users/applications would behave similarly. If the new primary skips primary catchup (like our stepdown tests) or primary catchup times out, then it might not have the write that we assumed succeeded.

In our test this can manifest as extra/incorrect/missing documents since the test progresses past the operation even though it did not make it onto the new primary.



 Comments   
Comment by Vivian Ge (Inactive) [ 06/Oct/21 ]

Updating the fixversion since branching activities occurred yesterday. This ticket will be in rc0 when it’s been triggered. For more active release information, please keep an eye on #server-release. Thank you!

Comment by Githook User [ 25/Jun/21 ]

Author:

{'name': 'Matthew Russotto', 'email': 'matthew.russotto@mongodb.com', 'username': 'mtrussotto'}

Message: SERVER-57494 Clear the locker before re-attempting a command using the same opCtx as we used to refresh sharding state

(cherry picked from commit bffbd983cdb1db991c06d46196f9c008d3980411)
Branch: v5.0
https://github.com/mongodb/mongo/commit/19097bea3851d73a005436f31ecdf771a825a8cb

Comment by Githook User [ 25/Jun/21 ]

Author:

{'name': 'Matthew Russotto', 'email': 'matthew.russotto@mongodb.com', 'username': 'mtrussotto'}

Message: SERVER-57494 Clear the locker before re-attempting a command using the same opCtx as we used to refresh sharding state
Branch: master
https://github.com/mongodb/mongo/commit/1af4de09b092dba38cf99c6081a7fecd2b208c38

Comment by Matthew Russotto [ 23/Jun/21 ]

Looks like this may be the problem.

https://github.com/mongodb/mongo/blob/16b8991d8c5e61195b745fcb49e8b94c6f4ec693/src/mongo/db/service_entry_point_mongod.cpp#L140

We assume that recently stepped-down nodes will get an interrupted error. But that's probably not actually true. We went through command processing while we were still primary, but after the kill thread had run.

https://github.com/mongodb/mongo/blob/16b8991d8c5e61195b745fcb49e8b94c6f4ec693/src/mongo/db/command_can_run_here.cpp#L40

(not the use of canAcceptWritesForDatabase_UNSAFE – this is because we don't have any locks. We cannot be assured we will stay primary after this point)

We transitioned to secondary, which allowed us to run the drop code as secondary. Normally this would fail when we attempted to write, but because the collection isn't found, we don't actually try to write. So we don't. We error out with NamespaceNotFound, which is fine. Then we assume we're returning the correct error code and don't bother to check the write concern. It looks like this code was introduced in SERVER-56963

Comment by Matthew Russotto [ 22/Jun/21 ]

We already handle the case with write concern error. In the actual BFs, we don't get a write concern error, which is odd.

Comment by Matthew Russotto [ 21/Jun/21 ]

It looks like what we get in that case is an error message including this:

	"writeConcernError" : {
		"code" : 11602,
		"codeName" : "InterruptedDueToReplStateChange",
		"errmsg" : "operation was interrupted",
		"errInfo" : {
			"writeConcern" : {
				"w" : "majority",
				"wtimeout" : 0,
				"provenance" : "clientSupplied"
			}
		}
	},

which should distinguish between the OK case (NamespaceNotFound with no writeConcernError) and the stepdown case.

Comment by Matthew Russotto [ 21/Jun/21 ]

It appears the issue is that write concern is not respected on failure. That is, if you drop a collection that exists, and that drop fails to be majority committed, then you retry again on the same node, you'll get a namespace error, but you are not assured that your original drop was majority committed. It appears that the driver spec requires us to do retargeting if the topologyVersion of the server has changed, which would prevent this from happening on stepdown (though it would not mitigate against this happening at any other time)

Comment by Samyukta Lanka [ 08/Jun/21 ]

I'm wondering if this retryability issue is actually because SERVER-53431 doesn't apply to the shell/our testing infrastructure.
CC matthew.russotto

Comment by Samyukta Lanka [ 07/Jun/21 ]

max.hirschhorn yes, I think that applies here as well.

Comment by Max Hirschhorn [ 07/Jun/21 ]

samy.lanka, could the reason the test client didn't retry against the new primary be due to SERVER-57167?

Generated at Thu Feb 08 05:41:59 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.