[SERVER-24968] Make sure we don't retry non-idempotent writes on "NotMaster" errors that actually did a write
Created: 09/Jul/16  Updated: 26/Aug/16  Resolved: 25/Aug/16

Status: Closed
Project: Core Server
Component/s: Replication, Sharding
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Spencer Brody (Inactive)
Assignee: Siyuan Zhou
Resolution: Duplicate
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-25126 Return a different error code if the ... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Repl 18 (08/05/16), Repl 2016-08-29
Participants:

 Description   

Generally, if you get a NotMaster error back from an attempted write, you know that nothing was written. With SERVER-24574, however, it's possible for a node that steps down mid-write to return NotMaster to the client even though the write was applied. When we write to the config servers, we retry non-idempotent writes on NotMaster errors because we assume no write actually happened. That assumption is now invalid, so the retries can lead to applying a non-idempotent write multiple times.
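
The failure mode can be sketched roughly as follows. This is an illustrative Python model, not the server's actual retry code: the names WriteError, WriteFailed, should_retry, and run_with_retry are hypothetical, as is the STEPPED_DOWN_MID_WRITE label for "the step down happened after the write". It assumes a retry policy that re-sends a non-idempotent write only when the error guarantees nothing was written.

```python
from enum import Enum, auto


class WriteError(Enum):
    # Illustrative labels only; these are not real server error codes.
    NOT_MASTER = auto()              # intended contract: no write occurred
    STEPPED_DOWN_MID_WRITE = auto()  # hypothetical: the write may have been applied


class WriteFailed(Exception):
    """Raised by a (simulated) write that the node reported as failed."""

    def __init__(self, error):
        super().__init__(error.name)
        self.error = error


def should_retry(error, idempotent):
    # Re-sending an idempotent write is harmless either way.
    if idempotent:
        return True
    # A non-idempotent write is only safe to re-send when the error
    # guarantees that nothing was written.
    return error is WriteError.NOT_MASTER


def run_with_retry(op, idempotent, max_attempts=3):
    # Call op() until it succeeds, the policy forbids a retry,
    # or the attempts run out.
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except WriteFailed as exc:
            if attempt == max_attempts or not should_retry(exc.error, idempotent):
                raise
```

Under this model, a node that applies the write and then reports NOT_MASTER causes the increment to be applied twice, while a distinct error for the post-write step down stops the retry after a single application.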



 Comments   
Comment by Siyuan Zhou [ 25/Aug/16 ]

Resolved as a duplicate of SERVER-25126. All occurrences of NotMaster have been audited there.

Comment by Siyuan Zhou [ 19/Jul/16 ]

schwerin, yes, that's the plan. I'll audit all uses of NotMaster. Updated that ticket's description.

Comment by Andy Schwerin [ 18/Jul/16 ]

siyuan.zhou, when you implement SERVER-25126, will that eliminate all cases where NotMaster gets returned even though a write happened?

Comment by Andy Schwerin [ 11/Jul/16 ]

We need to return a different error code depending on whether the step down occurs before or after the write. NotMaster should always mean "no write occurred." I think there may be another ticket for this. Check with siyuan.zhou.

Comment by Spencer Brody (Inactive) [ 09/Jul/16 ]

Can this also cause us to return NotMaster to end users when a write actually happened, because the connection from the mongos to the shard will be marked internal and not be closed on stepdown?
