[SERVER-47184] replSetReconfig command should check if the node is primary before no-op write
Created: 30/Mar/20  Updated: 11/May/20  Resolved: 11/May/20

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Pavithra Vetriselvan
Assignee: Siyuan Zhou
Resolution: Won't Fix
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
is related to SERVER-47142 Check primary before writing replset ... Closed
Operating System: ALL
Backport Requested: v4.0, v3.6
Sprint: Repl 2020-05-04, Repl 2020-05-18
Participants:

 Description   

The replSetReconfig command does a no-op write, but does not check that the node is still primary before doing so. Since the command only takes a lock when writing down the config document, the primary can step down and transition to secondary before the no-op write happens.

We end up calling onInternalOpMessage, which passes in an empty namespace. Because the namespace is empty, the primary check in _logOpsInner is skipped. As a result, the reconfig no-op write can occur on a secondary.
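The failure mode described above can be illustrated with a small toy model. This is only a sketch, not the server's code: `Node`, `log_op`, `reconfig_noop_unchecked`, and `reconfig_noop_checked` are hypothetical names, and the assumption is simply that the logging path performs its primary check only when the namespace is non-empty.

```python
from dataclasses import dataclass, field


class NotPrimaryError(Exception):
    """Raised when a write is attempted on a node that is not primary."""


@dataclass
class Node:
    # Hypothetical stand-in for a replica set member.
    is_primary: bool = True
    oplog: list = field(default_factory=list)


def log_op(node: Node, namespace: str, entry: dict) -> None:
    # Assumption for illustration: the primary check is only reached
    # when the namespace is non-empty, so an empty namespace bypasses it.
    if namespace and not node.is_primary:
        raise NotPrimaryError(f"not primary for namespace {namespace!r}")
    node.oplog.append(entry)


def reconfig_noop_unchecked(node: Node) -> None:
    # Mirrors the reported behavior: the no-op is logged with an empty
    # namespace and nothing verifies that the node is still primary.
    log_op(node, "", {"op": "n", "msg": "reconfig no-op"})


def reconfig_noop_checked(node: Node) -> None:
    # Sketch of the proposed behavior: check primary state explicitly
    # before the no-op write instead of relying on the logging path.
    if not node.is_primary:
        raise NotPrimaryError("stepped down before the reconfig no-op write")
    log_op(node, "", {"op": "n", "msg": "reconfig no-op"})
```

In this model, a node that has already stepped down (`Node(is_primary=False)`) still appends the no-op through `reconfig_noop_unchecked`, while `reconfig_noop_checked` raises `NotPrimaryError` and leaves the oplog untouched. In a real server the explicit check would also need to happen under the same lock as the write so that a stepdown cannot slip in between the two.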

This is tracked separately from SERVER-47142 since we would like to backport this change to 4.2 and earlier affected versions.



 Comments   
Comment by Tess Avitabile (Inactive) [ 11/May/20 ]

Thank you, sounds good to me.

Comment by Siyuan Zhou [ 08/May/20 ]

This bug only occurs when a primary steps down after accepting a reconfig but before writing the no-op. That window, which includes writing the config locally, is narrow. Since both reconfig and stepdown are rare, their combination is extremely rare.

When this happens, oplog application complains that the oplog is out of order and triggers an fassert. That is how we observed this issue in build failures. Since this has not been reported anywhere even though it exists in all earlier versions, I am inclined to close this as Won't Fix. CC tess.avitabile.
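As a rough illustration of the out-of-order symptom, here is a minimal sketch of an ordering check during oplog application. It assumes timestamps can be modeled as plain integers that must strictly increase; `apply_batch` and `OplogOutOfOrder` are hypothetical names, and the exception stands in for the fassert, which in the real server terminates the process.

```python
class OplogOutOfOrder(Exception):
    """Stand-in for the fassert raised on an out-of-order oplog."""


def apply_batch(last_applied_ts: int, batch: list[int]) -> int:
    # Each applied entry must have a timestamp strictly greater than
    # the last entry already in the local oplog.
    for ts in batch:
        if ts <= last_applied_ts:
            raise OplogOutOfOrder(
                f"entry at ts={ts} is not after last applied ts={last_applied_ts}"
            )
        last_applied_ts = ts
    return last_applied_ts
```

For example, if a node that has just stepped down writes the no-op locally at timestamp 12 and then receives an entry at timestamp 11 from its sync source, `apply_batch(12, [11])` raises in this model.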

On 4.4 and master, this has been fixed as part of the Safe Reconfig project in SERVER-47142.

Comment by Pavithra Vetriselvan [ 30/Mar/20 ]

Note that this also exists on 4.0 and 3.6.
