[SERVER-47142] Check primary before writing replset config and no-op Created: 26/Mar/20  Updated: 29/Oct/23  Resolved: 17/Apr/20

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.4.0-rc3, 4.7.0

Type: Bug Priority: Major - P3
Reporter: Pavithra Vetriselvan Assignee: Siyuan Zhou
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
depends on SERVER-47205 Stopping dropping snapshots after saf... Closed
Duplicate
is duplicated by SERVER-46516 Majority write concern is blocked by ... Closed
Related
related to SERVER-47184 replSetReconfig command should check ... Closed
related to SERVER-47369 doReplSetReconfig should fail during ... Closed
related to SERVER-47973 Address TODOs in SERVER-47142 Closed
is related to SERVER-47206 Majority commit point is not set back... Backlog
is related to SERVER-46516 Majority write concern is blocked by ... Closed
is related to SERVER-47636 Force reconfig running concurrently w... Closed
is related to SERVER-47205 Stopping dropping snapshots after saf... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.4
Sprint: Repl 2020-04-06, Repl 2020-04-20
Participants:
Linked BF Score: 42

 Description   

There are currently two problems.

1) We do not check if we are still primary before writing down a new config document locally. Consider the following scenario:

  • Node1 receives a reconfig command
  • Node1 begins stepping down because it hears of a new term
  • Node1 starts killing both writes (and some system ops) that hold the global lock in X, IX, or S mode and reads that encounter prepare conflicts. The replSetReconfig command does not fall into either category.
  • Node1 finishes killing ops and steps down, transitioning to secondary
  • Node1 writes down the new config document, which takes the DB lock in X mode but will not be killed since we already finished stepping down

Node1's config will continue to get propagated via heartbeats even though it already stepped down.

2) The replSetReconfig command does a no-op write, but does not check that the node is still primary before doing so (Similar example, readConcern: linearizable)

We end up calling onInternalOpMessage, which will pass in an empty namespace. Because of this, we don't actually do the primary check in _logOpsInner. This would mean that we can allow the reconfig no-op write to occur on a secondary.

Since these two things should happen together to avoid any inconsistent states, we should consider refactoring the code so we can do the primary check once.



 Comments   
Comment by Githook User [ 14/May/20 ]

Author:

{'name': 'Judah Schvimer', 'email': 'judah@mongodb.com', 'username': 'judahschvimer'}

Message: SERVER-47973 Address TODOs in SERVER-47142
Branch: master
https://github.com/mongodb/mongo/commit/5f5f2e90089c9f95e2cf06394609778a00cc12a8

Comment by Githook User [ 24/Apr/20 ]

Author:

{'name': 'Siyuan Zhou', 'email': 'siyuan.zhou@mongodb.com', 'username': 'visualzhou'}

Message: SERVER-47142 Don't allow non-force reconfig command in drain mode.

(cherry picked from commit 31a60aab7a3d6b93407e0447b056c33f31a15991)
Branch: v4.4
https://github.com/mongodb/mongo/commit/bdac8f4eda8af0e602f739407b5d05ef43331778

Comment by Githook User [ 24/Apr/20 ]

Author:

{'name': 'Siyuan Zhou', 'email': 'siyuan.zhou@mongodb.com', 'username': 'visualzhou'}

Message: SERVER-47142 Check primary before writing config and no-op.

(cherry picked from commit 544cbb209709ebee4f17f2d669b1909bf66be6bb)
Branch: v4.4
https://github.com/mongodb/mongo/commit/cf6abf2091c36130c3b6feef6af45fee2761a922

Comment by Githook User [ 17/Apr/20 ]

Author:

{'name': 'Siyuan Zhou', 'email': 'siyuan.zhou@mongodb.com', 'username': 'visualzhou'}

Message: SERVER-47142 Don't allow non-force reconfig command in drain mode.
Branch: master
https://github.com/mongodb/mongo/commit/31a60aab7a3d6b93407e0447b056c33f31a15991

Comment by Githook User [ 17/Apr/20 ]

Author:

{'name': 'Siyuan Zhou', 'email': 'siyuan.zhou@mongodb.com', 'username': 'visualzhou'}

Message: SERVER-47142 Check primary before writing config and no-op.
Branch: master
https://github.com/mongodb/mongo/commit/544cbb209709ebee4f17f2d669b1909bf66be6bb

Comment by Githook User [ 17/Apr/20 ]

Author:

{'name': 'Siyuan Zhou', 'email': 'siyuan.zhou@mongodb.com', 'username': 'visualzhou'}

Message: SERVER-47142 Revert "SERVER-45094 check if node is primary before doing reconfig noop write"

This reverts commit 0e7b476357a12802f1be1ca5c8a4b1a919000ef8.
Branch: master
https://github.com/mongodb/mongo/commit/0f8cb172af4f5b5df89e96097398319c811657b6

Comment by Siyuan Zhou [ 09/Apr/20 ]

william.schultz pointed out:

From a quick test, it seems we still may need to address a related issue with
'awaitConfigCommitment' being called during drain mode, which should be possible
since we now call 'awaitConfigCommitment' at the beginning of the reconfig
command. If the node is in drain mode, I believe we will encounter this error
condition:
https://github.com/mongodb/mongo/blob/f44ca2e82c0a3aa4e834565e057bed60c588ba5....
It's not an invariant failure but it will return a confusing error message i.e.

2020-04-09T18:40:57.080Z E TEST [main] Throwing
exception{"exception":"Expected ::mongo::Status::OK() == (status) (OK ==
PrimarySteppedDown Last committed optime in the previous config (

Unknown macro: { ts}

) has not yet become committed in the current config with

Unknown macro: {version}

:: caused by :: Term changed from 2147483647 to 1 while waiting for replication,
indicating that this node must have stepped down.) @src/mongo/db/repl/replic
ation_coordinator_impl_reconfig_test.cpp:1008"}

I'll address it in this ticket as well.

Comment by Siyuan Zhou [ 28/Mar/20 ]

The solution for SERVER-46516 was going to be to move the noop write into doReplSetReconfig. Should I close that ticket in favor of this one, if we are going to refactor the noop write?

tess.avitabile, as we discussed today, we may not need to do anything in SERVER-46516 or we could repurpose it for not dropping snapshots except for change of writeConcernMajorityJournalDefault. I think it's fine to have this ticket to cover moving the noop write.

I assumed that we would be backporting both fixes since writing down the config document without doing a primary check also existed before safe reconfig.

Good point. I agree the fixes for both config write and no-op write are beginning-of-time bugs, but the race between reconfig and stepdown is extremely rare in earlier versions, so I'd prefer to keep them as-is and only backport to 4.4. If we have to backport to earlier versions, I'd prefer to add separate checks rather than moving the noop write since that changes the order of installing the config in memory and the no-op write.

I'm happy to take this ticket, but I'd like to discuss it on Monday so I assigned it to backlog.

Comment by Pavithra Vetriselvan [ 27/Mar/20 ]

Yes, I think we can close SERVER-46516 since this ticket should involve moving the noop write.

I assumed that we would be backporting both fixes since writing down the config document without doing a primary check also existed before safe reconfig. Although, I think the OplogOutOfOrder was caused by the noop entry being written on a secondary, so that seems like a more urgent fix.

I am fine filing a separate ticket for the noop primary check so that it can be backported sooner. siyuan.zhou Does that sound reasonable?

Comment by Tess Avitabile (Inactive) [ 27/Mar/20 ]

Not checking for primary before doing the noop write is a beginning-of-time bug, so we will need to backport the fix. That may need to be under a separate ticket if safe reconfig on master requires its own solution.

The solution for SERVER-46516 was going to be to move the noop write into doReplSetReconfig. Should I close that ticket in favor of this one, if we are going to refactor the noop write?

Generated at Thu Feb 08 05:13:25 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.