[SERVER-48467] Handle quiesce mode in mixed version replica sets Created: 28/May/20  Updated: 29/Oct/23  Resolved: 26/Jun/20

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.7.0

Type: Task Priority: Major - P3
Reporter: Tess Avitabile (Inactive) Assignee: Pavithra Vetriselvan
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Sprint: Repl 2020-06-15, Repl 2020-06-29
Participants:

 Description   

Due to the findings in SERVER-46962, a 4.6 node entering quiesce mode in a mixed 4.4/4.6 replica set could delay a 4.4 node in finding a valid sync source. Two options to prevent this are to backport the changes in SERVER-46962 to 4.4, or to enable quiesce mode only when the featureCompatibilityVersion is 4.6.
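
The second option can be sketched as a simple predicate. This is a minimal illustration in Python, not the server's implementation: the names parse_fcv, QUIESCE_MIN_FCV, and quiesce_mode_enabled are hypothetical, and the FCV is assumed to be a "major.minor" string.

```python
from typing import Tuple

# Hypothetical constant: the lowest FCV at which quiesce mode is turned on.
QUIESCE_MIN_FCV = (4, 6)


def parse_fcv(fcv: str) -> Tuple[int, int]:
    """Parse an FCV string such as "4.4" into a comparable (major, minor) tuple."""
    major, minor = fcv.split(".")
    return int(major), int(minor)


def quiesce_mode_enabled(fcv: str) -> bool:
    """Return True only when the FCV is at least 4.6."""
    return parse_fcv(fcv) >= QUIESCE_MIN_FCV
```

With this gate, a node whose FCV is still 4.4 never enters quiesce mode, so it behaves like a 4.4 node at shutdown in a mixed-version set.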



 Comments   
Comment by Githook User [ 26/Jun/20 ]

Author: Pavi Vetriselvan (pvselvan) <pvselvan@umich.edu>

Message: SERVER-48467 Only turn on quiesce mode in fcv 4.6
Branch: master
https://github.com/mongodb/mongo/commit/33a643298e279b266362729e91481f159e0a7a69

Comment by Evin Roesle [ 25/Jun/20 ]

Makes sense to me. More uniform behavior makes it easier for users to understand when to expect a given behavior, so I am all for that idea too.

Comment by Pavithra Vetriselvan [ 25/Jun/20 ]

Ah, got it. Thank you for explaining! I would also prefer to ignore the parameter.

Comment by Tess Avitabile (Inactive) [ 25/Jun/20 ]

Yes, this is what I meant by "ignoring" the parameter: we'll just skip quiesce mode.

Similar to the option of banning vs. ignoring the timeoutSecs parameter for the shutdown command, there's a question of whether to ban or ignore the shutdownTimeoutMillis parameter for mongos. In this case, "banning" would mean requiring that the parameter be 0. I prefer to ignore the parameter when FCV < 4.6. Does that make sense?

Comment by Pavithra Vetriselvan [ 24/Jun/20 ]

That makes sense to me! I agree that skipping quiesce mode on both mongos and mongod if FCV < 4.6 provides more uniform behavior. It looks like timeoutSecs and the server parameters will be unused by quiesce mode if we check the FCV before entering quiesce mode on the server. Just double-checking that this is what you meant by "ignoring" the parameter.

As you said, they would only be used for the stepdown timeout.

I'm a little confused by what you mean by requiring/not requiring that the shutdownTimeoutMillis server parameters are 0, though.

Comment by Tess Avitabile (Inactive) [ 24/Jun/20 ]

Yes, FCV-gating the feature sounds good to me!

It's not quite the case that we'll block users from using quiesce mode if they have FCV < 4.6, since quiesce mode happens by default. Instead, I would say that the feature is turned off if FCV < 4.6.

There are a few choices I think we need to make:

  • We could consider having mongos still enter quiesce mode if FCV < 4.6, since the problem only affects mongod. But I think it's more straightforward if mongos also skips quiesce mode if FCV < 4.6.
  • We won't ban the timeoutSecs parameter for the shutdown command on mongod if FCV < 4.6, since this parameter is used for the stepdown timeout as well. But we could consider banning the parameter for mongos. However, I think it's more straightforward to just ignore the parameter if FCV < 4.6.
  • Similarly, we won't require that shutdownTimeoutMillisForSignaledShutdown is 0 on mongod if FCV < 4.6, since this parameter is used for the stepdown timeout as well. But we could consider requiring that mongosShutdownTimeoutMillisForSignaledShutdown is 0 if FCV < 4.6. However, again, I think it's more straightforward to ignore the parameter.
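
The ban-versus-ignore choice for the mongos-side parameters can be sketched as follows. This is a minimal Python illustration under stated assumptions: both function names are hypothetical, the timeout is modeled as a plain integer, and the separate use of the parameter for the stepdown timeout on mongod is not modeled.

```python
def quiesce_timeout_ignore(fcv_at_least_4_6: bool, requested_timeout: int) -> int:
    """'Ignore' policy: accept any value, but skip quiesce mode below FCV 4.6."""
    if not fcv_at_least_4_6:
        return 0  # the parameter is accepted but unused; quiesce mode is skipped
    return requested_timeout


def quiesce_timeout_ban(fcv_at_least_4_6: bool, requested_timeout: int) -> int:
    """'Ban' policy: below FCV 4.6, reject any nonzero value."""
    if not fcv_at_least_4_6:
        if requested_timeout != 0:
            raise ValueError("quiesce timeout must be 0 when FCV < 4.6")
        return 0
    return requested_timeout
```

Under the "ignore" policy, an existing configuration keeps working unchanged across an FCV downgrade. Under the "ban" policy, the same configuration would start failing.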

evin.roesle, I want to let you know about the above design choices for FCV-gating quiesce mode.

Yes, those sound like the correct places to check FCV. Though there may be nothing to do for the cases of attaching topologyVersion, since inQuiesceMode() will return false.
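
To illustrate why the topologyVersion cases may need no extra check: if the error path already consults the quiesce state, gating entry into quiesce mode is enough. The sketch below is a hypothetical Python model, not server code. The Node class, its fields, and the shape of the error document are assumptions. Only the idea of an inQuiesceMode() check comes from this thread.

```python
class Node:
    """Toy model of a server process that may enter quiesce mode at shutdown."""

    def __init__(self, fcv_at_least_4_6: bool):
        self.fcv_at_least_4_6 = fcv_at_least_4_6
        self._quiescing = False
        self.topology_counter = 0  # stand-in for the topologyVersion counter

    def enter_quiesce_mode(self) -> bool:
        # The FCV gate: below 4.6, quiesce mode is skipped entirely.
        if not self.fcv_at_least_4_6:
            return False
        self._quiescing = True
        self.topology_counter += 1  # assumed: entering quiesce mode bumps the version
        return True

    def in_quiesce_mode(self) -> bool:
        return self._quiescing

    def shutdown_error(self) -> dict:
        # No FCV check here: in_quiesce_mode() is already False below 4.6.
        err = {"ok": 0, "errmsg": "shutdown in progress"}
        if self.in_quiesce_mode():
            err["topologyVersion"] = {"counter": self.topology_counter}
        return err
```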

Comment by Pavithra Vetriselvan [ 23/Jun/20 ]

tess.avitabile Based on our conversation with Cloud, it seems like the simplest solution for everyone would be to FCV-gate quiesce mode to 4.6. Atlas doesn't allow mixed-version sets, and Cluster Manager/Ops Manager can block users from using quiesce mode if they have FCV < 4.6.

After a quick run-through of the code we added for this project, the following places seem to be where we should check FCV:

  • Before entering quiesce mode on mongod
  • Before entering quiesce mode on mongos
  • Attaching topologyVersion to shutdown errors on mongod (since topologyVersion shouldn't change on shutdown < 4.6)
  • Attaching topologyVersion to shutdown errors on mongos (since topologyVersion shouldn't change on shutdown < 4.6)

Is there anything else that I'm missing?
