[SERVER-42243] Ban the combination of PV0 and enableMajorityReadConcern Created: 15/Jul/19  Updated: 06/Dec/22  Resolved: 02/Dec/19

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 3.4.23
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Siyuan Zhou Assignee: Backlog - Replication Team
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-27124 Disallow readConcern:majority reads o... Closed
related to SERVER-35941 Don't maintain full stable optime can... Closed
Assigned Teams:
Replication
Sprint: Repl 2019-08-12, Repl 2019-08-26, Repl 2019-09-09, Repl 2019-09-23, Repl 2019-10-07, Repl 2019-10-21, Repl 2019-11-04, Repl 2019-11-18, Repl 2019-12-02
Participants:

 Description   

enableMajorityReadConcern isn't supported in PV0. We should ban the combination on 3.4 and 3.6 for both reconfig to PV0 and startup with "--enableMajorityReadConcern".



 Comments   
Comment by Siyuan Zhou [ 27/Nov/19 ]

Talked with Eric, I agree milkie's proposal is the minimal change - Make PV0 omit enableMajorityReadConcern silently without any performance penalties. We need to call dropAllSnapshots or _dropAllSnapshots_inlock on transitioning to PV0.

It turned out 3.6 already drops all snapshots on downgrade, but 3.4 doesn't. Since 3.4's end of life is in 2020 January, I'd suggest closing this issue as Won't Fix on 3.4. If we want to fix it, we also need to update the title of this ticket or file a new one.

Comment by Judah Schvimer [ 23/Jul/19 ]

Option (4) is, as this ticket was initially written:

  • return an error for a reconfig that sets PV1 to PV0 when enableMajorityReadConcern is true for that node
  • fail to start up with enableMajorityReadConcern: true when in PV0
  • do nothing and maintain current non-ideal behavior if you receive a reconfig via heartbeat that sets PV1 to PV0 when enableMajorityReadConcern is true

I suspect it isn't common for different nodes to have different enableMajorityReadConcern settings. Also I believe we already document that it's a bad combination. We could log a warning when a node receives a reconfig via hearbeat that sets PV1 to PV0 when enableMajorityReadConcern is true.

Option (5) would be to not fail at all and just log warnings in all of these cases. alyson.cabral, is this what was originally requested?

I worry about the risk involved with Options (1) and (3) landing in a dot release on what are already very stable branches. I also worry about the investment it will take for what I see as little gain. (2) feels like an undesirable user experience.

Comment by Eric Milkie [ 23/Jul/19 ]

I'm okay with (3), but you should be more precise. Since only one of "PV" and "read-concern-majority" settings is actually changeable at runtime, what (3) really is is:
3. On a primary, return an error for a reconfigure that sets PV1 to PV0 when enableReadConcernMajority is true for that node.
On a non-primary, when receiving a reconfigure that changes pv1 to pv0 and the node is set to readConcernMajority:true, pretend that readConcernMajority is set to false. Important note: the code already does this today. The problem is that in the transition from enableReadConcernMajority:true to false, we do not clean up the existing named snapshots. (There could be other problems as well.)

Since option 3 is simply an extension of option 1 and is thus more work, for a seemingly small benefit, I prefer option 1.

Comment by Siyuan Zhou [ 22/Jul/19 ]

milkie, we discussed this on standup meeting. Some options:
1) Make PV0 omit enableMajorityReadConcern silently without any performance penalties as you suggested.
2) Ban the combination on primary and on startup and invariant on secondaries. A misconfigure on a mixed enableMajorityReadConcern flags may crash the whole replset, but the testing surface is much smaller.
3) Ban the combination on primary and omit it on secondaries as in (1). Users will be surprised on restart later.

A crazier idea I just came up with: (2) + propagate this "enableMajorityReadConcern" setting through heartbeats and check them on primary. This is on best-effort basis and should catch most of the issues but requires a real surgery in old versions that we really want to avoid.

I prefer option (3).

Comment by Ratika Gandhi [ 22/Jul/19 ]

Siyuan will discuss with Storage team and Aly. 

Comment by Eric Milkie [ 16/Jul/19 ]

How are you going to enforce this ban?
In particular, if a secondary is running pv1 and eMRC:true, and it receives an oplog entry to switch to pv0, what is the proposed behavior?

Comment by Siyuan Zhou [ 16/Jul/19 ]

No we don't. The shown readConcernMajorityOpTime under PV0 advances but has term -1.

Comment by Alyson Cabral (Inactive) [ 16/Jul/19 ]

Do we warn against this today?

Generated at Thu Feb 08 04:59:58 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.