[SERVER-47517] Secondaries will always try to vote irrespective of the commit quorum (on/off) value. Created: 13/Apr/20  Updated: 29/Oct/23  Resolved: 17/Apr/20

Status: Closed
Project: Core Server
Component/s: Index Maintenance
Affects Version/s: None
Fix Version/s: 4.4.0-rc2, 4.7.0

Type: Bug Priority: Major - P3
Reporter: Gregory Wlodarek Assignee: Gregory Wlodarek
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
is depended on by SERVER-46659 Make initial sync work with two phase... Closed
Related
related to SERVER-47464 Prevent SetIndexCommitQuorum from cha... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.4
Sprint: Execution Team 2020-04-20, Execution Team 2020-05-04
Participants:

 Description   

This is follow-up work for SERVER-46659. We need to change the listIndexes command so that it returns the commitQuorum for unfinished indexes.

This invariant holds for startup recovery and rollback recovery, because both start from a point in time. It does not hold for initial sync, where collections are cloned at different times rather than from a single point in time. To resolve this, we need to get the commitQuorum from the listIndexes command and persist it to disk before calling applyStartIndexBuild() to set up unfinished index builds during the collection cloning phase.
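
The ordering described above can be sketched as a toy model, written in Python rather than the server's C++. Everything in it is an assumption made for illustration: the commitQuorum field on the listIndexes reply is the change this description proposes, and InitialSyncNode, persist_commit_quorum and apply_start_index_build are hypothetical stand-ins, not server APIs.

```python
from dataclasses import dataclass, field


@dataclass
class InitialSyncNode:
    """Toy model of an initial-syncing node (hypothetical, not server code)."""
    # buildUUID -> commit quorum persisted "on disk"
    on_disk_commit_quorum: dict = field(default_factory=dict)
    # buildUUIDs of index builds that have been started locally
    started_builds: list = field(default_factory=list)

    def persist_commit_quorum(self, build_uuid, commit_quorum):
        self.on_disk_commit_quorum[build_uuid] = commit_quorum

    def apply_start_index_build(self, build_uuid):
        # Stand-in for the invariant: the quorum must already be on disk.
        assert build_uuid in self.on_disk_commit_quorum, "no on-disk commit quorum"
        self.started_builds.append(build_uuid)


def clone_unfinished_indexes(node, list_indexes_reply):
    """Persist each unfinished build's commitQuorum, then start the build."""
    for entry in list_indexes_reply:
        if "buildUUID" not in entry:
            continue  # finished index: nothing to set up
        node.persist_commit_quorum(entry["buildUUID"], entry["commitQuorum"])
        node.apply_start_index_build(entry["buildUUID"])


# Assumed reply shape; commitQuorum is the field the description proposes adding.
reply = [
    {"spec": {"name": "_id_", "key": {"_id": 1}}},
    {"buildUUID": "uuid-1", "commitQuorum": 0, "spec": {"name": "x_1", "key": {"x": 1}}},
]
node = InitialSyncNode()
clone_unfinished_indexes(node, reply)
```

Swapping the two calls inside the loop makes the assertion in apply_start_index_build fail, which is the toy equivalent of hitting the invariant.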



 Comments   
Comment by Githook User [ 17/Apr/20 ]

Author:

{'name': 'Gregory Wlodarek', 'email': 'gregory.wlodarek@mongodb.com', 'username': 'GWlodarek'}

Message: SERVER-47517 Secondary nodes will always try to vote for committing the index build, even if the commit quorum is off

(cherry picked from commit 5713af29cd78dd8322c25c8d0b13f78ed6de34ae)
Branch: v4.4
https://github.com/mongodb/mongo/commit/1c1e1b8eb2f0417d243168d83ab3a12ba657175c

Comment by Githook User [ 17/Apr/20 ]

Author:

{'name': 'Gregory Wlodarek', 'email': 'gregory.wlodarek@mongodb.com', 'username': 'GWlodarek'}

Message: SERVER-47517 Secondary nodes will always try to vote for committing the index build, even if the commit quorum is off
Branch: master
https://github.com/mongodb/mongo/commit/5713af29cd78dd8322c25c8d0b13f78ed6de34ae

Comment by Suganthi Mani [ 14/Apr/20 ]

SERVER-46659 makes the collection cloning phase start two-phase index builds on the initial syncing node for the unfinished indexes returned by the listIndexes command. As a result, an index build started during the collection cloning phase of initial sync may not have an on-disk commit quorum value when it tries to vote, and can hit this invariant. This can happen in the two cases below.

1) A user collection, say 'foo', that starts the two-phase index build is cloned before the 'config' collection is cloned.
2) The index build starts on the sync source after the config collection has been cloned. The sequence is:

  • The initial sync node finishes cloning the config db.
  • Index build 'x_1' starts on the sync source for the foo collection. This persists the commit quorum value on disk on the sync source.
  • Cloning the foo collection makes the initial sync node start the index build 'x_1'.

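
The sequence in case 2 can be replayed with a small Python model. This is only a sketch: the two dictionaries stand in for the on-disk commit quorum state on each node, the quorum value "votingMembers" is an arbitrary example, and run_case_two is a hypothetical name.

```python
def run_case_two():
    """Replay case 2 and return what the initial sync node finds when it votes."""
    source_quorums = {}  # sync source: index build name -> on-disk commit quorum
    synced_quorums = {}  # initial sync node's cloned copy of that state

    # Step 1: the initial sync node finishes cloning the config db.
    synced_quorums.update(source_quorums)  # nothing to copy yet

    # Step 2: index build 'x_1' starts on the sync source, persisting its quorum.
    source_quorums["x_1"] = "votingMembers"

    # Step 3: cloning 'foo' starts 'x_1' on the initial sync node.
    started = ["x_1"]

    # When voting, the node looks up its own on-disk value and finds none.
    return {name: synced_quorums.get(name) for name in started}


missing = run_case_two()
```

In this model the lookup for 'x_1' comes back empty, because the quorum was written on the sync source only after the config db had already been cloned.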
Comment by Suganthi Mani [ 13/Apr/20 ]

An alternative approach to fixing the server crash:
In IndexBuildsCoordinatorMongod::_signalIfCommitQuorumNotEnabled()
1) Only the primary (canAcceptWritesFor() is true) needs to read the on-disk commit quorum value to see whether it needs to vote.
2) A non-primary node (secondary, initial sync, or rollback state) does not need to know the on-disk commit quorum value. Instead, it will always try to vote.
Then, on the primary, in IndexBuildsCoordinatorMongod::voteCommitIndexBuild()
1) persistCommitReadyMemberInfo() will persist the voter's info only if the commit quorum value is non-zero. (Note SERVER-47464: Prevent SetIndexCommitQuorum from changing commit quorum on to off & vice versa.)
2) It can return a non-retryable/acceptable error code to the voter node if any node tries to vote for an index build whose on-disk commit quorum value is 0 (commit quorum off).

Main idea: a node doesn't need to know the on-disk commit quorum value unless it is the primary.
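
The decision logic in this comment might look roughly like the following Python sketch. It is not the server's C++ implementation: should_vote, vote_commit_index_build, VoteResult and the QUORUM_DISABLED code are hypothetical names, and the primary branch of should_vote is one possible reading of "see whether it needs to vote".

```python
from enum import Enum

COMMIT_QUORUM_OFF = 0  # the comment describes "off" as an on-disk value of 0


class VoteResult(Enum):
    RECORDED = "recorded"
    QUORUM_DISABLED = "quorum_disabled"  # hypothetical non-retryable error code


def should_vote(is_primary, on_disk_commit_quorum):
    """Decide whether a node attempts to vote for committing an index build."""
    if not is_primary:
        # Secondary, initial sync, or rollback: vote without reading disk.
        return True
    # Assumption: the primary votes only when the commit quorum is on.
    return on_disk_commit_quorum != COMMIT_QUORUM_OFF


def vote_commit_index_build(on_disk_commit_quorum, ready_members, voter):
    """Primary-side handling of an incoming vote."""
    if on_disk_commit_quorum == COMMIT_QUORUM_OFF:
        # Quorum is off: reject with a code the voter treats as acceptable.
        return VoteResult.QUORUM_DISABLED
    ready_members.add(voter)  # record the voter only when the quorum is on
    return VoteResult.RECORDED
```

A non-primary calls should_vote(False, None) and gets True without any on-disk value, so the missing-quorum situation from initial sync never reaches a lookup. The primary is the only place where the quorum value is consulted.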

Generated at Thu Feb 08 05:14:24 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.