[SERVER-82594] Verify IndexBuildCoordinator behavior during config change. Created: 30/Oct/23  Updated: 05/Dec/23  Resolved: 04/Dec/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Matthew Russotto Assignee: Gregory Noma
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-82259 replCoord->findConfigMemberByHostAndP... Closed
is related to SERVER-83857 Allow index builds to be resumable on... Closed
Assigned Teams:
Storage Execution
Sprint: Execution Team 2023-11-27, Execution Team 2023-12-11
Participants:

 Description   

The IndexBuildCoordinator obtains the Replication Coordinator member configuration using replCoord->findConfigMemberByHostAndPort() in two places. Since the configuration could change before this call returns, the result may already be stale by the time the caller acts on it, so it's not clear whether relying on it is conceptually safe. We should verify that the IndexBuildCoordinator behaves correctly during configuration changes.

(There's currently an actual race, which will be fixed by SERVER-82259, but the issue of stale config data remains.)
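
To make the stale-config concern concrete, here is a minimal sketch. FakeReplCoord, MemberConfig, reconfig() and findConfigMember() are hypothetical stand-ins invented for illustration; they are not the real ReplicationCoordinator API and do not model its locking. The sketch only shows that a member config copied out at one point in time can disagree with the config after a later reconfig.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a replica set member's config entry.
struct MemberConfig {
    bool isArbiter;
    bool isVoter;
};

// Hypothetical stand-in for the replication coordinator (assumption:
// lookups return a copy, i.e. a snapshot taken at call time).
class FakeReplCoord {
public:
    // Installs or replaces the config entry for a member.
    void reconfig(const std::string& hostAndPort, MemberConfig cfg) {
        _members[hostAndPort] = cfg;
    }

    // Returns a copy of the member's config; throws std::out_of_range
    // if the member is not part of the config.
    MemberConfig findConfigMember(const std::string& hostAndPort) const {
        return _members.at(hostAndPort);
    }

private:
    std::map<std::string, MemberConfig> _members;
};

// Returns true if a snapshot taken before a reconfig disagrees with the
// config after it, which is the stale-data situation described above.
bool snapshotGoesStale() {
    FakeReplCoord coord;
    coord.reconfig("node1:27017", {false, true});  // voting, non-arbiter

    MemberConfig snapshot = coord.findConfigMember("node1:27017");

    // A reconfig lands after the lookup returned: node becomes a non-voter.
    coord.reconfig("node1:27017", {false, false});
    MemberConfig current = coord.findConfigMember("node1:27017");

    return snapshot.isVoter && !current.isVoter;
}
```

Any decision made from `snapshot` after the second reconfig is based on data that no longer matches the current config, so the question is whether each caller tolerates that.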



 Comments   
Comment by Gregory Noma [ 04/Dec/23 ]

Thanks to the removal of the PBWM (the ParallelBatchWriterMode lock), I believe (2) is also safe, since the issue fixed by SERVER-50519 should no longer be possible. Thus, even if the node becomes a non-voter after the check, there shouldn't be any issues. I did, however, file SERVER-83857 to look into removing the restriction and the usage of ReplicationCoordinator::findConfigMemberByHostAndPort_deprecated.

Comment by Gregory Noma [ 17/Nov/23 ]

There are two usages of ReplicationCoordinator::findConfigMemberByHostAndPort_deprecated in the index builds coordinator:

  1. IndexBuildsCoordinatorMongod::voteAbortIndexBuild
  2. isIndexBuildResumable

(1) should be safe because it only checks whether the node is an arbiter, and converting a node to/from an arbiter requires restarting the node and removing it from the replica set. (2) requires a bit more thought because it checks whether the node is a voter, which can change with a reconfig. It uses this to determine whether the index build should be resumable or not: if the node is not a voter, then the index build cannot be resumable. This restriction was added as a part of SERVER-50519.
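
The difference between the two checks can be sketched as follows. All names here (MemberConfig, applyReconfig, arbiterCheck, mayBeResumable) are hypothetical simplifications, not the actual code in voteAbortIndexBuild or isIndexBuildResumable. The sketch assumes, per the reasoning above, that a reconfig on a running node can change voter status but not arbiter status, and it treats voter status as only one necessary condition for resumability.

```cpp
// Hypothetical stand-in for a replica set member's config entry.
struct MemberConfig {
    bool isArbiter;
    bool isVoter;
};

// Models a reconfig applied to a running node (assumption from the text
// above: arbiter status cannot change without a restart and removal from
// the set, so only the voter flag is allowed to change here).
MemberConfig applyReconfig(MemberConfig before, bool newIsVoter) {
    return MemberConfig{before.isArbiter, newIsVoter};
}

// Check (1): only asks whether the node is an arbiter.
bool arbiterCheck(const MemberConfig& self) {
    return self.isArbiter;
}

// Check (2): a non-voter's index build cannot be resumable. Other
// conditions may apply in the real code; only this one is modeled.
bool mayBeResumable(const MemberConfig& self) {
    return self.isVoter;
}
```

Under these assumptions, the result of check (1) is the same before and after any reconfig, while the result of check (2) can flip, which is why (2) is the case that needs the extra analysis.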
