[SERVER-48235] The primary node should use the AsyncDBClient to vote for committing the index build to allow the request to be interrupted by the IndexBuildsCoordinator Created: 15/May/20  Updated: 29/Oct/23  Resolved: 15/May/20

Status: Closed
Project: Core Server
Component/s: Index Maintenance
Affects Version/s: None
Fix Version/s: 4.4.0-rc7, 4.7.0

Type: Bug Priority: Major - P3
Reporter: Gregory Wlodarek Assignee: Gregory Wlodarek
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Problem/Incident
Related
related to SERVER-48344 add internal authentication to single... Closed
related to SERVER-48516 at startup, confirm replica set node ... Closed
is related to SERVER-48123 voteCommitIndexBuild() can hang waiti... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v4.4
Sprint: Execution Team 2020-05-18, Execution Team 2020-06-01
Participants:
Linked BF Score: 19

 Description   

When voting to commit an index build, runCmdOnPrimaryAndAwaitResponse short-circuits if it is called on the primary itself. This diverging code path makes the command harder to interrupt when the commitIndexBuild or abortIndexBuild oplog entry was processed before the node stepped up.

The IndexBuildsCoordinator function setSignalAndCancelVoteRequestCbkIfActive() cancels only requests started with the AsyncDBClient, not requests issued through the DBDirectClient. Because the DBDirectClient request is never interrupted, the voteCommitIndexBuild command will hang indefinitely, as witnessed in a build failure.

To remediate this issue, the primary node should also use the AsyncDBClient when voting to commit the index build. This adds networking overhead, but the cost is negligible relative to the whole index build process.
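
The difference between the two paths can be illustrated with a small, self-contained sketch. This is not the server code (which is C++); it is a Python asyncio analogy, and every name in it (VoteCoordinator, send_vote_async, send_vote_direct, demo) is hypothetical. It assumes only what is described above: a request sent on the async path registers a cancellation callback with the coordinator, while a request on the direct path registers nothing and so cannot be reached by the coordinator.

```python
import asyncio


class VoteCoordinator:
    """Hypothetical stand-in for the coordinator: tracks one cancellable vote."""

    def __init__(self):
        self._cancel_cb = None

    def register_cancel_callback(self, cb):
        self._cancel_cb = cb

    def signal_and_cancel_vote_if_active(self):
        # Returns True only if an in-flight request had registered a callback.
        if self._cancel_cb is None:
            return False
        self._cancel_cb()
        self._cancel_cb = None
        return True


async def send_vote_async(coordinator, reply):
    # Async path: the request runs as a task and its cancel handle is registered.
    request = asyncio.ensure_future(reply.wait())
    coordinator.register_cancel_callback(request.cancel)
    try:
        await request
        return "voted"
    except asyncio.CancelledError:
        return "cancelled"


async def send_vote_direct(coordinator, reply):
    # Direct path: nothing is registered, so the coordinator cannot interrupt it.
    await reply.wait()
    return "voted"


async def demo():
    results = {}

    # Async path: the reply never arrives, but the coordinator can cancel.
    coordinator, reply = VoteCoordinator(), asyncio.Event()
    vote = asyncio.ensure_future(send_vote_async(coordinator, reply))
    await asyncio.sleep(0)  # let the request start
    results["async_cancelled"] = coordinator.signal_and_cancel_vote_if_active()
    results["async_outcome"] = await vote

    # Direct path: the coordinator has nothing to cancel and the vote keeps waiting.
    coordinator, reply = VoteCoordinator(), asyncio.Event()
    vote = asyncio.ensure_future(send_vote_direct(coordinator, reply))
    await asyncio.sleep(0)
    results["direct_cancelled"] = coordinator.signal_and_cancel_vote_if_active()
    await asyncio.sleep(0.01)
    results["direct_still_waiting"] = not vote.done()
    vote.cancel()  # clean up so the demo itself terminates
    return results


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In the sketch, the async vote ends as soon as the coordinator signals, while the direct vote is still waiting after the same signal. That is the analogue of the hang described above.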



 Comments   
Comment by Gregory Wlodarek [ 01/Jun/20 ]

mark.callaghan, from my understanding, the backwards compatibility field is used for minor and major releases, not release candidates. Prior to 4.4, this feature didn't exist, so you shouldn't need to set the --keyFile parameter. We'll triage the follow-up ticket (SERVER-48516) and let you know the course of action, thanks for investigating this!

Comment by Mark Callaghan (Inactive) [ 01/Jun/20 ]

I don't get how this is backwards compatible. Prior to rc7 I didn't have to set --keyFile for single-node replicasets – not in 4.4.0 <= rc6, not prior to 4.4.

Comment by Githook User [ 15/May/20 ]

Author: Gregory Wlodarek (GWlodarek) <gregory.wlodarek@mongodb.com>

Message: SERVER-48235 The primary node should use the AsyncDBClient to vote for committing the index build to allow the request to be interrupted by the IndexBuildsCoordinator

(cherry picked from commit e4e8a7338834ef224b4d681e7d216a49fb322bfa)
Branch: v4.4
https://github.com/mongodb/mongo/commit/1d5d11155689d29bb7de42ccb5a5f4b3c7247469

Comment by Githook User [ 15/May/20 ]

Author: Gregory Wlodarek (GWlodarek) <gregory.wlodarek@mongodb.com>

Message: SERVER-48235 The primary node should use the AsyncDBClient to vote for committing the index build to allow the request to be interrupted by the IndexBuildsCoordinator
Branch: master
https://github.com/mongodb/mongo/commit/e4e8a7338834ef224b4d681e7d216a49fb322bfa
