- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Index Maintenance
- None
- Fully Compatible
- ALL
- v4.4
- Execution Team 2020-05-18, Execution Team 2020-06-01
- 19
When the primary node votes to commit an index build, runCmdOnPrimaryAndAwaitResponse short-circuits if it is called on the primary itself. This diverging code path makes it harder to interrupt the command when the commitIndexBuild or abortIndexBuild oplog entry is processed before the node stepped up.
The IndexBuildsCoordinator function setSignalAndCancelVoteRequestCbkIfActive() cancels only the requests started using the AsyncDBClient, not the requests using the DBDirectClient. Because the DBDirectClient request is never interrupted, the voteCommitIndexBuild command hangs indefinitely, as witnessed in a build failure.
To remediate this issue, the primary node should use the AsyncDBClient when voting to commit the index build. This adds networking overhead, but the cost is negligible compared to the whole index build process.
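The difference between the two paths can be illustrated with a toy model. This is only a sketch in Python, not the server's C++ implementation: `ToyCoordinator`, `send_vote_async`, `send_vote_direct`, and `set_signal_and_cancel_votes` are hypothetical names chosen for illustration, and the direct path is given a short timeout so the demo terminates (in the reported failure the wait had no such bound).

```python
import threading
from concurrent.futures import Future


class ToyCoordinator:
    """Hypothetical stand-in: it only tracks cancelable (async) vote requests."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = []   # futures of in-flight async vote requests
        self.signal = None

    def send_vote_async(self):
        """Async path: the request is registered, so it can be cancelled later."""
        fut = Future()
        with self._lock:
            self._pending.append(fut)
        return fut

    def send_vote_direct(self, reply_event, timeout):
        """Direct path: blocks the caller and is never registered for cancellation."""
        return reply_event.wait(timeout)  # False means no reply arrived in time

    def set_signal_and_cancel_votes(self, signal):
        """Record the commit/abort signal and cancel every registered request."""
        with self._lock:
            self.signal = signal
            pending, self._pending = self._pending, []
        for fut in pending:
            fut.cancel()


def demo():
    coord = ToyCoordinator()
    outcome = {}

    # Direct path: the reply never arrives and the cancel cannot reach the wait.
    never_replied = threading.Event()
    worker = threading.Thread(
        target=lambda: outcome.update(
            direct_got_reply=coord.send_vote_direct(never_replied, timeout=0.2)
        )
    )
    worker.start()
    coord.set_signal_and_cancel_votes("commitIndexBuild")
    worker.join()  # returns only because of the demo timeout

    # Async path: the same call cancels the registered request immediately.
    fut = coord.send_vote_async()
    coord.set_signal_and_cancel_votes("abortIndexBuild")
    outcome["async_cancelled"] = fut.cancelled()
    return outcome


if __name__ == "__main__":
    print(demo())
```

In this sketch the direct wait ends only when its timeout expires, while the async request is cancelled as soon as the signal is set. Routing the primary's own vote through the cancelable path is the design choice the remediation above describes.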
- is related to
  - SERVER-48123 voteCommitIndexBuild() can hang waiting for its lock acquisition to be granted when there is a stronger lock request waiting ahead of it and when the index build is being aborted (Closed)
- related to
  - SERVER-48344 add internal authentication to single node replica set configuration for performance test (Closed)
  - SERVER-48516 at startup, confirm replica set node with auth can connect to itself (Closed)