[SERVER-56854] Provide the ability for RSM requests to timeout and mark the server as failed Created: 11/May/21  Updated: 29/Oct/23  Resolved: 28/May/21

Status: Closed
Project: Core Server
Component/s: Networking
Affects Version/s: None
Fix Version/s: 4.0.25, 4.2 Required

Type: New Feature Priority: Critical - P2
Reporter: Lamont Nelson Assignee: Lamont Nelson
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
related to SERVER-56917 Stuck Hello request may lead to clust... Closed
Backwards Compatibility: Fully Compatible
Backport Requested:
v4.2, v4.0
Sprint: Sharding 2021-05-17
Participants:
Case:

 Description   

Currently, a mongos node can send a hello request to a replica set member and then wait indefinitely for a response. In this case, the operation does not return until the connection times out on the mongos side, which could take several minutes depending on TCP keepalive settings.

This ticket adds an application-level timeout mechanism that lets the ReplicaSetMonitor (RSM) keep making progress monitoring other nodes in the presence of TCP blackholes or similar network failures. A hello request that exceeds the timeout marks the server as failed. The timeout should be on the order of seconds to ensure cluster availability.
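
The timeout behavior described above can be illustrated with a minimal sketch. This is not the server's C++ implementation. It is a Python model in which `fake_hello`, `check_host`, `scan`, `run_scan`, and the `HELLO_TIMEOUT_SECS` value are all hypothetical names chosen for illustration, and the network round trip is simulated with a sleep. It shows only the idea: each hello request gets its own application-level deadline, so one blackholed member is marked failed without blocking checks of the others.

```python
import asyncio

# Hypothetical value: the ticket only says "on the order of seconds".
HELLO_TIMEOUT_SECS = 5.0


async def fake_hello(host: str, delay: float) -> dict:
    """Stand-in for a real hello round trip; the sleep simulates latency."""
    await asyncio.sleep(delay)
    return {"host": host, "ok": 1}


async def check_host(host: str, delay: float, timeout: float) -> tuple:
    """Send one simulated hello and enforce an application-level deadline."""
    try:
        await asyncio.wait_for(fake_hello(host, delay), timeout)
        return host, "up"
    except asyncio.TimeoutError:
        # No reply within the deadline: mark the server as failed
        # instead of waiting for the TCP connection to time out.
        return host, "failed"


async def scan(delays: dict, timeout: float = HELLO_TIMEOUT_SECS) -> dict:
    """Check all hosts concurrently so a stuck host cannot stall the rest."""
    results = await asyncio.gather(
        *(check_host(host, delay, timeout) for host, delay in delays.items())
    )
    return dict(results)


def run_scan(delays: dict, timeout: float = HELLO_TIMEOUT_SECS) -> dict:
    """Synchronous convenience wrapper around scan()."""
    return asyncio.run(scan(delays, timeout))
```

Calling `run_scan({"a:27017": 0.0, "b:27017": 60.0}, timeout=0.1)` models one responsive member and one blackholed member. The scan returns after roughly the timeout rather than after the simulated 60-second stall.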



 Comments   
Comment by Githook User [ 28/May/21 ]

Author: LaMont Nelson <lamont.nelson@mongodb.com> (lamontnelson)

Message: SERVER-56854: Use executor and enforce timeout when making hello requests in the ReplicaSetMonitor
Branch: v4.0
https://github.com/mongodb/mongo/commit/8ddcaf878b5600fdac322929cd337c6f4563bf19

Comment by Lamont Nelson [ 28/May/21 ]

https://mongodbcr.appspot.com/771350005/

Generated at Thu Feb 08 05:40:22 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.