[SERVER-36417] Drop pooled connections to nodes no longer in the replica set after a reconfig Created: 02/Aug/18  Updated: 08/Jan/24

Status: Blocked
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: New Feature Priority: Major - P3
Reporter: Mira Carey Assignee: Backlog - Replication Team
Resolution: Unresolved Votes: 1
Labels: carry-over, sa-groomed
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-41031 After an unreachable node is added an... Open
depends on SERVER-36415 Add dropConnections(const HostAndPort... Closed
depends on SERVER-59142 Add dropConnections API to TaskExecutor Closed
Related
is related to SERVER-70297 Do not respond to heartbeat from remo... Closed
Assigned Teams:
Replication
Sprint: Service Arch 2021-07-12, Service Arch 2021-08-09, Replication 2021-11-15, Replication 2021-11-29, Replication 2021-12-13, Replication 2022-01-10, Replication 2022-01-24, Replication 2022-02-07
Participants:
Case:
Linked BF Score: 3
Story Points: 2

 Description   

After a repl set reconfig, drop pooled connections to the removed node.

This would allow removing a node, changing host name resolution, and adding the node back with a new IP. Without this change, that sequence requires either a long wait for connection pools to age out or manual intervention (if/when we implement SERVER-36416).

 

Acceptance Criteria: 

While this ticket is in the 'Investigating' status, investigate the real level of effort and whether we provide replication with the tools they need to implement this behavior, then add a comment with the amount of work required for this ticket.



 Comments   
Comment by Ali Mir [ 06/Oct/22 ]

The BFs caused by the outstanding heartbeat issue described above are now hot due to their frequency. I've filed a ticket (SERVER-70297) to add a workaround in the replication heartbeat code to prevent redness. I'm leaving this ticket as is, because the workaround ticket implements separate logic (I've marked them as related, though).

Comment by George Wangensteen [ 28/Oct/21 ]

judah.schvimer Yup, it is! If any of the context from the above comments needs clarifying, or if anyone wants to discuss it, feel free to ping me.

Comment by Judah Schvimer [ 28/Oct/21 ]

george.wangensteen, can you please confirm that this is unblocked?

Comment by George Wangensteen [ 05/Aug/21 ]

Ok, we've decided to go with option (1) and add dropConnections to the TaskExecutor API. I've filed SERVER-59142 to track this service-arch work, which should be quick. The rest of this ticket (calling dropConnections on the correct HostAndPort(s) at the right place in the reconfig process) should be done by repl, so I'm assigning this to their backlog.

 

For the repl team: see my comments above for the full context, but basically after SERVER-59142 is completed you'll just need to call dropConnections at the right place at the conclusion of the reconfig process to have hosts in the repl set drop connections to hosts removed from the set (and have removed hosts drop their connections to nodes in the set, as well). In the comments above I've sketched out approximately where in the reconfig code this should be possible.
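
A minimal sketch of the host selection described above, not the server implementation. It assumes the old and new configs are available as sets of member addresses. HostAndPort is reduced to a string alias here, and ConnectionDropper, RecordingDropper, hostsToDrop, and dropStaleConnections are hypothetical names used only for illustration; the ticket only establishes that TaskExecutor gains a dropConnections call taking a HostAndPort.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the server's HostAndPort type.
using HostAndPort = std::string;

// Hypothetical minimal interface modelling only the dropConnections call
// that SERVER-59142 adds to TaskExecutor.
struct ConnectionDropper {
    virtual ~ConnectionDropper() = default;
    virtual void dropConnections(const HostAndPort& target) = 0;
};

// Test double that records which hosts were dropped (illustration only).
struct RecordingDropper : ConnectionDropper {
    std::vector<HostAndPort> dropped;
    void dropConnections(const HostAndPort& target) override {
        dropped.push_back(target);
    }
};

// Returns the hosts whose pooled connections `self` should drop after a
// reconfig from oldMembers to newMembers.
std::vector<HostAndPort> hostsToDrop(const HostAndPort& self,
                                     const std::set<HostAndPort>& oldMembers,
                                     const std::set<HostAndPort>& newMembers) {
    std::vector<HostAndPort> result;
    if (newMembers.count(self) == 0) {
        // This node is not in the new config: drop connections to every
        // node that is in the set.
        result.assign(newMembers.begin(), newMembers.end());
        return result;
    }
    // This node remains in the set: drop connections to removed hosts.
    for (const auto& host : oldMembers) {
        if (newMembers.count(host) == 0) {
            result.push_back(host);
        }
    }
    return result;
}

// Would run at the conclusion of the reconfig process on each node.
void dropStaleConnections(ConnectionDropper& executor,
                          const HostAndPort& self,
                          const std::set<HostAndPort>& oldMembers,
                          const std::set<HostAndPort>& newMembers) {
    for (const auto& host : hostsToDrop(self, oldMembers, newMembers)) {
        executor.dropConnections(host);
    }
}
```

With an old config of {a, b, c} and a new config of {a, b}, node a would drop its connections to c, while node c would drop its connections to a and b. Where exactly this call belongs in the reconfig code is left to the earlier comments referenced above.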

Comment by Mira Carey [ 05/Feb/20 ]

PM-1519 introduces the client side support for ismaster with process id. Marking this ticket as dependent on that work.

We'll have to see after that project wraps whether this fell out naturally or whether there's still a small amount of work left to do.

Generated at Thu Feb 08 04:43:02 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.