[SERVER-65059] Election during split may result in inability to observe recipient split acceptance Created: 29/Mar/22  Updated: 29/Oct/23  Resolved: 14/Apr/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 6.1.0-rc0

Type: Bug Priority: Major - P3
Reporter: Mathis Bessa Assignee: Matt Broadstone
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Server Serverless 2022-04-18
Participants:

 Description   

In the following scenario:

  1. ShardSplitDonorService commits the split, but the node steps down before the operation is marked for garbage collection.
  2. A node steps up with local state `kCommitted`, but there are no longer any recipient nodes in the set.

The Replica Set Monitor cannot monitor the recipient nodes: they were removed from the config, so the recipient connection string can no longer be built. This results in connection failures such as:

[js_test:shard_split_startup_recovery] d20021| {"t":{"$date":"2022-03-29T21:50:48.952+00:00"},"s":"I",  "c":"-",        "id":4333222, "ctx":"ShardSplitDonorService-3","msg":"RSM received error response","attr":{"host":"ip-10-122-14-181:20023","error":"HostUnreachable: Connection refused","replicaSet":"","response":{}}}

 

While this was observed in the `kCommitted` case, this can also happen while the donor waits for the recipient to accept the split. If at this point an election occurs and a donor secondary steps up to continue the split operation, it will not observe the split acceptance since the connection string cannot be built.
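
The failure mode and the direction of the fix (per the linked commit message, storing the recipient connection string in the state document) can be illustrated with a minimal Python sketch. This is not the server's C++ implementation. The field name `recipientConnectionString`, the tag name `recipientNode`, the `name/host1,host2` string form, and all function names below are assumptions made for illustration only.

```python
from typing import Dict, List


def build_connection_string(set_name: str, hosts: List[str]) -> str:
    """Build a 'setName/host1,host2' style string; fails if no hosts remain."""
    if not hosts:
        raise ValueError("no recipient hosts in config; cannot build connection string")
    return f"{set_name}/{','.join(hosts)}"


def recipient_hosts(config: Dict, recipient_tag: str) -> List[str]:
    """Hosts of config members that carry the (hypothetical) recipient tag."""
    return [m["host"] for m in config["members"] if recipient_tag in m.get("tags", {})]


def start_split(state_doc: Dict, config: Dict, set_name: str, recipient_tag: str) -> None:
    """Persist the connection string while recipients are still in the config."""
    state_doc["recipientConnectionString"] = build_connection_string(
        set_name, recipient_hosts(config, recipient_tag)
    )


def connection_string_on_step_up(
    state_doc: Dict, config: Dict, set_name: str, recipient_tag: str
) -> str:
    """Prefer the persisted string; fall back to rebuilding from the config."""
    stored = state_doc.get("recipientConnectionString")
    if stored is not None:
        return stored
    # Without a persisted value, this raises once recipients have left the config.
    return build_connection_string(set_name, recipient_hosts(config, recipient_tag))
```

In this sketch, a node that steps up after the recipient nodes were removed from the config can still obtain the connection string from the state document, whereas a state document lacking the field hits the same error path described above.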



 Comments   
Comment by Githook User [ 02/Apr/22 ]

Author: {'name': 'Matt Broadstone', 'email': 'mbroadst@mongodb.com', 'username': 'mbroadst'}

Message: SERVER-65059 Store recipient connection string in state document
Branch: master
https://github.com/mongodb/mongo/commit/c78ba79626722ae69ea9b64762ffd1dc075ce960

Generated at Thu Feb 08 06:01:46 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.