[SERVER-54926] Convert HostUnreachable error in _fetchAndStoreRecipientClusterTimeKeyDocs to specific error
Created: 03/Mar/21  Updated: 29/Oct/23  Resolved: 06/May/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 4.9.0-rc1, 5.0.0-rc0

Type: Bug
Priority: Major - P3
Reporter: Jason Zhang
Assignee: Jason Zhang
Resolution: Fixed
Votes: 0
Labels: pm-1791_non-cloud-blocking, pm-1791_other_required
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
is depended on by SERVER-52713 [testing] Add stepdown/kill/terminate... Closed
Backwards Compatibility: Fully Compatible
Backport Requested:
v4.9
Sprint: Sharding 2021-04-19, Sharding 2021-05-03, Sharding 2021-05-17
Participants:

Comments
Comment by Githook User [ 07/May/21 ]

Author:

{'name': 'Jason Zhang', 'email': 'jason.zhang@mongodb.com', 'username': 'jz1242'}

Message: SERVER-54926 Convert HostUnreachable error in _fetchAndStoreRecipientClusterTimeKeyDocs to specific error

(cherry picked from commit 1814c0324af4fb1421ffa46d90528d2053dcac2c)
Branch: v4.9
https://github.com/mongodb/mongo/commit/97200c5925a007f02c952166a68bc870e1e9ff85

Comment by Githook User [ 05/May/21 ]

Author:

{'name': 'Jason Zhang', 'email': 'jason.zhang@mongodb.com', 'username': 'jz1242'}

Message: SERVER-54926 Convert HostUnreachable error in _fetchAndStoreRecipientClusterTimeKeyDocs to specific error
Branch: master
https://github.com/mongodb/mongo/commit/1814c0324af4fb1421ffa46d90528d2053dcac2c

Comment by Jason Zhang [ 06/Apr/21 ]

Yeah, this error needs to be refined for non-retriable SSL issues, since the code currently returns HostUnreachable for any connection issue. As a temporary stopgap to allow our SSL jstests to pass, our code currently doesn't retry any HostUnreachable error inside _fetchAndStoreRecipientClusterTimeKeyDocs. This behavior will change as a result of this ticket, and HostUnreachable will be retriable again.
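
The retry behavior described in this comment can be illustrated with a minimal sketch. This is not the server code: the names ErrorCode, RemoteError, fetch_with_retry, and in particular SSL_CONFIG_ERROR (a stand-in for the unnamed "specific error") are hypothetical and chosen only for illustration. The idea is that once non-retriable SSL failures carry their own code, HostUnreachable can go back to being treated as transient.

```python
from enum import Enum, auto


class ErrorCode(Enum):
    HOST_UNREACHABLE = auto()  # transient, e.g. the recipient primary failed over
    SSL_CONFIG_ERROR = auto()  # hypothetical name for the non-retriable "specific error"


class RemoteError(Exception):
    """Hypothetical error type carrying a classification code."""

    def __init__(self, code, message=""):
        super().__init__(message)
        self.code = code


# Assumption: only HostUnreachable is retried in this sketch.
RETRIABLE = {ErrorCode.HOST_UNREACHABLE}


def fetch_with_retry(fetch, max_attempts=3):
    """Call fetch() until it succeeds, hits a non-retriable error, or runs out of attempts."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for _ in range(max_attempts):
        try:
            return fetch()
        except RemoteError as err:
            if err.code not in RETRIABLE:
                raise  # fail fast: retrying cannot fix a misconfiguration
            last_error = err  # transient: try again
    raise last_error
```

With this split, a fetch that fails once with HOST_UNREACHABLE and then succeeds returns normally on the second attempt, while a fetch that raises SSL_CONFIG_ERROR fails on the first attempt without retrying.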

Comment by Lingzhi Deng [ 06/Apr/21 ]

Is this a bug? If HostUnreachable is not retryable, I think we could mistakenly fail a migration on recipient failovers. I think I saw this in a patch build when I tried to terminate the recipient primary during a migration. See this log. Here's my debug patch build.

Generated at Thu Feb 08 05:34:54 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.