[SERVER-57893] Make rsm_horizon_change.js resilient to network failures Created: 21/Jun/21  Updated: 29/Oct/23  Resolved: 22/Jun/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 5.0.0-rc4, 4.4.9, 5.1.0-rc0

Type: Improvement Priority: Major - P3
Reporter: George Wangensteen Assignee: George Wangensteen
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
is depended on by SERVER-49435 uassert in NetworkInterfaceTL::setTim... Closed
Related
is related to SERVER-62881 Make rsm_horizon_change.js unknown se... Closed
Backwards Compatibility: Fully Compatible
Backport Requested:
v5.0, v4.4
Sprint: Service Arch 2021-06-28
Participants:

 Description   

The rsm_horizon_change.js test expects to see a specific log line here: https://github.com/mongodb/mongo/blob/e706abcecab992d5b2bf7f1806a90bd92e860c2d/jstests/noPassthrough/rsm_horizon_change.js#L30 after a split-horizon reconfig. Specifically, it expects to see this log line with topologyType ReplicaSetNoPrimary and type Unknown, indicating that the split-horizon reconfig initially results in an unknown server description.

Currently, this log line is emitted because StreamableReplicaSetMonitor::onTopologyDescriptionChangedEvent is called after TopologyManager::onServerDescription, which is called here once the RSM receives an error response from the remote node following the reconfig. But this code path is only reached if, after receiving the error response, the helloOutcome for the response is set here.

But this hello response is only set if the received error is not a network error. Before SERVER-49435, we erroneously translated network errors into CommandResultSchemaViolation in the NetworkInterfaceTL here by calling getStatusFromCommandResult on the response without checking the response's internal status. This resulted in the RSM emitting the expected log line via the code path described above. But after fixing this and correctly propagating network errors like HostUnreachable from the networkInterface, the RSM will no longer set the helloResponse after receiving the error (because it is a network error). Instead of emitting the expected log line, it will drop connections to the remote it received an error from and monitor the RS in expedited mode until it detects a primary.
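
To make the two outcomes concrete, here is a toy JavaScript model of the branching described above. It is only a sketch: handleHelloFailure, the NETWORK_ERRORS set, and the returned field names are hypothetical illustrations, not the server's C++ API, and the set lists only the one network error named in this ticket.

```javascript
// Toy model of the behavior described in this ticket. All names here are
// hypothetical; the real logic lives in the server's C++ RSM code.
const NETWORK_ERRORS = new Set(["HostUnreachable"]);

function handleHelloFailure(errorCodeName) {
    if (NETWORK_ERRORS.has(errorCodeName)) {
        // Network error: the hello outcome is not set, so the
        // topology-change log line is not emitted on this path.
        return {helloOutcomeSet: false, action: "dropConnectionsAndMonitorExpedited"};
    }
    // Any other error: the hello outcome is set and the topology change
    // is reported, which produces the log line the test looks for.
    return {helloOutcomeSet: true, action: "emitTopologyChangeLog"};
}

// Before SERVER-49435 the network error was mistranslated, so the RSM took
// the second branch. After the fix it takes the first.
const beforeFix = handleHelloFailure("CommandResultSchemaViolation");
const afterFix = handleHelloFailure("HostUnreachable");
```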

Note that this behavior is still correct: the host that receives the network error will simply monitor the RS until it has a new primary, and then continue as usual. We just need to allow the test to accept the alternate log line in this case.
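
One way to picture "accepting the alternate log line" is a check that passes when any one of several patterns matches. The following is a minimal sketch under stated assumptions, not the committed change: sawAcceptedLine and acceptedPatterns are hypothetical names, the first pattern is only a loose rendering of the line described above, and the second is a placeholder because the alternate line's exact text is not given in this ticket.

```javascript
// Hypothetical patterns. The real log text is whatever rsm_horizon_change.js
// matches; the second entry is a placeholder, not an actual server log line.
const acceptedPatterns = [
    /topologyType.*ReplicaSetNoPrimary.*type.*Unknown/,
    /ALTERNATE_LINE_PLACEHOLDER/,
];

// Returns true if at least one log line matches at least one pattern.
function sawAcceptedLine(logLines, patterns) {
    return logLines.some(line => patterns.some(re => re.test(line)));
}
```

The test then no longer depends on which of the two code paths the RSM took, only on having observed one of them.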



 Comments   
Comment by Vivian Ge (Inactive) [ 06/Oct/21 ]

Updating the fixversion since branching activities occurred yesterday. This ticket will be in rc0 when it’s been triggered. For more active release information, please keep an eye on #server-release. Thank you!

Comment by Githook User [ 05/Aug/21 ]

Author:

{'name': 'George Wangensteen', 'email': 'george.wangensteen@mongodb.com', 'username': 'gewa24'}

Message: SERVER-57893 Make rsm_horizon_change.js resilient to network failures

(cherry picked from commits 82ad45c958c2fc020c808254dbd19072a225113d and d378bdd1e6b8e170aabb8f4f089b74481ed0bf1a)
Branch: v4.4
https://github.com/mongodb/mongo/commit/8157ede4a9ca52c52fba9627f1f718e30153d7ce

Comment by Githook User [ 23/Jun/21 ]

Author:

{'name': 'George Wangensteen', 'email': 'george.wangensteen@mongodb.com', 'username': 'gewa24'}

Message: SERVER-57893 Make rsm_horizon_change.js resilient to network failures

(cherry picked from commits 82ad45c958c2fc020c808254dbd19072a225113d and d378bdd1e6b8e170aabb8f4f089b74481ed0bf1a)
Branch: v5.0
https://github.com/mongodb/mongo/commit/f9bed91448c7a6f1bd1681365f09fd0767efb21f

Comment by Githook User [ 22/Jun/21 ]

Author:

{'name': 'George Wangensteen', 'email': 'george.wangensteen@mongodb.com', 'username': 'gewa24'}

Message: SERVER-57893 Fix regular expression in rsm_horizon_change.js
Branch: master
https://github.com/mongodb/mongo/commit/d378bdd1e6b8e170aabb8f4f089b74481ed0bf1a

Comment by Githook User [ 22/Jun/21 ]

Author:

{'name': 'George Wangensteen', 'email': 'george.wangensteen@mongodb.com', 'username': 'gewa24'}

Message: SERVER-57893 Make rsm_horizon_change.js resilient to network failures
Branch: master
https://github.com/mongodb/mongo/commit/82ad45c958c2fc020c808254dbd19072a225113d

Generated at Thu Feb 08 05:43:03 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.