[SERVER-47045] Add tests to check that mongos marks a mongod as failed in failure cases
Created: 23/Mar/20  Updated: 29/Oct/23  Resolved: 20/Apr/20

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 4.4.0-rc3, 4.7.0

Type: Task
Priority: Major - P3
Reporter: Janna Golden
Assignee: Janna Golden
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
is related to SERVER-47932 doNotSetMoreToCome failpoint should e... Investigating
Backwards Compatibility: Fully Compatible
Backport Requested:
v4.4
Sprint: Sharding 2020-04-06, Service arch 2020-04-20, Service arch 2020-05-04
Participants:

 Description   

Add the following three test cases to check that mongos correctly marks a mongod as down (marks its type as unknown in its server description):

1. After mongos receives one isMaster response from a node, stop all isMaster replies from that node; mongos should mark the node as down after connectTimeoutMS + maxAwaitTimeMS.
2. After mongos receives one isMaster response from a node, kill the node.
3. After mongos receives one isMaster response from a node, make mongod respond with 'ok : 0' in all future isMaster replies.

Add a fourth test case to check that mongos does not mark a mongod as down if mongod responds with 'ok : 1' but without the 'moreToCome' bit set. This test should verify that the client does not mark the server unknown and continues monitoring the mongod after it ends the isMaster stream.
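
As a rough illustration of the timing in case 1 and of the outcome expected in each of the four cases, the sketch below models them in plain Python. The function names, the `Reply` type, and the 10,000 ms example values are assumptions made for illustration; none of them come from the server code or the committed tests.

```python
from dataclasses import dataclass
from typing import Optional


def unknown_deadline_ms(connect_timeout_ms: int, max_await_time_ms: int) -> int:
    """Case 1: longest wait for a silent node before it should be marked unknown."""
    return connect_timeout_ms + max_await_time_ms


@dataclass
class Reply:
    """Hypothetical stand-in for one isMaster reply seen by the monitor."""
    ok: int
    more_to_come: bool


def should_mark_unknown(reply: Optional[Reply]) -> bool:
    """Return True when the description expects the server to be marked unknown.

    None models "no reply arrived before the deadline": a node that stopped
    replying (case 1) or a node that was killed (case 2).
    """
    if reply is None:
        return True
    if reply.ok != 1:
        return True  # case 3: 'ok : 0'
    # Case 4: 'ok : 1' without moreToCome ends the stream, but the server
    # stays known and monitoring is expected to continue.
    return False


# Example values are assumptions, not server defaults.
print(unknown_deadline_ms(10_000, 10_000))  # 20000
```

The model only captures the pass/fail expectation of each case; an actual test would observe the server description held by mongos instead.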



 Comments   
Comment by Githook User [ 27/Apr/20 ]

Author:

{'name': 'jannaerin', 'email': 'golden.janna@gmail.com', 'username': 'jannaerin'}

Message: SERVER-47045 Add tests to check that the RSM behaves correctly when contacting a mongod fails for various reasons

(cherry picked from commit ae194cbab3b84a145d2e5b585c4dd0a261830675)
Branch: v4.4
https://github.com/mongodb/mongo/commit/bf9e79d89f2d9fd28526b7b8c056577676bfcd5d

Comment by Githook User [ 20/Apr/20 ]

Author:

{'name': 'jannaerin', 'email': 'golden.janna@gmail.com', 'username': 'jannaerin'}

Message: SERVER-47045 Add tests to check that the RSM behaves correctly when contacting a mongod fails for various reasons
Branch: master
https://github.com/mongodb/mongo/commit/ae194cbab3b84a145d2e5b585c4dd0a261830675

Comment by Shane Harvey [ 23/Mar/20 ]

Some notes on each test:
1) Can use the waitInIsMaster failpoint (SERVER-44814).
2) Can use the shutdown command (or kill with a signal).
3) Can use failCommand to fail future isMaster responses. From my local testing, the failCommand failpoint will not be triggered immediately but on the next maxAwaitTimeMS timeout.
4) Will require a new failpoint. Perhaps a failpoint that ends all isMaster streams (ok:1 and moreToCome=False)? Should the failpoint be triggered immediately or on the next maxAwaitTimeMS timeout (like failCommand)?
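
For notes 1 to 3, the command documents a test might send to the mongod could look roughly like the sketch below. This is an illustrative assumption, not the committed test: the helper names are hypothetical, the error code is left to the caller, and setting `failInternalCommands` is assumed to be necessary because mongos connects as an internal client. Note 4 is omitted because the failpoint it needs did not exist when these notes were written.

```python
from typing import Any, Dict, Optional


def configure_fail_point(name: str, mode: Any,
                         data: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a configureFailPoint command; the command name must be the first key."""
    cmd: Dict[str, Any] = {"configureFailPoint": name, "mode": mode}
    if data:
        cmd["data"] = data
    return cmd


def hang_is_master() -> Dict[str, Any]:
    """Note 1: stop isMaster replies using the waitInIsMaster failpoint."""
    return configure_fail_point("waitInIsMaster", "alwaysOn")


def shutdown_node() -> Dict[str, Any]:
    """Note 2: shut the node down with the shutdown command."""
    return {"shutdown": 1}


def fail_is_master(error_code: int, fail_internal: bool = True) -> Dict[str, Any]:
    """Note 3: make future isMaster replies fail (ok: 0) through failCommand."""
    return configure_fail_point("failCommand", "alwaysOn", {
        "failCommands": ["isMaster"],
        "errorCode": error_code,
        # Assumption: required so the failpoint also applies to mongos.
        "failInternalCommands": fail_internal,
    })


def disable_fail_point(name: str) -> Dict[str, Any]:
    """Turn a failpoint off again once the test has finished."""
    return configure_fail_point(name, "off")
```

Each document would be run against the `admin` database of the target mongod, which has to be started with test commands enabled (`--setParameter enableTestCommands=1`) for `configureFailPoint` to be accepted.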
