[SERVER-55439] local sharding jstests pass/fail depending on Internet Service Provider Created: 23/Mar/21  Updated: 06/Dec/22  Resolved: 17/May/21

Status: Closed
Project: Core Server
Component/s: Testing Infrastructure
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Billy Donahue Assignee: Backlog - Security Team
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-55579 jstest failures from DNS failing to l... Closed
Assigned Teams:
Server Security
Operating System: ALL
Participants:
Linked BF Score: 136

 Description   

My laptop is plugged directly into my cable modem to fix Zoom issues.
Now I can't run local jstests on my laptop. Our jstests have made some pretty weird choices in how they set themselves up. The --suite=sharding test replica set processes bind to the EXTERNAL-FACING IP addresses of the local host! That's already a problem. This means that for the duration of my jstest, I'm serving one or more hacked-up, no-auth, test-command-enabled mongod instances to the actual whole world, on the well-known mongod port. I think my OS will block incoming connections, but still! That's what the entire localhost 127/8 subnet is for.

Here's the RSConfig lines logged by the test:

[js_test:resharding_metrics] ReplSetTest made initial connection to node: connection to aa-bb-cc-dd.isp.com:20023
[js_test:resharding_metrics] ReplSetTest startSet, nodes: [
[js_test:resharding_metrics] 	connection to aa-bb-cc-dd.isp.com:20022,
[js_test:resharding_metrics] 	connection to aa-bb-cc-dd.isp.com:20023
[js_test:resharding_metrics] ]
[js_test:resharding_metrics] ReplSetTest startSet took 6235ms for 2 nodes.
[js_test:resharding_metrics] Waiting for the config server to finish starting up.

That's bad enough.

But why won't the test work? Because my ISP has just kind of assigned me this hostname "aa-bb-cc-dd.isp.com" with DHCP but there's no DNS behind it. If I try to connect to aa-bb-cc-dd.isp.com by name, it doesn't exist (NXDOMAIN).

$ hostname
aa-bb-cc-dd.isp.com
$ host $(hostname)
Host aa-bb-cc-dd.isp.com not found: 3(NXDOMAIN)

This is very fragile. If I unplug the ethernet cable I can run jstests again. The cluster hosts can't find each other despite being started by the same launcher at the same time on the same host, and all because of some DHCP settings from my ISP.

That's a problem. This is several problems all at once, really.

(not really sure which team gets this)



 Comments   
Comment by Billy Donahue [ 23/Mar/21 ]

Possible ideas to mitigate:

Jstest code is creating these RSConfigs that don't work, and shipping them off to the mongo servers in the test cluster. Before it does so, it can do its own DNS query to sanity check the hostnames and see if the test has any hope of succeeding.
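
A minimal sketch of such a pre-flight check, written in Python on the assumption that the test driver could run it before sending the RSConfig. The helper names `split_host_port` and `unresolvable_hosts` are hypothetical and not part of any existing test tooling; only the standard-library `socket.getaddrinfo` call is real.

```python
import socket


def split_host_port(member):
    """Return the host part of a 'host:port' member string (hypothetical helper)."""
    host, sep, _port = member.rpartition(":")
    return host if sep else member


def unresolvable_hosts(hostnames):
    """Return the hostnames that the local resolver cannot resolve."""
    failed = []
    for name in hostnames:
        try:
            # Uses the same system resolver path a client on this machine would use.
            socket.getaddrinfo(name, None)
        except socket.gaierror:
            failed.append(name)
    return failed
```

Called with the host parts of the members from the log above (for example `unresolvable_hosts([split_host_port("aa-bb-cc-dd.isp.com:20022")])`), a non-empty result would let the test fail fast with a clear message instead of timing out while the nodes look for each other.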

Spawned mongod and mongos could be instructed by the test driver which IP address to bind to. Definitely not 0.0.0.0 (all addresses!) by default unless the test specifically needs that.
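
As an illustration only, a launcher could build the mongod command line with an explicit loopback bind address. The function `mongod_args` is a hypothetical helper, not how the existing test driver works; `--port`, `--dbpath`, and `--bind_ip` are existing mongod options.

```python
def mongod_args(port, dbpath, bind_ip="127.0.0.1"):
    """Build a mongod argv that binds to loopback unless told otherwise.

    Hypothetical helper: a test that really needs another address
    would pass bind_ip explicitly.
    """
    return [
        "mongod",
        "--port", str(port),
        "--dbpath", dbpath,
        "--bind_ip", bind_ip,
    ]
```

With this default, a test would have to opt in explicitly (for example `bind_ip="0.0.0.0"`) before any node becomes reachable from outside the machine.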

Offer to patch /etc/hosts to support local tests?
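
A rough sketch of what that offer might check, assuming the tool only wants to map the machine's own hostname to loopback. Both function names are hypothetical. The sketch only inspects text and produces the line to add; it does not write to /etc/hosts.

```python
def hosts_line_needed(hosts_text, hostname):
    """Return True if no uncommented line in hosts_text already names hostname."""
    for line in hosts_text.splitlines():
        # Drop trailing comments, then split into address and names.
        fields = line.split("#", 1)[0].split()
        if hostname in fields[1:]:
            return False
    return True


def hosts_line(hostname, ip="127.0.0.1"):
    """Format a hosts-file entry mapping hostname to ip (loopback by default)."""
    return f"{ip}\t{hostname}"
```

A launcher could read /etc/hosts, call `hosts_line_needed` with the output of `hostname`, and print the suggested entry for the developer to add by hand.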
