[CDRIVER-1988] Topology scanner times out while trying IPv6 address Created: 11/Jan/17  Updated: 02/May/17  Resolved: 11/Jan/17

Status: Closed
Project: C Driver
Component/s: libmongoc, network
Affects Version/s: 1.5.2
Fix Version/s: 1.5.3

Type: Bug Priority: Blocker - P1
Reporter: Remi Collet Assignee: A. Jesse Jiryu Davis
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Fedora 26


Issue Links:
Related
is related to CDRIVER-1972 Support IPv6 only hostnames Closed
Case:

 Description   

Version 1.5.1 builds fine
See QA monitoring: https://apps.fedoraproject.org/koschei/package/mongo-c-driver

Trying to update to 1.5.2, the test suite aborts with a core dump:

{  "results": [
    { "status": "PASS", "test_file": "/TestSuite/version_cmp", "seed": "4072393022", "start": 3743935.968662660, "end": 3743935.968703771, "elapsed": 0.000041111  },
    { "status": "PASS", "test_file": "/Array/Basic", "seed": "2047719654", "start": 3743935.968717476, "end": 3743935.968722149, "elapsed": 0.000004673  },
    { "status": "PASS", "test_file": "/Async/ismaster", "seed": "2177201714", "start": 3743935.968729776, "end": 3743936.783838702, "elapsed": 0.815108926  },
    { "status": "PASS", "test_file": "/Async/ismaster_ssl", "seed": "2632930259", "start": 3743936.783876887, "end": 3743937.494377111, "elapsed": 0.710500224  },
    { "status": "PASS", "test_file": "/Buffer/Basic", "seed": "870497786", "start": 3743937.494416158, "end": 3743937.494446064, "elapsed": 0.000029906  },
    { "status": "SKIP", "test_file": "/Client/authenticate" },
    { "status": "SKIP", "test_file": "/Client/authenticate_failure" },
    { "status": "SKIP", "test_file": "/Client/authenticate_timeout" },
error calling ismaster: 'No suitable servers found: `serverselectiontimeoutms` timed out: [connection refused calling ismaster on 'localhost:27017']'
URI = mongodb://localhost:27017
/bin/sh: line 1:  7804 Aborted                 (core dumped) ./$TEST_PROG "--no-fork" -F test-results.json
make: *** [Makefile:6148: test] Error 134

Full build.log: https://kojipkgs.fedoraproject.org//work/tasks/744/17240744/build.log



 Comments   
Comment by Githook User [ 11/Jan/17 ]

Author: A. Jesse Jiryu Davis (ajdavis) <jesse@mongodb.com>

Message: Revert "CDRIVER-1972 Support for ipv6 hostnames"

8729c1448782481f392e4b51e513c14bb9736a5b

Fixes CDRIVER-1988.
Branch: r1.5
https://github.com/mongodb/mongo-c-driver/commit/0bce8b476ce0def8e9021b96436e816d2bf933dc

Comment by Githook User [ 11/Jan/17 ]

Author: A. Jesse Jiryu Davis (ajdavis) <jesse@mongodb.com>

Message: Revert "CDRIVER-1972 Support for ipv6 hostnames"

8729c1448782481f392e4b51e513c14bb9736a5b

Fixes CDRIVER-1988.
Branch: master
https://github.com/mongodb/mongo-c-driver/commit/107dbb031d3d77662c2efb3a68997507845df0d1

Comment by Githook User [ 11/Jan/17 ]

Author: A. Jesse Jiryu Davis (ajdavis) <jesse@mongodb.com>

Message: CDRIVER-1988 test with IPv6 disabled
Branch: master
https://github.com/mongodb/mongo-c-driver/commit/49dc46626d2838fc88c585904362aaca960fc622

Comment by A. Jesse Jiryu Davis [ 11/Jan/17 ]

Thanks for reporting and investigating, Remi. I can reproduce this: if I run mongod locally with IPv6 turned off, I fail at the same spot, where test_command() tries to call ismaster on localhost:27017.

We didn't catch this because we habitually pass "--ipv6" to our test mongod instance when we run the tests.

The problem is in mongoc_topology_scanner_node_connect_tcp, where we initially discover which servers from the host list are available. There, we call getaddrinfo with AF_UNSPEC (this is a change; it had been AF_INET). On my Mac and most machines, the IPv6 result for "localhost" is returned at the beginning of the results, followed by the IPv4 result. The driver chooses the first result and tries to connect, with a default timeout of 10 seconds. If that fails, it considers the host unavailable. It does *not* attempt to connect using the other results from the getaddrinfo list.
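To make the ordering issue concrete, here is a minimal sketch (not libmongoc code) that resolves a host with AF_UNSPEC and reports the address family of the first result, which is the only one a "first result wins" strategy ever tries. The helper name first_resolved_family is hypothetical and exists only for illustration.

```c
#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper, not part of libmongoc: resolve host:port with
 * AF_UNSPEC and return the address family of the FIRST getaddrinfo result.
 * Returns -1 if resolution fails. If total is non-NULL, it receives the
 * number of entries in the result list. */
static int
first_resolved_family (const char *host, const char *port, int *total)
{
   struct addrinfo hints;
   struct addrinfo *res = NULL;
   struct addrinfo *rp;
   int family;
   int count = 0;

   assert (host && port);

   memset (&hints, 0, sizeof hints);
   hints.ai_family = AF_UNSPEC; /* accept IPv4 and IPv6 results */
   hints.ai_socktype = SOCK_STREAM;

   if (getaddrinfo (host, port, &hints, &res) != 0) {
      return -1;
   }

   family = res->ai_family; /* what a first-result-only caller would use */
   for (rp = res; rp; rp = rp->ai_next) {
      count++; /* later entries are never tried by such a caller */
   }
   if (total) {
      *total = count;
   }

   freeaddrinfo (res);
   return family;
}
```

Calling first_resolved_family ("localhost", "27017", &n) on a machine where the resolver lists the IPv6 loopback first would return AF_INET6 with n greater than 1, which matches the situation described above: an IPv4 entry exists but is never attempted.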

The test passes if mongod is started with --ipv6. However, any tests using mock_server_t fail, since the mock server doesn't listen on IPv6. The first such test is test_cooldown_rs(), so that is what fails once we have mongod listening on IPv6.

Right now we should revert the change, as you proposed.

There are two options for a long-term solution:

1. After a connection times out, we should reset the connect timer and try the next getaddrinfo result.
2. Implement something akin to "Happy Eyeballs": try all addresses returned by getaddrinfo concurrently and use the first connection that succeeds. Cache the address that succeeded (IPv4 or IPv6 or whatever) and use that for subsequent connections to the same host:port until some cache duration times out.

Number 2 shouldn't be very hard: the topology scanner is already parallel across multiple hosts, so it just needs to become parallel across multiple addresses for each host.

Comment by Remi Collet [ 11/Jan/17 ]

Final try: simply revert https://github.com/mongodb/mongo-c-driver/commit/333cbc2cd2f54f3650f51c39a2490c28c355cc0f
Everything works.

That change definitely doesn't seem to be enough for IPv6 support.

Comment by Remi Collet [ 11/Jan/17 ]

Another try, skipping slow tests:

     test-libmongoc: tests/test-mongoc-topology.c:682: _test_select_succeed: Assertion `request' failed.

Another try, setting test_select_after_try_once and test_select_after_timeout as slow:

        { "status": "PASS", "test_file": "/TLS/hangup", "seed": "2701036614", "start": 3752245.181399700, "end": 3752245.203285149, "elapsed": 0.021885449  },
     expected timeout after about 200ms, not 14
     /bin/sh: line 1: 86268 Aborted                 (core dumped) ./$TEST_PROG "--no-fork" -F test-results.json

Looking at the 1.5.1/1.5.2 diff, I saw very few changes, but they have huge effects.

Comment by Remi Collet [ 11/Jan/17 ]

Running the mongod server with --ipv6 allows the tests to get further, but:

    { "status": "PASS", "test_file": "/Topology/cooldown/standalone", "seed": "214294207", "start": 2648984.974729814, "end": 2648991.375096013, "elapsed": 6.400366199  },
test-libmongoc: tests/test-mongoc-topology.c:572: test_cooldown_rs: Assertion `request' failed.
/bin/sh: line 1:  8828 Aborted                 (core dumped) ./$TEST_PROG "--no-fork" -F test-results.json

Comment by Remi Collet [ 11/Jan/17 ]

Trying a local build (on Fedora 25, with bundled libbson) fails later:

    { "status": "PASS", "test_file": "/Uri/write_concern", "seed": "712961548", "start": 7903.330626548, "end": 7903.330680703, "elapsed": 0.000054155 },
lt-test-libmongoc: tests/test-mongoc-uri.c:546: test_mongoc_host_list_from_string: Assertion `host_list.family == AF_INET' failed.
/bin/sh : ligne 1 : 18528 Abandon (core dumped)./$TEST_PROG "--no-fork" -F test-results.json
