  1. C Driver
  2. CDRIVER-1571

Spurious topology-scanner timeouts when using stream initiator

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: 1.3.6, 1.5.0
    • Affects Version/s: 1.2.0
    • Component/s: libmongoc, network
    • Labels:
      None

      Async "ismaster" commands started before a slow call to a blocking stream initiator have the initiator's duration subtracted from their timeouts. The initiator and the async commands use the same initial timeout value, so once an initiator times out, every async command started before it has 0 seconds remaining and is canceled before it runs.

      Diagnosed with the PHP Driver 1.1.8 and C Driver 1.3.5. The new PHP Driver will no longer use custom stream initiators. Another library could theoretically use them and hit this bug.

      Scenario: create a single client with URI "mongodb://host1,host2/?replicaSet=rs". host1 is up, host2 is unresponsive.

      Use mongoc_client_set_stream_initiator to install a custom stream initiator that does a blocking initial connect. Begin an application operation to trigger server selection and a topology scan.

      In mongoc_topology_scanner_start:

      1. Create a stream for host1
      2. Call mongoc_async_cmd_new at Time 0
      3. mongoc_async_cmd_new sets expiration for calling ismaster on host1 to Time 0 + 10 seconds
      4. Create a stream for host2 - this blocks for 10 seconds and fails
      5. The time is now after Time 0 + 10 seconds

      Proceed to _mongoc_topology_run_scanner:

      1. The async command for host1 is already expired, mark it failed
      2. We never created an async command for host2 since its initiator timed out
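
The two sequences above can be modeled with a simulated clock. This is only a sketch of the timing problem, not the driver's code: the sim_* types and functions are hypothetical, the clock is a plain integer in milliseconds, and treating "now >= deadline" as expired is an assumption made for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an async "ismaster" command. Only the
 * per-command deadline matters for this illustration. */
typedef struct {
   bool created;         /* false if the stream initiator failed */
   int64_t expire_at_ms; /* absolute deadline, fixed at creation time */
} sim_cmd_t;

/* Hypothetical model of one host: how long its blocking stream
 * initiator takes, and whether it succeeds. */
typedef struct {
   int64_t connect_ms;
   bool connect_ok;
} sim_host_t;

/* Models the start phase: for each host, run the blocking initiator,
 * then (on success) create a command whose deadline is now + timeout.
 * Returns the simulated clock after all initiators have run. */
int64_t
sim_scanner_start (const sim_host_t *hosts,
                   sim_cmd_t *cmds,
                   int n,
                   int64_t timeout_ms)
{
   int64_t now_ms = 0;
   int i;

   assert (timeout_ms >= 0);

   for (i = 0; i < n; i++) {
      /* The initiator blocks the whole scanner, so the clock advances
       * for every command that was created earlier. */
      now_ms += hosts[i].connect_ms;
      cmds[i].created = hosts[i].connect_ok;
      cmds[i].expire_at_ms = now_ms + timeout_ms;
   }

   return now_ms;
}

/* Models the check made in the run phase (assumed comparison). */
bool
sim_cmd_expired (const sim_cmd_t *cmd, int64_t now_ms)
{
   return cmd->created && now_ms >= cmd->expire_at_ms;
}
```

With host1 connecting instantly and host2 blocking for 10000 ms before failing, both under a 10000 ms timeout, the command for host1 is already expired when the run phase begins, and no command exists for host2.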

      The C Driver's error is like:

      No suitable servers found (`serverselectiontryonce` set): [connection timeout calling ismaster on 'host1:27017'] [Failed connecting to 'host2:27017': Connection timed out]
      

      The part of the error message about host1 comes from the topology scanner; it's created when the async command is canceled. The part about host2 comes from the custom stream initiator. In this case I've copied the error message formatted by the PHP Driver's custom stream initiator.

      In the particular scenario that led me to diagnose this bug, the C Driver also suffered from CDRIVER-1567, so the first part of the message wrongly said "connection error" when it should have said "connection timeout".

      The solution is to stop tracking per-command timeouts; they're intended for a future full-async driver that we've decided not to implement. We already track a timeout for _mongoc_topology_run_scanner, which does what we need now.
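
A rough sketch of the idea behind the fix, under the assumption that a single scan-wide timeout starts when the run phase begins rather than when each command is created. The sim_scan_* names are hypothetical and the clock is a plain integer in milliseconds; the actual internals of _mongoc_topology_run_scanner are not shown in this report.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scan-wide timeout: one start time and one duration for
 * the whole run, instead of a deadline stored on every command. */
typedef struct {
   int64_t run_started_ms;
   int64_t timeout_ms;
} sim_scan_t;

/* Called when the run phase begins, after all stream initiators have
 * returned, so time spent in a slow initiator is not counted. */
void
sim_scan_begin_run (sim_scan_t *scan, int64_t now_ms, int64_t timeout_ms)
{
   assert (timeout_ms >= 0);
   scan->run_started_ms = now_ms;
   scan->timeout_ms = timeout_ms;
}

/* Time the scanner may still wait for replies; never negative. */
int64_t
sim_scan_remaining_ms (const sim_scan_t *scan, int64_t now_ms)
{
   int64_t elapsed_ms = now_ms - scan->run_started_ms;

   if (elapsed_ms >= scan->timeout_ms) {
      return 0;
   }
   return scan->timeout_ms - elapsed_ms;
}
```

In the scenario above, the run phase would begin at 10000 ms (after host2's initiator failed) and the command for host1 would still have the full timeout available, because no deadline was fixed at Time 0.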

            Assignee:
            jesse@mongodb.com A. Jesse Jiryu Davis
            Reporter:
            jesse@mongodb.com A. Jesse Jiryu Davis
            Votes:
            0
            Watchers:
            5
