Core Server / SERVER-76103

hang in transport_test / EgressAsioNetworkingBatonTest / AsyncOpsMakeProgressWhenSessionAddedToDetachedBaton waiting to reach failpoint

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 7.1.0-rc0, 7.0.0-rc6
    • Affects Version/s: None
    • Component/s: None
    • Labels: None
    • Fully Compatible
    • ALL
    • v7.0
    • Service Arch 2023-04-17, Service Arch 2023-05-01, Service Arch 2023-05-15, Service Arch 2023-05-29
    • 7

      This causes asio_transport_layer_test.cpp cases to hang indefinitely.
      BFG-1827646 is one such case.
      This is hard to repro as discussed in the closely related SERVER-75876.

      While SERVER-75876 is about an unexpected HostUnreachable (in a prior test case in the same run) that was observed as a prelude to the hang, further investigation showed that the hang can occur without the HostUnreachable, and can occur in a single-test "--filter" run of AsyncOpsMakeProgressWhenSessionAddedToDetachedBaton.

      Unfortunately, the spawnhost (rhel81) machine's runtime environment seems able to give a reliable repro for a full day and then, without warning, the same steps stop reproducing the hang. Very odd.

      After a discussion with Matt Diener about opportunisticRead, we have some idea of how this can happen: there may be a bug in that function whereby the same input bytes can appear more than once in the data stream.
      This would cause the deferred async read (and the associated failpoint) to never need to be executed in this unit test, which would explain the hang and timeouts.
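
      To make the suspected mechanism concrete, here is a toy model of it. This is not MongoDB code and does not reflect how opportunisticRead is actually implemented: ToySession, its fields, the duplicateBug flag, and deferredReadsAfterTwoReads are all hypothetical names invented for illustration. The sketch only assumes that a read first tries to complete synchronously from bytes already available, and falls back to a deferred async read (where the test's failpoint would sit) when it cannot.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy model only. Every name here is hypothetical; none of this is MongoDB code.
struct ToySession {
    std::string wire;           // bytes that have arrived from the peer
    std::size_t consumed = 0;   // how many wire bytes were already pulled
    std::string pending;        // bytes buffered for the caller
    bool duplicateBug = false;  // modeled defect: pulled bytes are buffered twice
    int asyncReads = 0;         // times the deferred (async) path was needed

    // Try to satisfy a read of n bytes synchronously; otherwise count a
    // deferred read and return an empty string.
    std::string read(std::size_t n) {
        // Opportunistic step: pull whatever has arrived since the last call.
        std::string fresh = wire.substr(consumed);
        consumed = wire.size();
        pending += fresh;
        if (duplicateBug) {
            pending += fresh;  // the same input bytes appear a second time
        }
        if (pending.size() < n) {
            ++asyncReads;  // deferred path: the failpoint would be hit here
            return {};
        }
        std::string out = pending.substr(0, n);
        pending.erase(0, n);
        return out;
    }
};

// The peer sends a single 4-byte message, and the caller issues two reads.
// Returns how many times the deferred path was taken.
int deferredReadsAfterTwoReads(bool duplicateBug) {
    ToySession s;
    s.duplicateBug = duplicateBug;
    s.wire = "ping";
    std::string first = s.read(4);
    assert(first == "ping");  // the first read completes synchronously either way
    s.read(4);                // without the defect, this read has to be deferred
    return s.asyncReads;
}
```

      In this model, deferredReadsAfterTwoReads(false) takes the deferred path once, while deferredReadsAfterTwoReads(true) never takes it, because the duplicated bytes satisfy the second read synchronously. A test that blocks until the deferred path's failpoint is entered would then wait forever, which matches the observed hang.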

      If that's true and there's a real bug that can cause input data to be delivered twice to a Session, we need to investigate and fix that.

            Assignee:
            billy.donahue@mongodb.com Billy Donahue
            Reporter:
            billy.donahue@mongodb.com Billy Donahue