Core Server / SERVER-67465

Ensure timeouts do not fail hedged operations

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 4.4.17, 6.2.0-rc0
    • Affects Version/s: None
    • Component/s: Internal Code
    • None
    • Fully Compatible
    • ALL
    • v4.4
      • Build mongo binaries from the v4.4 branch.
      • Run the following repeatedly until it fails (usually fails within the first 20 runs):
        ./buildscripts/resmoke.py run --suite=sharding_continuous_config_stepdown jstests/sharding/hedged_reads.js
        
    • Service Arch 2022-07-25, Service Arch 2022-08-08, Service Arch 2022-08-22, Service Arch 2022-09-05
    • 0
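The reproduction steps above can be scripted so the test is rerun until the first failure. This is a minimal sketch, not part of the original report: the helper name `run_until_failure` and the `max_runs` limit are hypothetical, and it assumes resmoke.py exits with a non-zero status when the test fails.

```python
import subprocess

# Command taken from the reproduction steps; run it from the root of a
# v4.4 checkout after building the binaries.
RESMOKE_CMD = [
    "./buildscripts/resmoke.py",
    "run",
    "--suite=sharding_continuous_config_stepdown",
    "jstests/sharding/hedged_reads.js",
]


def run_until_failure(cmd, max_runs=20):
    """Run cmd repeatedly and stop at the first non-zero exit status.

    Returns the 1-based number of the failing run, or None if every one
    of the max_runs attempts exited with status 0.
    """
    for attempt in range(1, max_runs + 1):
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return attempt
    return None
```

Calling `run_until_failure(RESMOKE_CMD, max_runs=50)` returns the number of the first failing run, or `None` if the failure did not reproduce within 50 runs.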

      A hedged operation that fails with NetworkInterfaceExceededTimeLimit can cause the original operation to fail. Consider the following example (reproducible on v4.4):

      • Mongos attempts to hedge a read operation.
      • The hedged operation, running on a shard server, needs to query the config server (e.g., as part of waitForReadConcern).
      • The config server is temporarily unavailable (e.g., a step-down is in progress), thus it cannot accept new connections.
      • Querying the config server times out for the hedged operation (i.e., it fails with NetworkInterfaceExceededTimeLimit).
      • The hedged operation completes and returns the timeout error to the mongos server.
      • Since the error is not MaxTimeMSExceeded, mongos kills the outstanding operation and returns the non-OK status to the caller.
      • The operation fails, although it would have (eventually) succeeded without hedging.
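The sequence above can be modelled with a small simulation. This is an illustrative sketch only, not the mongos implementation: the `Response` type, the `resolve` function, and the arrival times are hypothetical. It assumes responses are handled in arrival order and that an error from a hedged request is skipped only when it belongs to a set of ignorable errors. The second policy, which also ignores NetworkInterfaceExceededTimeLimit, is one possible semantics and not necessarily the fix that was shipped.

```python
from dataclasses import dataclass
from typing import Optional

MAX_TIME_MS_EXCEEDED = "MaxTimeMSExceeded"
NETWORK_TIMEOUT = "NetworkInterfaceExceededTimeLimit"


@dataclass
class Response:
    source: str                  # "original" or "hedge"
    arrival_ms: int              # when mongos receives the response
    error: Optional[str] = None  # None means the response is OK


def resolve(responses, ignorable_hedge_errors):
    """Return the response that decides the outcome of the operation.

    Responses are examined in arrival order. An error from a hedged
    request is skipped if it is in ignorable_hedge_errors; any other
    response is final, and the remaining requests would be killed.
    """
    for response in sorted(responses, key=lambda r: r.arrival_ms):
        if response.source == "hedge" and response.error in ignorable_hedge_errors:
            continue
        return response
    return None


# Scenario from the ticket: the hedged read times out against the config
# server and reports back before the original read succeeds.
scenario = [
    Response(source="hedge", arrival_ms=50, error=NETWORK_TIMEOUT),
    Response(source="original", arrival_ms=200),
]

# Reported behavior: only MaxTimeMSExceeded is ignored on hedged requests.
reported = resolve(scenario, {MAX_TIME_MS_EXCEEDED})

# Hypothetical alternative: network timeouts on hedged requests are ignored too.
alternative = resolve(scenario, {MAX_TIME_MS_EXCEEDED, NETWORK_TIMEOUT})
```

Under the first policy the hedged timeout decides the outcome and the operation fails; under the second, the result of the original request is returned.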

      This ticket, or its sub-tasks, should:

      • Check if this issue also applies to newer branches (post v4.4).
      • Clarify the semantics for failing hedged operations (e.g., what errors may be ignored on hedged operations).
      • Fix the implementation to honor the semantics.

            Assignee:
            amirsaman.memaripour@mongodb.com Amirsaman Memaripour
            Reporter:
            amirsaman.memaripour@mongodb.com Amirsaman Memaripour
            Votes:
            0
            Watchers:
            3
