  1. Core Server
  2. SERVER-47972

maxTimeMS set on hedged requests does not give shards enough time to refresh

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 4.4.0-rc7, 4.7.0
    • Affects Version/s: None
    • Component/s: None
    • Labels:
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Backport Requested: v4.4
    • Sprint: Service arch 2020-05-18
    • 28

      Goal: investigate whether the scenario outlined below is possible and, if so, determine a fix.

      Suppose shard0 has a primary, secondary0, and secondary1.

      • Each time mongos performs a 'count' with hedged reads, secondary0 never gets the chance to set its in-memory database version: the operation is either killed or hits its maxTimeMS deadline before the version can be set after the refresh.
      • Over the NetworkInterfaceTL, mongos routes the next 'count' command to secondary0. Secondary0 has no known databaseVersion, so onDbVersionMismatchNoExcept is called. This triggers a refresh, which fails with a maxTimeMS expiration error.
      • Since the maxTimeMS error is ignored, the original "no known dbVersion" error propagates back to the NetworkInterfaceTL. Because the reported error is not a maxTimeMS error, the NetworkInterfaceTL treats the response as final. Secondary0 wins the race, but the 'count' fails with "don't know dbVersion."

      In this scenario, we believe secondary0 would be getting killed here.
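
      The suspected loop can be illustrated with a toy model. This is only a sketch of the scenario as described in this ticket, not MongoDB code: the class ToySecondary, the helper run_attempts, the exception names, and all numbers (refresh cost, deadlines, document count) are hypothetical. It assumes the refresh needs a fixed amount of time, that a refresh cut short by the deadline caches nothing, and that the refresh error is swallowed so the original stale-version error is what the caller sees.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


class MaxTimeMSExpired(Exception):
    """Toy stand-in for a maxTimeMS expiration error."""


class StaleDbVersion(Exception):
    """Toy stand-in for the "don't know dbVersion" error."""


@dataclass
class ToySecondary:
    refresh_cost_ms: int                      # assumed fixed cost of one refresh
    cached_db_version: Optional[int] = None   # None means "no known dbVersion"

    def refresh(self, budget_ms: int) -> None:
        # A refresh interrupted by the deadline caches nothing.
        if budget_ms < self.refresh_cost_ms:
            raise MaxTimeMSExpired()
        self.cached_db_version = 1

    def count(self, max_time_ms: int) -> int:
        if self.cached_db_version is None:
            try:
                self.refresh(max_time_ms)
            except MaxTimeMSExpired:
                pass  # refresh error is ignored, as in the "NoExcept" path
            # The original stale-version error is returned either way;
            # the caller is expected to retry.
            raise StaleDbVersion("don't know dbVersion")
        return 7  # arbitrary document count for the toy model


def run_attempts(node: ToySecondary, max_time_ms: int, attempts: int) -> List[Union[int, str]]:
    """Send the same 'count' repeatedly and record each outcome."""
    outcomes: List[Union[int, str]] = []
    for _ in range(attempts):
        try:
            outcomes.append(node.count(max_time_ms))
        except StaleDbVersion:
            outcomes.append("StaleDbVersion")
    return outcomes


# Deadline shorter than the refresh: the version is never cached.
stuck = run_attempts(ToySecondary(refresh_cost_ms=50), max_time_ms=10, attempts=3)
# Deadline longer than the refresh: one stale error, then success.
recovers = run_attempts(ToySecondary(refresh_cost_ms=50), max_time_ms=100, attempts=3)
```

      In this model, a deadline shorter than the refresh cost leaves the node failing every attempt, while a longer deadline costs a single stale-version error before the cached version is available. Whether the real server behaves this way is exactly what the ticket asks to investigate.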

            cheahuychou.mao@mongodb.com Cheahuychou Mao
            haley.connelly@mongodb.com Haley Connelly