  1. Core Server
  2. SERVER-67716

Clarify policy for when responses to hedged requests should cancel outstanding requests

Details

    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • Service Arch

    Description

      When an operation is 'hedged', mongos sends copies of that operation to multiple mongod nodes. The purpose is to 'hedge' the operation: if one of the mongods cannot respond or is slow to respond, we may still get a response from another. This avoids slow queries caused by a slow mongod, and avoids having to retry the operation after a timeout when one mongod is not responding.
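
      As a rough illustration of the idea only (not the mongos implementation), hedging can be sketched with asyncio. `targets`, `send`, and `fake_send` are hypothetical stand-ins for the mongod hosts and the network call:

```python
import asyncio

async def hedged_request(targets, send):
    """Send the same operation to every target and return the first reply.

    `targets` and `send` are hypothetical stand-ins for mongod hosts and
    the network call; they are not part of any MongoDB API.
    """
    tasks = [asyncio.create_task(send(t)) for t in targets]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # abandon the slower copies
    # Let the cancelled tasks finish unwinding before returning.
    await asyncio.gather(*pending, return_exceptions=True)
    return next(iter(done)).result()

async def fake_send(target):
    # Simulated mongod: the "slow" node takes far longer to answer.
    await asyncio.sleep(1.0 if target == "slow" else 0.01)
    return f"reply from {target}"

def run_demo():
    return asyncio.run(hedged_request(["slow", "fast"], fake_send))
```

      Here the caller sees the fast node's reply without waiting out the slow node, which is the latency benefit described above.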

      Today, when we receive a response from one mongod for a hedged operation, we will often cancel the outstanding operations on the other mongods, even if the response we received was an error. The only errors for which we won't cancel outstanding operations are maxTimeMS expiration and stale sharding config errors. This may be the correct policy, but it may also cancel operations that would otherwise succeed and force lengthier retries, which removes any benefit from hedging. The policy is also opaque: it is hard-coded into the networking layer and cannot be configured by consumers of the API.
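
      The current behaviour, as described here, amounts to a fixed predicate over the first response. A minimal sketch, assuming a response is a dict with an optional "error" label; the labels and the function name are illustrative and are not taken from the server code:

```python
# Illustrative labels for the two error classes named in this ticket;
# these are assumptions, not the server's actual error identifiers.
CONTINUE_WAITING = frozenset({"MaxTimeMSExpired", "StaleConfig"})

def should_cancel_outstanding(response):
    """Return True if this response should cancel the other hedged copies."""
    error = response.get("error")
    if error is None:
        return True  # a successful reply always ends the hedge
    # Any error outside the small exempt set also cancels the rest.
    return error not in CONTINUE_WAITING
```

      Under this sketch, an error such as a hypothetical "HostUnreachable" from one node cancels a copy on another node that might still have succeeded.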

       

      We should clarify which responses to a hedged operation should cause us to cancel outstanding hedged requests, and which should cause us to continue waiting. Once we have this policy, we should consider making it configurable on a per-request basis for consumers of the hedging API.
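
      One possible shape for a per-request policy is a caller-supplied predicate, sketched below. This is an assumption about how such an API could look, not a proposal for the actual interface; `hedged_with_policy`, `should_cancel`, and `fake_send` are hypothetical names:

```python
import asyncio

async def hedged_with_policy(targets, send, should_cancel):
    """Hedge an operation; `should_cancel(response)` decides per request
    whether a response ends the hedge or whether we keep waiting."""
    tasks = [asyncio.create_task(send(t)) for t in targets]
    last = None
    try:
        for next_reply in asyncio.as_completed(tasks):
            last = await next_reply
            if should_cancel(last):
                break  # policy accepts this response as final
        return last  # if nothing was accepted, return the last reply seen
    finally:
        for task in tasks:
            task.cancel()  # no-op for tasks that already finished
        await asyncio.gather(*tasks, return_exceptions=True)

async def fake_send(target):
    # Simulated nodes: "a" fails quickly, "b" succeeds a little later.
    if target == "a":
        await asyncio.sleep(0.01)
        return {"error": "HostUnreachable"}
    await asyncio.sleep(0.05)
    return {"ok": 1}

def run(should_cancel):
    return asyncio.run(hedged_with_policy(["a", "b"], fake_send, should_cancel))

def cancel_on_any_response(response):
    return True

def cancel_on_success_only(response):
    return "error" not in response
```

      With `cancel_on_any_response` the caller gets the early error from node "a"; with `cancel_on_success_only` the same request waits for node "b" and returns its successful reply.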

          People

            backlog-server-servicearch Backlog - Service Architecture
            george.wangensteen@mongodb.com George Wangensteen