  1. Core Server
  2. SERVER-85145

ShardNotFound error should not be bubbled up when concurrently removing a shard and running operations


Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • Catalog and Routing
    • ALL

    Description

      In HELP-54194, we discovered that some commands, for example “listIndexes”, may fail while a removeShard is in progress and the shard is draining. Hitting ShardNotFound in this situation is expected: it is a transient error triggered only by specific timing within a specific window of time, and it currently bubbles up to the user application. The failed command can be retried and then executes successfully. A test that reproduces the problem is attached in the comments.

      The problem is that the user sees ShardNotFound bubble up when that may not be necessary, i.e. the mongos or the driver (an implementation decision) should retry the operation instead.

      In summary, the goal of this ticket is to list all the commands that fail under the attached reproducer and to investigate and work on a feasible way to retry ShardNotFound without bubbling it up to the user when that is not necessary, as we do with other transient errors.
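
      For illustration only, the retry behaviour described above could look roughly like the following when done on the caller's side. This is a sketch, not the proposed server or driver change: CommandError, run_with_retry and make_flaky_list_indexes are hypothetical names, the numeric code 70 for ShardNotFound is an assumption, and the failing listIndexes call is simulated rather than sent to a cluster.

```python
import time

SHARD_NOT_FOUND = 70  # assumed numeric code for ShardNotFound


class CommandError(Exception):
    """Hypothetical stand-in for a driver error that carries a server error code."""

    def __init__(self, code, message=""):
        super().__init__(message)
        self.code = code


def run_with_retry(operation, max_attempts=3, delay_seconds=0.0):
    """Call operation(); retry only when it fails with ShardNotFound."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except CommandError as exc:
            # Any other error, or a failure on the last attempt, reaches the caller.
            if exc.code != SHARD_NOT_FOUND or attempt == max_attempts:
                raise
            time.sleep(delay_seconds)


def make_flaky_list_indexes(failures):
    """Simulate listIndexes failing `failures` times while a shard drains."""
    state = {"remaining": failures}

    def list_indexes():
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise CommandError(SHARD_NOT_FOUND, "ShardNotFound")
        return ["_id_"]

    return list_indexes
```

      With two simulated failures and the default of three attempts, run_with_retry(make_flaky_list_indexes(2)) returns the index list. If the error persists past the last attempt, or the error is not ShardNotFound, it is raised to the caller unchanged.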

          People

            backlog-server-catalog-and-routing Backlog - Catalog and Routing
            pol.pinol@mongodb.com Pol Pinol