ShardSvrDropIndexes() should not skip validated shards on retries during concurrent chunk migrations


    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Catalog and Routing
    • ALL

      SERVER-104721 caught a bug in shardsvr_drop_indexes_command.cpp. The order of events that led to hitting this assertion is as follows:

      1. On the first dropIndexes() attempt, we target Shard0 and Shard1.
      2. We send requests to both shards; Shard0 returns OK() and is therefore excluded from targeting on the next retry.
      3. The shardVersionRetry helper refreshes the CatalogCache, so that we get up-to-date routing tables for Shard1 on the next retry.
      4. However, as a chunk migration was happening concurrently in the background, it's possible that the requested range was moved out of Shard1.
      5. So, when we call scatterGatherVersionedTargetByRoutingTableNoThrowOnStaleShardVersionErrors again, Shard0 is excluded, and the updated cached routing table tells us that Shard1 doesn't own data for our requested range anymore.
      6. Thus, when we build the requests to send to the shards, requests is empty (getShardIdsForQuery no longer includes Shard1), so we don't receive any responses and never go through nss validation here (see the sketch after this list).
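
      The failure mode in steps 1-6 can be illustrated with a small, self-contained C++ sketch. This is a simplified model under assumed names (ShardId, RoutingTable, buildRequests), not the actual code in shardsvr_drop_indexes_command.cpp: shards that already succeeded are skipped on retry, and a concurrent migration can shrink the targeted set to nothing, so the validation branch is never reached.

      // Simplified model of the retry flow described above; all names here are
      // illustrative and do not correspond to the real server implementation.
      #include <iostream>
      #include <set>
      #include <string>

      using ShardId = std::string;
      // The set of shards that currently own chunks for the namespace.
      using RoutingTable = std::set<ShardId>;

      // Build one request per shard that still owns data according to the freshly
      // refreshed routing table and has not already returned a success.
      std::set<ShardId> buildRequests(const RoutingTable& routing,
                                      const std::set<ShardId>& shardsWithSuccessResponses) {
          std::set<ShardId> requests;
          for (const auto& shard : routing) {
              if (shardsWithSuccessResponses.count(shard) == 0) {
                  requests.insert(shard);
              }
          }
          return requests;
      }

      int main() {
          std::set<ShardId> shardsWithSuccessResponses;

          // Attempt 1: the routing table says Shard0 and Shard1 own chunks for the range.
          RoutingTable routing{"Shard0", "Shard1"};
          // Shard0 returns OK and is recorded; Shard1 fails with a stale shard version,
          // so the operation is retried.
          shardsWithSuccessResponses.insert("Shard0");

          // A concurrent chunk migration moves the requested range off Shard1, so the
          // refreshed routing table no longer includes it.
          routing = {"Shard0"};

          // Attempt 2: Shard0 is skipped (already succeeded) and Shard1 is no longer
          // targeted, so no requests are built and no responses come back.
          auto requests = buildRequests(routing, shardsWithSuccessResponses);
          if (requests.empty()) {
              // With zero responses, the namespace validation step is skipped and the
              // command can report success without dropIndexes() being confirmed on
              // every shard that held the range.
              std::cout << "no requests built; validation skipped\n";
          }
          return 0;
      }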

      This is a bug: we now end up in a state where Shard1 has no indexes but Shard0 does (migrated from Shard1). Previously, the operation could have reported success even though the dropIndexes() command wasn't carried out correctly across all shards.

      If there are no shardResponses but we have stored shards in shardsWithSuccessResponses, the vector of successful shards to skip on the next retry should also be cleared, and the CatalogCache should be refreshed on the next access, as sketched below.
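
      A minimal sketch of that handling, continuing the hypothetical model above (RetryState and handleEmptyResponseSet are illustrative names, not symbols from the server code):

      #include <cstddef>
      #include <set>
      #include <string>

      using ShardId = std::string;

      // Retry bookkeeping carried across attempts in the simplified model.
      struct RetryState {
          std::set<ShardId> shardsWithSuccessResponses;
          bool routingInfoNeedsRefresh = false;
      };

      // If a retry produced no shard responses while earlier attempts already
      // recorded successes, drop that bookkeeping and mark the cached routing
      // information as needing a refresh, so the next attempt re-targets every
      // owning shard and runs namespace validation again.
      void handleEmptyResponseSet(RetryState& state, std::size_t numShardResponses) {
          if (numShardResponses == 0 && !state.shardsWithSuccessResponses.empty()) {
              state.shardsWithSuccessResponses.clear();
              state.routingInfoNeedsRefresh = true;
          }
      }

      int main() {
          RetryState state;
          state.shardsWithSuccessResponses.insert("Shard0");
          handleEmptyResponseSet(state, /*numShardResponses=*/0);
          // state.shardsWithSuccessResponses is now empty, and the next attempt will
          // refresh the routing information before targeting.
          return 0;
      }

      With this handling, the scenario above falls through to a full retry instead of returning success with zero responses.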

            Assignee:
            Unassigned
            Reporter:
            Lynne Wang
            Votes:
            0
            Watchers:
            3
