Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-53550

Resharding coordinator doesn't handle possibility of QueryPlanKilled error when triggering refresh

    • Fully Compatible
    • ALL
    • Sharding 2021-01-11
    • 19
    • 1

      Hit this issue regularly on Windows DEBUG when trying to enable the resharding_fuzzer_shuffle in SERVER-53267. The config.cache.chunks.test_reshard.reshard_coll collection is being dropped by the donor and recipient shards as a result of the collection epoch for test_reshard.reshard_coll being changed. The _flushRoutingTableCacheUpdatesWithWriteConcern command sent by the resharding coordinator can end up failing with a QueryPlanKilled error response if the shard was already in the process of dropping the config.cache.chunks.test_reshard.reshard_coll collection due to a refresh having been triggered already by something else.

      [js_test:resharding_fuzzer-505f-1609523439870-1] 2021-01-01T17:52:12.712+0000 d20026| 2021-01-01T17:52:12.713+00:00 I  COMMAND  518070  [ShardServerCatalogCacheLoader::runCollAndChunksTasks] "CMD: drop","attr":{"namespace":"config.cache.chunks.test_reshard.reshard_coll"}
      ...
      [js_test:resharding_fuzzer-505f-1609523439870-1] 2021-01-01T17:52:12.730+0000 d20026| 2021-01-01T17:52:12.730+00:00 I  STORAGE  20314   [ShardServerCatalogCacheLoader::runCollAndChunksTasks] "dropCollection: storage engine will take ownership of drop-pending collection","attr":{"namespace":"config.cache.chunks.test_reshard.reshard_coll","uuid":{"uuid":{"$uuid":"aee9e556-c159-472d-88ac-be15f1f66266"}},"dropOpTime":{"ts":{"$timestamp":{"t":0,"i":0}},"t":-1},"commitTimestamp":{"$timestamp":{"t":0,"i":0}}}
      ...
      [js_test:resharding_fuzzer-505f-1609523439870-1] 2021-01-01T17:52:12.731+0000 d20026| 2021-01-01T17:52:12.732+00:00 W  QUERY    23798   [ShardServerCatalogCacheLoader::getChunksSince] "Plan executor error during find command","attr":{"error":{"code":175,"codeName":"QueryPlanKilled","errmsg":"collection dropped. UUID aee9e556-c159-472d-88ac-be15f1f66266"},"stats":{"stage":"FETCH","nReturned":1,"works":1,"advanced":1,"needTime":0,"needYield":0,"saveState":1,"restoreState":1,"isEOF":0,"docsExamined":1,"alreadyHasObj":0,"inputStage":{"stage":"IXSCAN","nReturned":1,"works":1,"advanced":1,"needTime":0,"needYield":0,"saveState":1,"restoreState":1,"isEOF":0,"keyPattern":{"lastmod":1},"indexName":"lastmod_1","isMultiKey":false,"multiKeyPaths":{"lastmod":[]},"isUnique":false,"isSparse":false,"isPartial":false,"indexVersion":2,"direction":"forward","indexBounds":{"lastmod":["[Timestamp(0, 0), Timestamp(4294967295, 4294967295)]"]},"keysExamined":1,"seeks":1,"dupsTested":0,"dupsDropped":0}},"cmd":{"find":"cache.chunks.test_reshard.reshard_coll","filter":{"lastmod":{"$gte":{"$timestamp":{"t":0,"i":0}}}},"sort":{"lastmod":1},"readConcern":{},"$db":"config"}}
      [js_test:resharding_fuzzer-505f-1609523439870-1] 2021-01-01T17:52:12.732+0000 d20026| 2021-01-01T17:52:12.732+00:00 I  STORAGE  20320   [ShardServerCatalogCacheLoader::runCollAndChunksTasks] "createCollection","attr":{"namespace":"config.cache.chunks.test_reshard.reshard_coll","uuidDisposition":"generated","uuid":{"uuid":{"$uuid":"55eb32c1-524e-4e3c-ab90-d17111d05350"}},"options":{}}
      [js_test:resharding_fuzzer-505f-1609523439870-1] 2021-01-01T17:52:12.733+0000 d20026| 2021-01-01T17:52:12.734+00:00 I  SH_REFR  4619903 [CatalogCache-0] "Error refreshing cached collection","attr":{"namespace":"test_reshard.reshard_coll","durationMillis":51,"error":"QueryPlanKilled: Failed to load local metadata. :: caused by :: Executor error during find command :: caused by :: collection dropped. UUID aee9e556-c159-472d-88ac-be15f1f66266"}
      ...
      [js_test:resharding_fuzzer-505f-1609523439870-1] 2021-01-01T17:52:12.752+0000 c20032| 2021-01-01T17:52:12.753+00:00 I  COMMAND  4956902 [ReshardingCoordinatorService-1] "Resharding failed","attr":{"namespace":"test_reshard.reshard_coll","newShardKeyPattern":{"recipient":1,"slot":1},"error":{"code":175,"codeName":"QueryPlanKilled","errmsg":"Unable to _flushRoutingTableCacheUpdatesWithWriteConcern for namespace test_reshard.reshard_coll on shard3-recipient0 :: caused by :: Failed to load local metadata. :: caused by :: Executor error during find command :: caused by :: collection dropped. UUID aee9e556-c159-472d-88ac-be15f1f66266"}}
      ...
      [js_test:resharding_fuzzer-505f-1609523439870-1] 2021-01-01T17:52:12.835+0000 	"errmsg" : "Expected to match 1 docs, but only matched 0 for write request { update: \"collections\", bypassDocumentValidation: false, ordered: true, updates: [ { q: { _id: \"test_reshard.system.resharding.e02ba0ab-8e3d-4763-9850-af410777e804\" }, u: { $set: { reshardingFields.state: \"error\", lastmod: new Date(1609523532796) } }, multi: false, upsert: false } ] }",
      [js_test:resharding_fuzzer-505f-1609523439870-1] 2021-01-01T17:52:12.835+0000 	"code" : 5030401,
      

      https://logkeeper.mongodb.org/lobster/build/32e6a99e0e300f46dfc9d2b2cc0076c2/test/5fef612cbe07c43a2f26aae7#bookmarks=0%2C20258%2C25353%2C25366%2C25371%2C25372%2C25373%2C25384%2C25434%2C25435%2C30754&l=1

            Assignee:
            max.hirschhorn@mongodb.com Max Hirschhorn
            Reporter:
            max.hirschhorn@mongodb.com Max Hirschhorn
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

              Created:
              Updated:
              Resolved: