Core Server / SERVER-53539

TypeCollectionReshardingFields are incorrect following a shard version refresh

Details

    • Fully Compatible
    • ALL

    Description

      Both shards d20020 and d20022 see the collection version as 2|1||5fec046614ff529dbac7fa05. However, only shard d20022 correctly sees the coordinator state as "cloning"; shard d20020 incorrectly sees it as "preparing-to-donate". As a result, d20020 skips constructing a RecipientStateMachine, yet the coordinator (config server) believes d20020 has finished refreshing. The resharding operation is then left unable to make further progress.
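
The stall described above can be pictured with a toy model. This is only a sketch under assumptions, not the server's code: the function names are hypothetical, and the rule that a recipient state machine is built only when the shard reads the state "cloning" is inferred from the two log messages in this report ("Ignoring shard version change" and "Creating recipient state machine").

```python
# Toy model only; these names are hypothetical, not MongoDB server identifiers.
CLONING = "cloning"
CREATE = "create recipient state machine"
IGNORE = "ignore shard version change"


def on_refresh(seen_state):
    """Action a recipient shard takes after a refresh, given the state it read."""
    return CREATE if seen_state == CLONING else IGNORE


def operation_can_progress(seen_states):
    """The coordinator counts every refreshed shard as done, so the operation
    only moves forward if each shard really built its state machine."""
    return all(on_refresh(state) == CREATE for state in seen_states.values())
```

With the states from this report, `operation_can_progress({"d20020": "preparing-to-donate", "d20022": "cloning"})` is false, which matches the observed hang.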

      This issue appears to happen only (and even then very rarely) when the test client queries the temporary resharding collection via mongos. I wonder whether another issue along the lines of SERVER-51510 still remains in ShardServerCatalogCacheLoader::_getLoaderMetadata().

      [js_test:resharding_replicate_updates_as_insert_delete] 2020-12-30T04:39:02.423+0000 d20020| 2020-12-30T04:39:02.423+00:00 I  SH_REFR  4619901 [CatalogCache-0] "Refreshed cached collection","attr":{"namespace":"test.system.resharding.0fe4b9ee-41d2-4855-8411-32539bc84657","newVersion":{"chunkVersion":{"0":{"$timestamp":{"t":2,"i":1}},"1":{"$oid":"5fec046614ff529dbac7fa05"}},"forcedRefreshSequenceNum":1,"epochDisambiguatingSequenceNum":7},"oldVersion":{"chunkVersion":"None","forcedRefreshSequenceNum":0,"epochDisambiguatingSequenceNum":0},"durationMillis":2}
      [js_test:resharding_replicate_updates_as_insert_delete] 2020-12-30T04:39:02.423+0000 d20020| | 2020-12-30T04:39:02.423+00:00 I  SHARDING 5262000 [RecoverRefreshThread] "Ignoring shard version change","attr":{"reshardingFields":{"uuid":{"$uuid":"6e22c09a-3051-43d4-861e-06f9629abb7a"},"state":"preparing-to-donate","recipientFields":{"donorShardIds":["shard0","shard1"],"existingUUID":{"$uuid":"0fe4b9ee-41d2-4855-8411-32539bc84657"},"originalNamespace":"test.foo"}},"collectionMetadata":"collection version: 2|1||5fec046614ff529dbac7fa05, shard version: 2|0||5fec046614ff529dbac7fa05"}
      ...
      [js_test:resharding_replicate_updates_as_insert_delete] 2020-12-30T04:39:02.429+0000 d20022| 2020-12-30T04:39:02.426+00:00 I  SH_REFR  4619901 [CatalogCache-0] "Refreshed cached collection","attr":{"namespace":"test.system.resharding.0fe4b9ee-41d2-4855-8411-32539bc84657","newVersion":{"chunkVersion":{"0":{"$timestamp":{"t":2,"i":1}},"1":{"$oid":"5fec046614ff529dbac7fa05"}},"forcedRefreshSequenceNum":1,"epochDisambiguatingSequenceNum":6},"oldVersion":{"chunkVersion":"None","forcedRefreshSequenceNum":0,"epochDisambiguatingSequenceNum":0},"durationMillis":3}
      [js_test:resharding_replicate_updates_as_insert_delete] 2020-12-30T04:39:02.429+0000 d20022| | 2020-12-30T04:39:02.426+00:00 I  SHARDING 5262001 [RecoverRefreshThread] "Creating recipient state machine","attr":{"reshardingFields":{"uuid":{"$uuid":"6e22c09a-3051-43d4-861e-06f9629abb7a"},"state":"cloning","recipientFields":{"fetchTimestamp":{"$timestamp":{"t":1609303142,"i":51}},"donorShardIds":["shard0","shard1"],"existingUUID":{"$uuid":"0fe4b9ee-41d2-4855-8411-32539bc84657"},"originalNamespace":"test.foo"}},"collectionMetadata":"collection version: 2|1||5fec046614ff529dbac7fa05, shard version: 2|1||5fec046614ff529dbac7fa05"}
      ...
      [js_test:resharding_replicate_updates_as_insert_delete] 2020-12-30T04:39:02.431+0000 d20020| 2020-12-30T04:39:02.431+00:00 I  SH_REFR  4619901 [CatalogCache-0] "Refreshed cached collection","attr":{"namespace":"test.system.resharding.0fe4b9ee-41d2-4855-8411-32539bc84657","newVersion":{"chunkVersion":{"0":{"$timestamp":{"t":2,"i":1}},"1":{"$oid":"5fec046614ff529dbac7fa05"}},"forcedRefreshSequenceNum":1,"epochDisambiguatingSequenceNum":8},"oldVersion":{"chunkVersion":"None","forcedRefreshSequenceNum":0,"epochDisambiguatingSequenceNum":0},"durationMillis":3}
      [js_test:resharding_replicate_updates_as_insert_delete] 2020-12-30T04:39:02.431+0000 d20020| | 2020-12-30T04:39:02.431+00:00 I  SHARDING 5262000 [RecoverRefreshThread] "Ignoring shard version change","attr":{"reshardingFields":{"uuid":{"$uuid":"6e22c09a-3051-43d4-861e-06f9629abb7a"},"state":"preparing-to-donate","recipientFields":{"donorShardIds":["shard0","shard1"],"existingUUID":{"$uuid":"0fe4b9ee-41d2-4855-8411-32539bc84657"},"originalNamespace":"test.foo"}},"collectionMetadata":"collection version: 2|1||5fec046614ff529dbac7fa05, shard version: 2|0||5fec046614ff529dbac7fa05"}
      ...
      [js_test:resharding_replicate_updates_as_insert_delete] 2020-12-30T04:39:02.457+0000 d20022| 2020-12-30T04:39:02.457+00:00 D1 MIGRATE  5002300 [ReshardingRecipientService-0] "Creating temporary resharding collection","attr":{"originalNss":"test.foo"}
      [js_test:resharding_replicate_updates_as_insert_delete] 2020-12-30T04:39:02.460+0000 d20022| 2020-12-30T04:39:02.460+00:00 I  SH_REFR  4619901 [CatalogCache-0] "Refreshed cached collection","attr":{"namespace":"test.foo","newVersion":{"chunkVersion":{"0":{"$timestamp":{"t":2,"i":1}},"1":{"$oid":"5fec046697d08cdb539562b8"}},"forcedRefreshSequenceNum":1,"epochDisambiguatingSequenceNum":7},"oldVersion":{"chunkVersion":"None","forcedRefreshSequenceNum":0,"epochDisambiguatingSequenceNum":0},"durationMillis":2}
      [js_test:resharding_replicate_updates_as_insert_delete] 2020-12-30T04:39:02.461+0000 d20022| 2020-12-30T04:39:02.461+00:00 I  STORAGE  20320   [ReshardingRecipientService-0] "createCollection","attr":{"namespace":"test.system.resharding.0fe4b9ee-41d2-4855-8411-32539bc84657","uuidDisposition":"provided","uuid":{"uuid":{"$uuid":"6e22c09a-3051-43d4-861e-06f9629abb7a"}},"options":{"uuid":{"$uuid":"6e22c09a-3051-43d4-861e-06f9629abb7a"}}}
      

      https://logkeeper.mongodb.org/lobster/build/66cd6439e38b9758dcdfff57bba7bf5a/test/5fec045654f248578176ab41#bookmarks=0%2C2548%2C2549%2C2555%2C2556%2C2561%2C2562%2C2567%2C2568%2C2569%2C10865&f~=000~%22createCollection%22.%2A%22test%5C.system&l=1

      (These logs are from a patch build where the "Ignoring shard version change" and "Creating recipient state machine" messages have been added.)
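
To spot this kind of disagreement in a longer log, a small helper along these lines might be useful. It is a sketch that assumes the line layout shown in the excerpt above: a process prefix such as `d20020|`, followed somewhere by `"attr":` and a JSON object. The function names are made up for illustration.

```python
import json
import re

# Matches the process prefix in the excerpt, e.g. " d20020|".
NODE_PREFIX = re.compile(r"\s(d\d+)\|")
ATTR_KEY = '"attr":'


def coordinator_state_seen(line):
    """Return (node, state) for a log line carrying reshardingFields, else None."""
    node = NODE_PREFIX.search(line)
    attr_at = line.find(ATTR_KEY)
    if node is None or attr_at == -1:
        return None
    # raw_decode parses one JSON value and tolerates trailing text after it.
    attr, _ = json.JSONDecoder().raw_decode(line, attr_at + len(ATTR_KEY))
    fields = attr.get("reshardingFields")
    if fields is None:
        return None
    return node.group(1), fields.get("state")


def states_by_node(lines):
    """Collect the set of coordinator states that each node logged."""
    seen = {}
    for line in lines:
        hit = coordinator_state_seen(line)
        if hit is not None:
            seen.setdefault(hit[0], set()).add(hit[1])
    return seen
```

Run over the lines quoted above, `states_by_node` would map d20020 to "preparing-to-donate" and d20022 to "cloning", which is the mismatch this ticket describes.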

            People

              jordi.serra-torrens@mongodb.com Jordi Serra Torrens
              max.hirschhorn@mongodb.com Max Hirschhorn