SERVER-53539 (Core Server)

TypeCollectionReshardingFields are incorrect following a shard version refresh

    Details

    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL

      Description

      Both shards d20020 and d20022 see the collection version as 2|1||5fec046614ff529dbac7fa05. However, only shard d20022 correctly sees the coordinator state as "cloning"; shard d20020 incorrectly sees it as "preparing-to-donate". As a result, the d20020 shard skips constructing a RecipientStateMachine, yet the coordinator (config server) believes the d20020 shard has finished refreshing. The resharding operation is then unable to make further progress.
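
To make the stall concrete, here is a toy model of the situation described above. It is only a sketch: `ShardView`, `recipient_started`, and `coordinator_can_progress` are hypothetical names invented for illustration and do not correspond to server code. It assumes a shard starts a recipient only when the state it observes is "cloning", and that the operation needs every recipient shard to have started one.

```python
from dataclasses import dataclass


@dataclass
class ShardView:
    """What one shard observes after its refresh (hypothetical model)."""
    name: str
    collection_version: str
    coordinator_state: str  # state read from reshardingFields


def recipient_started(view: ShardView) -> bool:
    # Assumption: a recipient is constructed only once "cloning" is observed.
    return view.coordinator_state == "cloning"


def stalled_shards(views: list[ShardView]) -> list[str]:
    # Shards that refreshed to the current version but started no recipient.
    return [v.name for v in views if not recipient_started(v)]


def coordinator_can_progress(views: list[ShardView]) -> bool:
    # Assumption: progress requires a recipient on every recipient shard.
    return all(recipient_started(v) for v in views)


# Values taken from the report: same collection version, different states.
views = [
    ShardView("d20020", "2|1||5fec046614ff529dbac7fa05", "preparing-to-donate"),
    ShardView("d20022", "2|1||5fec046614ff529dbac7fa05", "cloning"),
]
print(stalled_shards(views), coordinator_can_progress(views))
```

In this model both shards report the same collection version, so a check based on the version alone cannot tell that d20020 acted on stale resharding fields.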

      This issue appears to happen only (and only very rarely) when the temporary resharding collection is being queried via mongos by the test client. I wonder if there is still another issue along the lines of SERVER-51510 in ShardServerCatalogCacheLoader::_getLoaderMetadata().

      [js_test:resharding_replicate_updates_as_insert_delete] 2020-12-30T04:39:02.423+0000 d20020| 2020-12-30T04:39:02.423+00:00 I  SH_REFR  4619901 [CatalogCache-0] "Refreshed cached collection","attr":{"namespace":"test.system.resharding.0fe4b9ee-41d2-4855-8411-32539bc84657","newVersion":{"chunkVersion":{"0":{"$timestamp":{"t":2,"i":1}},"1":{"$oid":"5fec046614ff529dbac7fa05"}},"forcedRefreshSequenceNum":1,"epochDisambiguatingSequenceNum":7},"oldVersion":{"chunkVersion":"None","forcedRefreshSequenceNum":0,"epochDisambiguatingSequenceNum":0},"durationMillis":2}
      [js_test:resharding_replicate_updates_as_insert_delete] 2020-12-30T04:39:02.423+0000 d20020| | 2020-12-30T04:39:02.423+00:00 I  SHARDING 5262000 [RecoverRefreshThread] "Ignoring shard version change","attr":{"reshardingFields":{"uuid":{"$uuid":"6e22c09a-3051-43d4-861e-06f9629abb7a"},"state":"preparing-to-donate","recipientFields":{"donorShardIds":["shard0","shard1"],"existingUUID":{"$uuid":"0fe4b9ee-41d2-4855-8411-32539bc84657"},"originalNamespace":"test.foo"}},"collectionMetadata":"collection version: 2|1||5fec046614ff529dbac7fa05, shard version: 2|0||5fec046614ff529dbac7fa05"}
      ...
      [js_test:resharding_replicate_updates_as_insert_delete] 2020-12-30T04:39:02.429+0000 d20022| 2020-12-30T04:39:02.426+00:00 I  SH_REFR  4619901 [CatalogCache-0] "Refreshed cached collection","attr":{"namespace":"test.system.resharding.0fe4b9ee-41d2-4855-8411-32539bc84657","newVersion":{"chunkVersion":{"0":{"$timestamp":{"t":2,"i":1}},"1":{"$oid":"5fec046614ff529dbac7fa05"}},"forcedRefreshSequenceNum":1,"epochDisambiguatingSequenceNum":6},"oldVersion":{"chunkVersion":"None","forcedRefreshSequenceNum":0,"epochDisambiguatingSequenceNum":0},"durationMillis":3}
      [js_test:resharding_replicate_updates_as_insert_delete] 2020-12-30T04:39:02.429+0000 d20022| | 2020-12-30T04:39:02.426+00:00 I  SHARDING 5262001 [RecoverRefreshThread] "Creating recipient state machine","attr":{"reshardingFields":{"uuid":{"$uuid":"6e22c09a-3051-43d4-861e-06f9629abb7a"},"state":"cloning","recipientFields":{"fetchTimestamp":{"$timestamp":{"t":1609303142,"i":51}},"donorShardIds":["shard0","shard1"],"existingUUID":{"$uuid":"0fe4b9ee-41d2-4855-8411-32539bc84657"},"originalNamespace":"test.foo"}},"collectionMetadata":"collection version: 2|1||5fec046614ff529dbac7fa05, shard version: 2|1||5fec046614ff529dbac7fa05"}
      ...
      [js_test:resharding_replicate_updates_as_insert_delete] 2020-12-30T04:39:02.431+0000 d20020| 2020-12-30T04:39:02.431+00:00 I  SH_REFR  4619901 [CatalogCache-0] "Refreshed cached collection","attr":{"namespace":"test.system.resharding.0fe4b9ee-41d2-4855-8411-32539bc84657","newVersion":{"chunkVersion":{"0":{"$timestamp":{"t":2,"i":1}},"1":{"$oid":"5fec046614ff529dbac7fa05"}},"forcedRefreshSequenceNum":1,"epochDisambiguatingSequenceNum":8},"oldVersion":{"chunkVersion":"None","forcedRefreshSequenceNum":0,"epochDisambiguatingSequenceNum":0},"durationMillis":3}
      [js_test:resharding_replicate_updates_as_insert_delete] 2020-12-30T04:39:02.431+0000 d20020| | 2020-12-30T04:39:02.431+00:00 I  SHARDING 5262000 [RecoverRefreshThread] "Ignoring shard version change","attr":{"reshardingFields":{"uuid":{"$uuid":"6e22c09a-3051-43d4-861e-06f9629abb7a"},"state":"preparing-to-donate","recipientFields":{"donorShardIds":["shard0","shard1"],"existingUUID":{"$uuid":"0fe4b9ee-41d2-4855-8411-32539bc84657"},"originalNamespace":"test.foo"}},"collectionMetadata":"collection version: 2|1||5fec046614ff529dbac7fa05, shard version: 2|0||5fec046614ff529dbac7fa05"}
      ...
      [js_test:resharding_replicate_updates_as_insert_delete] 2020-12-30T04:39:02.457+0000 d20022| 2020-12-30T04:39:02.457+00:00 D1 MIGRATE  5002300 [ReshardingRecipientService-0] "Creating temporary resharding collection","attr":{"originalNss":"test.foo"}
      [js_test:resharding_replicate_updates_as_insert_delete] 2020-12-30T04:39:02.460+0000 d20022| 2020-12-30T04:39:02.460+00:00 I  SH_REFR  4619901 [CatalogCache-0] "Refreshed cached collection","attr":{"namespace":"test.foo","newVersion":{"chunkVersion":{"0":{"$timestamp":{"t":2,"i":1}},"1":{"$oid":"5fec046697d08cdb539562b8"}},"forcedRefreshSequenceNum":1,"epochDisambiguatingSequenceNum":7},"oldVersion":{"chunkVersion":"None","forcedRefreshSequenceNum":0,"epochDisambiguatingSequenceNum":0},"durationMillis":2}
      [js_test:resharding_replicate_updates_as_insert_delete] 2020-12-30T04:39:02.461+0000 d20022| 2020-12-30T04:39:02.461+00:00 I  STORAGE  20320   [ReshardingRecipientService-0] "createCollection","attr":{"namespace":"test.system.resharding.0fe4b9ee-41d2-4855-8411-32539bc84657","uuidDisposition":"provided","uuid":{"uuid":{"$uuid":"6e22c09a-3051-43d4-861e-06f9629abb7a"}},"options":{"uuid":{"$uuid":"6e22c09a-3051-43d4-861e-06f9629abb7a"}}}
      

      https://logkeeper.mongodb.org/lobster/build/66cd6439e38b9758dcdfff57bba7bf5a/test/5fec045654f248578176ab41#bookmarks=0%2C2548%2C2549%2C2555%2C2556%2C2561%2C2562%2C2567%2C2568%2C2569%2C10865&f~=000~%22createCollection%22.%2A%22test%5C.system&l=1

      (These logs are from a patch build where the "Ignoring shard version change" and "Creating recipient state machine" messages have been added.)
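
The disagreement visible in the log excerpt above can also be checked mechanically. The following is a minimal sketch, assuming log lines shaped like the patch-build messages quoted in this ticket (a `dNNNNN|` shard prefix and a trailing `"attr":{...}` JSON object containing `reshardingFields` and `collectionMetadata`). The helper names `parse_line` and `find_disagreements` are made up for this example, and the sample lines are abbreviated copies of the ones above.

```python
import json
import re
from collections import defaultdict

SHARD_RE = re.compile(r"\b(d\d+)\|")
COLL_VERSION_RE = re.compile(r"collection version: ([^,]+)")
ATTR_KEY = '"attr":'


def parse_line(line):
    """Return (shard, collection_version, state), or None if not applicable."""
    shard = SHARD_RE.search(line)
    attr_at = line.find(ATTR_KEY)
    if shard is None or attr_at == -1:
        return None
    # raw_decode parses the JSON object and ignores anything after it.
    attr, _ = json.JSONDecoder().raw_decode(line[attr_at + len(ATTR_KEY):])
    fields = attr.get("reshardingFields")
    version = COLL_VERSION_RE.search(attr.get("collectionMetadata", ""))
    if fields is None or version is None:
        return None
    return shard.group(1), version.group(1), fields["state"]


def find_disagreements(lines):
    """Map each collection version to per-shard states when shards disagree."""
    states = defaultdict(dict)  # version -> {shard: state}
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            shard, version, state = parsed
            states[version][shard] = state
    return {v: s for v, s in states.items() if len(set(s.values())) > 1}


# Abbreviated from the log lines quoted in this ticket.
sample = [
    r'd20020| | 2020-12-30T04:39:02.423+00:00 I  SHARDING 5262000 [RecoverRefreshThread] "Ignoring shard version change","attr":{"reshardingFields":{"state":"preparing-to-donate"},"collectionMetadata":"collection version: 2|1||5fec046614ff529dbac7fa05, shard version: 2|0||5fec046614ff529dbac7fa05"}',
    r'd20022| | 2020-12-30T04:39:02.426+00:00 I  SHARDING 5262001 [RecoverRefreshThread] "Creating recipient state machine","attr":{"reshardingFields":{"state":"cloning"},"collectionMetadata":"collection version: 2|1||5fec046614ff529dbac7fa05, shard version: 2|1||5fec046614ff529dbac7fa05"}',
]
disagreements = find_disagreements(sample)
print(disagreements)
```

Lines without `reshardingFields`, such as the "Refreshed cached collection" messages, are skipped, so the full test log could be passed through unchanged.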

              People

              Assignee:
              jordi.serra-torrens Jordi Serra Torrens
              Reporter:
              max.hirschhorn Max Hirschhorn