DOCS-15012

SERVER-62065: Upgrade can leave chunk entries without history on the shards

      Downstream Change Summary

      This change introduces a new command, `repairShardedCollectionChunksHistory`, to counteract the effects of the bug described in this ticket.

      The command's help text describes its operation in more detail: https://github.com/mongodb/mongo/blob/3b56acfe78e91b607eafc737ebf88d237db1460a/src/mongo/s/commands/cluster_repair_sharded_collection_chunks_history_cmd.cpp#L65

      The command is covered by the existing `splitChunk` privilege, so no changes to Atlas should be needed.
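
As a rough illustration of how the new command might be issued from a driver, the sketch below builds the command document and sends it to the `admin` database. It assumes a `pymongo.MongoClient` connected to a mongos; the helper names are hypothetical, and the optional `force` field is an assumption based on the command's help text that should be verified against the linked source before use.

```python
def build_repair_command(namespace, force=False):
    """Build the command document for repairShardedCollectionChunksHistory.

    namespace is the full "<database>.<collection>" name of a sharded collection.
    """
    if "." not in namespace or namespace.startswith(".") or namespace.endswith("."):
        raise ValueError("namespace must look like '<database>.<collection>'")
    # The command name must be the first key; dicts keep insertion order.
    command = {"repairShardedCollectionChunksHistory": namespace}
    if force:
        # Assumption: 'force' asks the server to re-add all history entries.
        command["force"] = True
    return command


def repair_chunks_history(client, namespace, force=False):
    """Send the repair command through the admin database.

    client is assumed to behave like a pymongo.MongoClient connected to a
    mongos, by a user holding the splitChunk privilege.
    """
    return client.admin.command(build_repair_command(namespace, force))
```

With a real connection this would be called as `repair_chunks_history(MongoClient(uri), "mydb.mycoll")`, where `uri` and the namespace are placeholders.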

      Description of Linked Ticket

      In 4.0 we introduced support for a multi-version routing table (PM-1013). This later (in 4.2) became the basis for distributed transactions and snapshot reads. As part of this project, we introduced a history field to the persisted chunk type, but we never actually used it in 4.0.

      Since we never used it, the backup/restore procedures ever since 4.0 have stated that this field is safe to delete on restore. However, it is actually not safe to do so, because deleting it breaks snapshot reads (routing and filtering at a point in time).

      Furthermore, due to the optimisation done under SERVER-53274, we can again have chunks with no history field in the persisted shard-local config.system.cache.chunks collection.

      As a result of the above, 4.0, 4.2, 4.4, 5.0, 5.1, and 5.2 clusters can be missing the history field for some chunks. This in turn breaks snapshot reads and distributed transactions, which fail with an error saying "Chunk has no history entries".
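
One way to check whether a cluster is affected is to look for chunk documents that lack history entries. The sketch below is illustrative only: it assumes the persisted field is named `history` and holds an array, treats an absent field and an empty array as equally "missing", and uses made-up sample documents. The filter could be passed to a `find()` on the chunk metadata collection, while the helper applies the same test to documents already in memory.

```python
# Query filter matching chunk documents with no history entries
# (assumption: the field is named "history" and is an array when present).
MISSING_HISTORY_FILTER = {
    "$or": [
        {"history": {"$exists": False}},
        {"history": {"$size": 0}},
    ]
}


def chunks_missing_history(chunks):
    """Return the chunk documents whose history field is absent or empty."""
    return [
        chunk for chunk in chunks
        if "history" not in chunk or chunk["history"] == []
    ]


# Hypothetical chunk documents, reduced to the fields relevant here.
sample_chunks = [
    {"_id": "c1", "shard": "shard0", "history": [{"shard": "shard0"}]},
    {"_id": "c2", "shard": "shard1"},
    {"_id": "c3", "shard": "shard1", "history": []},
]

affected_ids = [chunk["_id"] for chunk in chunks_missing_history(sample_chunks)]
```

Here `affected_ids` contains the two sample chunks that would trigger the error on a snapshot read or distributed transaction.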

      These clusters will function without any issue until customers start using distributed transactions and snapshot reads.

      This ticket is to provide a manual procedure for restoring the history fields and to implement a command that restores them automatically.

            Assignee: Unassigned
            Reporter: Backlog - Core Eng Program Management Team (backlog-server-pm)

              Created:
              Updated:
              2 years, 14 weeks, 5 days ago