In 4.0 we introduced support for a multi-version routing table (PM-1013). This later (in 4.2) became the basis for distributed transactions and snapshot reads. As part of this project, we introduced a history field to the persisted chunk type, but we never actually used it in 4.0.
Since we never used it, the backup/restore procedures ever since 4.0 have stated that this field is safe to delete on restore. However, it is actually not safe to do so, because it breaks snapshot reads (routing and filtering at a point in time).
Furthermore, due to the optimisation done under SERVER-53274, we can again have chunks with no history field in the persisted shard-local config.system.cache.chunks collection.
As a result of the above, 4.0, 4.2, 4.4, 5.0, 5.1 and 5.2 clusters can be missing the history field for some chunks. This breaks snapshot reads and distributed transactions, which fail with the error "Chunk has no history entries".
These clusters function without any issue until customers start using distributed transactions or snapshot reads.
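To illustrate why a missing history field only surfaces on point-in-time operations, here is a minimal sketch in Python over plain dictionaries. It is not the server's implementation. The helper names `chunks_missing_history` and `owner_at` are hypothetical, the entry fields `validAfter` and `shard` are assumed, timestamps are simplified to integers, and history is assumed to be ordered newest first.

```python
from typing import Any, Dict, Iterable, List

Chunk = Dict[str, Any]


def chunks_missing_history(chunks: Iterable[Chunk]) -> List[Chunk]:
    """Return the chunk documents whose history field is absent or empty."""
    return [c for c in chunks if not c.get("history")]


def owner_at(chunk: Chunk, ts: int) -> str:
    """Return the shard that owned the chunk at time ts (hypothetical helper).

    Assumes each history entry looks like {"validAfter": <int>, "shard": <str>}
    and that entries are ordered newest first.
    """
    history = chunk.get("history") or []
    if not history:
        # Mirrors the error text quoted in this ticket.
        raise LookupError("Chunk has no history entries")
    for entry in history:
        if entry["validAfter"] <= ts:
            return entry["shard"]
    raise LookupError("No history entry is valid at the requested time")
```

Routing at the latest version only needs the chunk's current shard, so such clusters keep working. A lookup like `owner_at` is only needed for a read or transaction pinned to a point in time, and that is where the empty history turns into an error.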
This ticket is to provide a manual procedure for restoring the history fields and to implement a command that restores the history fields automatically.
- is documented by
DOCS-15012 SERVER-62065: Upgrade can leave chunk entries without history on the shards
- is related to
SERVER-49544 Config server should send setFCV to all shards in parallel