[DOCS-15012] SERVER-62065: Upgrade can leave chunk entries without history on the shards Created: 05/Jan/22  Updated: 22/Jan/24

Status: Backlog
Project: Documentation
Component/s: manual, Server
Affects Version/s: None
Fix Version/s: 5.0.6, 5.3.0, 4.2.19, 5.2.1, 4.4.13, 4.0.29

Type: Task Priority: Major - P3
Reporter: Backlog - Core Eng Program Management Team Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: backlog, feature, sharding
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
backported by DOCS-15015 [BACKPORT] [v4.2] Upgrade path from 3... Closed
backported by DOCS-15020 [BACKPORT] [v4.4] Upgrade path from 3... Closed
backported by DOCS-15058 [Server: BACKPORT] [v5.2] Upgrade pat... Closed
backports DOCS-15021 [BACKPORT] [v5.0] Upgrade path from 3... Closed
Documented
documents SERVER-62065 Upgrade path from 3.6 to 4.0 can leav... Closed
Participants:
Days since reply: 2 years, 5 weeks ago
Epic Link: DOCSP-19447

 Description   
Downstream Change Summary

This change has introduced a new command called `repairShardedCollectionChunksHistory` to counteract the effects of the bug described in this ticket.

More about the operation of the command is available in its help option: https://github.com/mongodb/mongo/blob/3b56acfe78e91b607eafc737ebf88d237db1460a/src/mongo/s/commands/cluster_repair_sharded_collection_chunks_history_cmd.cpp#L65

The command is under the `splitChunk` privilege so there shouldn't be any need for changes to Atlas.
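
As a rough illustration only, the sketch below builds the command document for a single collection. It assumes the command takes the full `<database>.<collection>` namespace as its value and is sent to the `admin` database through a mongos; confirm this against the help text linked above. The helper name `build_repair_command` and the namespace `test.orders` are hypothetical.

```python
def build_repair_command(namespace: str) -> dict:
    """Build a repairShardedCollectionChunksHistory command document.

    Assumption: the command value is the full "<database>.<collection>"
    namespace of the sharded collection to repair.
    """
    db_name, sep, coll_name = namespace.partition(".")
    if not sep or not db_name or not coll_name:
        raise ValueError(f"expected '<database>.<collection>', got {namespace!r}")
    # The command name has to be the first key of the document;
    # Python dicts preserve insertion order.
    return {"repairShardedCollectionChunksHistory": namespace}


# Hypothetical namespace, for illustration only.
cmd = build_repair_command("test.orders")
print(cmd)
```

With a driver such as PyMongo, a document like this would typically be passed to `client.admin.command(cmd)` on a connection to a mongos, by a user holding the privilege mentioned above.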

Description of Linked Ticket

In 4.0 we introduced support for a multi-version routing table (PM-1013). This later (in 4.2) became the basis for distributed transactions and snapshot reads. As part of this project, we introduced a history field to the persisted chunk type, but we never actually used it in 4.0.

Since we never used it, the backup/restore procedures ever since 4.0 have stated that this field is safe to delete on restore. However, it is actually not safe to do so, because deleting it breaks snapshot reads (routing and filtering at a point in time).

Furthermore, due to the optimisation done under SERVER-53274, we can again have chunks with no history field in the persisted shard-local config.system.cache.chunks collection.

As a result of the above, 4.0, 4.2, 4.4, 5.0, 5.1, and 5.2 clusters can be missing the history field for some chunks. This in turn breaks snapshot reads and distributed transactions, which will fail with an error saying "Chunk has no history entries".
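
To make the failure condition concrete, here is a minimal sketch of how affected chunk entries could be picked out of a set of chunk documents. The sample documents are simplified and hypothetical (real chunk entries carry more fields, and `validAfter` is a timestamp rather than an integer). Treating an empty `history` array the same as a missing one is an assumption based on the error message above.

```python
def chunks_missing_history(chunk_docs):
    """Return the chunk documents that have no history entries.

    A chunk counts as affected when the `history` field is absent
    or is an empty list (the latter is an assumption).
    """
    return [doc for doc in chunk_docs if not doc.get("history")]


# Simplified, hypothetical chunk documents for illustration only.
sample_chunks = [
    {"_id": "c1", "shard": "shard0", "history": [{"validAfter": 1, "shard": "shard0"}]},
    {"_id": "c2", "shard": "shard1"},                 # history field missing
    {"_id": "c3", "shard": "shard1", "history": []},  # history present but empty
]

missing = chunks_missing_history(sample_chunks)
print([doc["_id"] for doc in missing])  # ['c2', 'c3']
```

Against a live cluster, the comparable check would be a query on the chunk collections with a filter such as `{"history": {"$exists": False}}`.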

These clusters will function without any issue until customers start using distributed transactions or snapshot reads.

This ticket is to provide a manual procedure for restoring the history fields and to implement a command that restores the history fields automatically.



 Comments   
Comment by Ian Fogelman [ 05/Jan/22 ]

This new command will need to be added to master and backported to all branches.

Comment by PM Bot [ 05/Jan/22 ]

Downstream changes updated for upstream SERVER-62065; the text is identical to the Downstream Change Summary in the Description.

Generated at Thu Feb 08 08:11:45 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.