[SERVER-62065] Upgrade path from 3.6 to 4.0 can leave chunk entries without history on the shards Created: 15/Dec/21  Updated: 29/Oct/23  Resolved: 05/Jan/22

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 4.0.27, 4.2.17, 4.4.10, 5.0.5, 5.1.1, 5.2.0-rc1
Fix Version/s: 5.3.0, 5.0.6, 4.2.19, 4.0.29, 5.2.1, 4.4.13

Type: Bug Priority: Major - P3
Reporter: Kaloian Manassiev Assignee: Kaloian Manassiev
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Documented
is documented by DOCS-15012 SERVER-62065: Upgrade can leave chunk... Backlog
Problem/Incident
Related
is related to SERVER-49544 Config server should send setFCV to a... Backlog
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v5.2, v5.1, v5.0, v4.4, v4.2, v4.0
Sprint: Sharding EMEA 2021-12-27, Sharding EMEA 2022-01-10
Participants:
Case:
Linked BF Score: 184

 Description   

In 4.0 we introduced support for a multi-version routing table (PM-1013). This later (in 4.2) became the basis for distributed transactions and snapshot reads. As part of this project, we introduced a history field to the persisted chunk type, but we never actually used it in 4.0.

Since we never used it, the backup/restore procedures ever since 4.0 have stated that this field is safe to delete on restore. However, it is actually not safe to do so, because it breaks snapshot reads (routing and filtering at a point in time).

Furthermore, due to the optimisation done under SERVER-53274, we can again have chunks with no history field in the persisted shard-local config.system.cache.chunks collection.

As a result of the above, 4.0, 4.2, 4.4, 5.0, 5.1 and 5.2 clusters can be missing the history field for some chunks. This breaks snapshot reads and distributed transactions, which fail with the error "Chunk has no history entries".
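
To illustrate the condition described above, here is a minimal sketch that filters a set of chunk documents down to those with no history. It works on plain dictionaries rather than a live cluster. The sample documents, their field values and the helper name chunks_missing_history are all hypothetical, and treating an empty history array the same as an absent field is an assumption based on the wording of the error message.

```python
from typing import Any, Dict, Iterable, List


def chunks_missing_history(chunks: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return the chunk documents whose 'history' field is absent or empty."""
    # Assumption: an empty array is as unusable as a missing field.
    return [chunk for chunk in chunks if not chunk.get("history")]


# Hypothetical chunk documents; the values are made up for illustration.
sample_chunks = [
    {"_id": "chunk-a", "shard": "shard0",
     "history": [{"validAfter": 1, "shard": "shard0"}]},
    {"_id": "chunk-b", "shard": "shard1"},                 # field absent
    {"_id": "chunk-c", "shard": "shard1", "history": []},  # field present but empty
]

affected_ids = [chunk["_id"] for chunk in chunks_missing_history(sample_chunks)]
print(affected_ids)
```

The same predicate could be applied to documents read from config.chunks or from a shard's config.system.cache.chunks collection, but the exact document layout differs between server versions and should be checked first.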

These clusters function normally until customers start using distributed transactions or snapshot reads.

This ticket is to provide a manual procedure for restoring the history fields and to implement a command that restores the history fields automatically.
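
As a rough sketch of invoking the automatic repair, the helper below builds a command document for the 'repairShardedCollectionChunksHistory' command named in the commits on this ticket. The command name is from this ticket. The argument shape (the full "db.collection" namespace as the command value) and the helper name build_repair_command are assumptions and should be checked against the server documentation for the version in use.

```python
from typing import Dict


def build_repair_command(namespace: str) -> Dict[str, str]:
    """Build a command document for one sharded collection.

    Assumption: the command takes the full namespace as its value.
    """
    db_name, separator, collection_name = namespace.partition(".")
    if not (db_name and separator and collection_name):
        raise ValueError(f"expected '<db>.<collection>', got {namespace!r}")
    return {"repairShardedCollectionChunksHistory": namespace}


# 'test.coll' is a placeholder namespace, not one taken from the ticket.
command = build_repair_command("test.coll")
print(command)
```

The resulting document would then be passed to a driver's admin-command call against the cluster, once per affected sharded collection.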



 Comments   
Comment by Githook User [ 23/Jan/22 ]

Author:

{'name': 'Kaloian Manassiev', 'email': 'kaloian.manassiev@mongodb.com', 'username': 'kaloianm'}

Message: SERVER-62065 Introduce the 'repairShardedCollectionChunksHistory' command

Make the CPP name of '_flushRoutingTableCacheUpdates' more user-friendly:
(cherry picked from commit bcadf746d07d2eb75103ca9b7956b02a481d7a7e)

Remaining:
(cherry picked from commit 508f8dd9dd4aa27f15b327c84d5160146ffa8724)
(cherry picked from commit 949c3c821a419b3b6c3b284f5b19da2f645d39c3)
Branch: v5.2
https://github.com/mongodb/mongo/commit/e1e4356dc8b944e56ba52a21f01eb8d1a9668a3d

Comment by Sviatlana Zuiko [ 21/Jan/22 ]

Requesting a backport to 5.2

Comment by Githook User [ 19/Jan/22 ]

Author:

{'name': 'Kaloian Manassiev', 'email': 'kaloian.manassiev@mongodb.com', 'username': 'kaloianm'}

Message: SERVER-62065 Fix linting error in 'configsvr_repair_sharded_collection_chunks_history_command.cpp'
Branch: v4.0
https://github.com/mongodb/mongo/commit/31d51388ae528e6ca2c25dbf5f8cca51fbd98f44

Comment by Githook User [ 12/Jan/22 ]

Author:

{'name': 'Kaloian Manassiev', 'email': 'kaloian.manassiev@mongodb.com', 'username': 'kaloianm'}

Message: SERVER-62065 Introduce the 'repairShardedCollectionChunksHistory' command

(cherry picked from commit 508f8dd9dd4aa27f15b327c84d5160146ffa8724)
Branch: master
https://github.com/mongodb/mongo/commit/949c3c821a419b3b6c3b284f5b19da2f645d39c3

Comment by Githook User [ 11/Jan/22 ]

Author:

{'name': 'Kaloian Manassiev', 'email': 'kaloian.manassiev@mongodb.com', 'username': 'kaloianm'}

Message: SERVER-62065 Make the CPP name of '_flushRoutingTableCacheUpdates' more user-friendly
Branch: master
https://github.com/mongodb/mongo/commit/bcadf746d07d2eb75103ca9b7956b02a481d7a7e

Comment by Githook User [ 10/Jan/22 ]

Author:

{'name': 'Kaloian Manassiev', 'email': 'kaloian.manassiev@mongodb.com', 'username': 'kaloianm'}

Message: SERVER-62065 Introduce the 'repairShardedCollectionChunksHistory' command

(cherry picked from commit 146c18b9954abc116046621dc4849cc7d97ef523)
Branch: v5.0
https://github.com/mongodb/mongo/commit/508f8dd9dd4aa27f15b327c84d5160146ffa8724

Comment by Githook User [ 09/Jan/22 ]

Author:

{'name': 'Kaloian Manassiev', 'email': 'kaloian.manassiev@mongodb.com', 'username': 'kaloianm'}

Message: SERVER-62065 Fix multiversion tests
Branch: v4.4
https://github.com/mongodb/mongo/commit/af02b39db7129002de996f2fea6ceee744b530e7

Comment by Githook User [ 07/Jan/22 ]

Author:

{'name': 'Kaloian Manassiev', 'email': 'kaloian.manassiev@mongodb.com', 'username': 'kaloianm'}

Message: SERVER-62065 Introduce the 'repairShardedCollectionChunksHistory' command

(cherry picked from commit 3b56acfe78e91b607eafc737ebf88d237db1460a)
(cherry picked from commit bc2a34c6cb046127b6811ae8ae89abeb05a50b90)
Branch: v4.4
https://github.com/mongodb/mongo/commit/146c18b9954abc116046621dc4849cc7d97ef523

Comment by Githook User [ 06/Jan/22 ]

Author:

{'name': 'Kaloian Manassiev', 'email': 'kaloian.manassiev@mongodb.com', 'username': 'kaloianm'}

Message: SERVER-62065 Introduce the 'repairShardedCollectionChunksHistory' command

(cherry picked from commit 3b56acfe78e91b607eafc737ebf88d237db1460a)
Branch: v4.2
https://github.com/mongodb/mongo/commit/bc2a34c6cb046127b6811ae8ae89abeb05a50b90

Comment by Githook User [ 05/Jan/22 ]

Author:

{'name': 'Kaloian Manassiev', 'email': 'kaloian.manassiev@mongodb.com', 'username': 'kaloianm'}

Message: SERVER-62065 Introduce the 'repairShardedCollectionChunksHistory' command
Branch: v4.0
https://github.com/mongodb/mongo/commit/3b56acfe78e91b607eafc737ebf88d237db1460a

Generated at Thu Feb 08 05:54:04 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.