[SERVER-64863] Interoperability issues between the VectorClock and the CatalogCache on secondary nodes of the CSRS Created: 24/Mar/22  Updated: 30/Jan/24

Status: Blocked
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Sergi Mateo Bellido Assignee: Robert Sander
Resolution: Unresolved Votes: 0
Labels: shardingemea-qw
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-83841 Simplify the VectorClock implementations Blocked
Related
related to SERVER-85869 Exhaustive find on config shard can r... In Code Review
Assigned Teams:
Sharding EMEA
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Sharding EMEA 2023-07-10, Sharding EMEA 2023-07-24, Sharding EMEA 2023-08-07, Sharding EMEA 2023-08-21, Sharding EMEA 2023-09-04, Sharding EMEA 2023-09-18, Sharding EMEA 2023-10-02, Sharding EMEA 2023-10-16, CAR Team 2023-11-13, CAR Team 2023-11-27, CAR Team 2023-12-11
Participants:
Linked BF Score: 105
Story Points: 2

 Description   

We found this problem in the CatalogCache, but I think it is a generic problem affecting any component that relies on metadata stored on the config server.

Note that the problem explained below only affects secondary nodes of the CSRS; everything works correctly on the primary node.

Let's assume that, for whatever reason, a secondary node of the CSRS receives a StaleShardVersion error from a shard stating that the shard's current version is X. The response also carries the configTime that shard knew, T1, but the CSRS node does not gossip that value in, because advancing the configTime is the responsibility of the CSRS nodes themselves. As part of the StaleShardVersion error handling, we mark the routing info (i.e. the catalog cache entry) for that namespace as stale, expecting the next access to trigger a refresh. However, the secondary may still be at configTime T0 (earlier than T1), and because the loader does not attach an $afterClusterTime for config server nodes, the refresh can read a snapshot that does not yet contain version X.
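
The race can be illustrated with a toy model. This is a sketch under simplifying assumptions: timestamps are plain integers, "version X" is an integer, and every name (ConfigSecondary, replicate, apply_up_to, on_stale_shard_version, refresh) is hypothetical rather than part of MongoDB's CatalogCache or VectorClock code. The after_cluster_time branch only illustrates the read-concern semantics of waiting until the node has applied up to T1; it is not the fix that was committed and later reverted.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ConfigSecondary:
    """Hypothetical toy model of a CSRS secondary, not MongoDB code."""

    # (timestamp, collection version) entries written on the primary.
    oplog: List[Tuple[int, int]] = field(default_factory=list)
    # Highest timestamp this secondary has applied so far.
    applied_ts: int = 0
    cached_version: Optional[int] = None
    cache_stale: bool = False

    def replicate(self, ts: int, version: int) -> None:
        # The entry exists on the primary but is not yet applied here.
        self.oplog.append((ts, version))

    def apply_up_to(self, ts: int) -> None:
        self.applied_ts = max(self.applied_ts, ts)

    def on_stale_shard_version(self, shard_version: int, shard_config_time: int) -> None:
        # Mirrors the report: the routing info is marked stale, but the
        # shard's configTime is NOT gossiped in on a config server node.
        self.cache_stale = True

    def refresh(self, after_cluster_time: Optional[int] = None) -> Optional[int]:
        if after_cluster_time is not None and self.applied_ts < after_cluster_time:
            # A real node would block until it catches up; the toy model
            # simply applies the pending entries.
            self.apply_up_to(after_cluster_time)
        # Read the latest version visible in this node's current snapshot.
        visible = [v for ts, v in self.oplog if ts <= self.applied_ts]
        self.cached_version = visible[-1] if visible else None
        self.cache_stale = False
        return self.cached_version


node = ConfigSecondary()
node.replicate(5, 1)
node.apply_up_to(5)       # node is at T0 = 5 and sees version 1
node.replicate(10, 2)     # version X = 2 committed at T1 = 10, not yet applied
node.on_stale_shard_version(shard_version=2, shard_config_time=10)
print(node.refresh())                       # 1: no $afterClusterTime, X not found
print(node.refresh(after_cluster_time=10))  # 2: waiting for T1 makes X visible
```

Without an afterClusterTime, the refresh clears the stale flag yet still returns the old version, which is exactly the situation described above.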



 Comments   
Comment by Githook User [ 16/Jan/24 ]

Author:

{'name': 'Jordi Olivares Provencio', 'email': 'jordi.olivares-provencio@mongodb.com', 'username': 'jordiolivares'}

Message: Revert "SERVER-64863 Fix interoperability issues between the VectorClock and … (#16823)"

This reverts commit e494dcdcf4e913ead258e6a3c8972fa3af842493.

GitOrigin-RevId: 8ec04b2c6b1e301c68282fbcf9bd28e09ecf27d8
Branch: master
https://github.com/mongodb/mongo/commit/8f99fe2e65e9155161864634cd591ab5e0d80c1a

Comment by Githook User [ 04/Dec/23 ]

Author:

{'name': 'Robert Sander', 'email': 'robert.sander@mongodb.com', 'username': 'robsndr'}

Message: SERVER-64863 Fix interoperability issues between the VectorClock and … (#16823)

GitOrigin-RevId: e494dcdcf4e913ead258e6a3c8972fa3af842493
Branch: master
https://github.com/mongodb/mongo/commit/60c9796836cf005cf96c78c9484c05ae9e29c33a

Generated at Thu Feb 08 06:01:19 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.