[SERVER-64863] Interoperability issues between the VectorClock and the CatalogCache on secondary nodes of the CSRS Created: 24/Mar/22 Updated: 30/Jan/24 |
|
| Status: | Blocked |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Sergi Mateo Bellido | Assignee: | Robert Sander |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | shardingemea-qw |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Assigned Teams: | Sharding EMEA |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Sprint: | Sharding EMEA 2023-07-10, Sharding EMEA 2023-07-24, Sharding EMEA 2023-08-07, Sharding EMEA 2023-08-21, Sharding EMEA 2023-09-04, Sharding EMEA 2023-09-18, Sharding EMEA 2023-10-02, Sharding EMEA 2023-10-16, CAR Team 2023-11-13, CAR Team 2023-11-27, CAR Team 2023-12-11 |
| Participants: |
| Linked BF Score: | 105 |
| Story Points: | 2 |
| Description |
|
We found this problem in the CatalogCache, but I think it is a generic problem for other components that rely on metadata stored on the config server. Note that the problem explained below affects only secondary nodes of the CSRS; everything is fine on the primary node. Let's assume that, for whatever reason, a secondary node of the CSRS receives a StaleShardVersion error from a shard, stating that the current shard version is X. That shard knew a configTime of T1, which is also part of the response, but it is not going to be gossiped in on the CSRS. As part of the StaleShardVersion error handling, we mark the routing info (i.e. the catalog cache) for that namespace as stale, expecting that the next time we need it we will perform a refresh. Note that it is the responsibility of the CSRS nodes to advance the configTime. Thus, it could happen that the secondary node had a configTime of T0 (older than T1) and, when it performs a refresh, it doesn't find version X, because for config server nodes the loader doesn't attach an $afterClusterTime. |
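The scenario described above can be modelled with a small, self-contained simulation. This is only an illustrative sketch, not MongoDB code: `ToyConfigSecondary`, `read_shard_version`, `CONFIG_HISTORY`, and the integer times and versions are all hypothetical names and values, and replication catch-up is reduced to bumping a counter. It assumes that T0 precedes T1 and that version X was committed at T1.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

T0, T1 = 0, 1   # hypothetical config times, with T0 < T1
X = 2           # hypothetical shard version reported by the shard

# (config time, shard version) pairs as committed on the CSRS primary.
CONFIG_HISTORY: List[Tuple[int, int]] = [(T0, 1), (T1, X)]


@dataclass
class ToyConfigSecondary:
    """Toy CSRS secondary that has applied config writes up to applied_time."""
    applied_time: int

    def read_shard_version(self, after_cluster_time: Optional[int] = None) -> int:
        # With an afterClusterTime-style bound, a lagging node first catches up.
        # Assumption: waiting for replication is modelled as a simple assignment.
        if after_cluster_time is not None and after_cluster_time > self.applied_time:
            self.applied_time = after_cluster_time
        # Only writes at or before applied_time are visible to the read.
        return max(v for t, v in CONFIG_HISTORY if t <= self.applied_time)


# Refresh without a time bound: the secondary at T0 cannot see version X.
stale = ToyConfigSecondary(applied_time=T0).read_shard_version()

# Refresh bounded by the shard's configTime T1: version X becomes visible.
fresh = ToyConfigSecondary(applied_time=T0).read_shard_version(after_cluster_time=T1)
```

In this model the unbounded refresh returns a version older than X, while the refresh bounded by T1 returns X. That difference is the gap the description points at when the loader on config server nodes omits `$afterClusterTime`.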
| Comments |
| Comment by Githook User [ 16/Jan/24 ] |
|
Author: {'name': 'Jordi Olivares Provencio', 'email': 'jordi.olivares-provencio@mongodb.com', 'username': 'jordiolivares'} Message: Revert "SERVER-64863 Fix interoperability issues between the VectorClock and … (#16823)" This reverts commit e494dcdcf4e913ead258e6a3c8972fa3af842493. GitOrigin-RevId: 8ec04b2c6b1e301c68282fbcf9bd28e09ecf27d8 |
| Comment by Githook User [ 04/Dec/23 ] |
|
Author: {'name': 'Robert Sander', 'email': 'robert.sander@mongodb.com', 'username': 'robsndr'} Message: SERVER-64863 Fix interoperability issues between the VectorClock and … (#16823) GitOrigin-RevId: e494dcdcf4e913ead258e6a3c8972fa3af842493 |