[SERVER-64717] Shard registry is cleared during rollbacks Created: 19/Mar/22  Updated: 29/Oct/23  Resolved: 20/Jun/22

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 6.1.0-rc0

Type: Bug Priority: Major - P3
Reporter: Marcos José Grillo Ramirez Assignee: Marcos José Grillo Ramirez
Resolution: Fixed Votes: 0
Labels: shardregistry-consistency-bug
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Duplicate
is duplicated by SERVER-50207 Investigate if ShardRegistry reads on... Closed
Related
related to SERVER-50207 Investigate if ShardRegistry reads on... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Sharding EMEA 2022-06-13, Sharding EMEA 2022-06-27
Participants:
Linked BF Score: 118

 Description   

During rollbacks we clear the shard registry on the config server (and, for some reason, also in the rollback observer). This seems to be because under certain circumstances we might read local data.

We should ensure we're reading consistent data, which means always reading with majority read concern, without exceptions, so that we don't have to clear the shard registry in the rollback code or in the rollback observer. We should also ensure that all operations in add / remove shard are done with majority write concern.

One implication of clearing the registry is that a secondary with a warmed shard registry that goes through a rollback while becoming primary will clear its shard registry, and any service that tries to contact a shard on startup using the non-causally-consistent API will fail with a shard not found error.
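To make the reasoning above concrete, here is a minimal toy model in Python. It is only a sketch, not the server's implementation: ToyConfigNode, load_registry, and the single commit_index are hypothetical names invented for illustration. It shows that a registry loaded with a "majority" read never contains a shard that a rollback can remove, whereas a "local" read can.

```python
from dataclasses import dataclass, field


@dataclass
class ToyConfigNode:
    """Hypothetical config server node: an ordered log of added shards
    plus a commit point marking what is majority committed."""
    oplog: list = field(default_factory=list)  # shard ids in write order
    commit_index: int = 0  # entries [0, commit_index) are majority committed

    def add_shard(self, shard_id, majority_acked):
        self.oplog.append(shard_id)
        if majority_acked:
            # Models a write that waited for majority write concern.
            self.commit_index = len(self.oplog)

    def read_shards(self, read_concern):
        if read_concern == "majority":
            return set(self.oplog[:self.commit_index])
        # "local": may include writes that can still be rolled back.
        return set(self.oplog)

    def rollback(self):
        # A rollback discards only entries that were never majority committed.
        del self.oplog[self.commit_index:]


def load_registry(node, read_concern):
    """Hypothetical registry refresh: snapshot the shard set from the node."""
    return node.read_shards(read_concern)


node = ToyConfigNode()
node.add_shard("shard0", majority_acked=True)
node.add_shard("shard1", majority_acked=False)  # not yet majority committed

local_view = load_registry(node, "local")
majority_view = load_registry(node, "majority")

node.rollback()
after_rollback = node.read_shards("local")

# The majority view is still accurate after the rollback; the local view is stale.
print(majority_view == after_rollback)  # True
print(local_view == after_rollback)     # False
```

In this model the registry built from the majority view needs no clearing on rollback, which is the property the ticket asks for. The registry built from the local view would have to be invalidated.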



 Comments   
Comment by Githook User [ 20/Jun/22 ]

Author: Marcos José Grillo Ramirez <marcos.grillo@mongodb.com> (m4nti5)

Message: SERVER-64717 Ensure the ShardRegistry always read with majority write concern, even in the config server
Branch: master
https://github.com/mongodb/mongo/commit/803e6ba5c7c4a13d9978d9adb32c68cd95f7c1e0

Comment by Sergi Mateo Bellido [ 01/Jun/22 ]

I have been thinking a bit about it because of BF-25375. I think that the guarantee that we want to offer with this ticket is that the ShardRegistry will always contain information that has been majority committed, on all nodes (even on the CSRS nodes). I would remove the ShardRegistry::clearEntries method as part of this ticket.
