[SERVER-64717] Shard registry is cleared during rollbacks Created: 19/Mar/22 Updated: 29/Oct/23 Resolved: 20/Jun/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 6.1.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Marcos José Grillo Ramirez | Assignee: | Marcos José Grillo Ramirez |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | shardregistry-consistency-bug |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Sprint: | Sharding EMEA 2022-06-13, Sharding EMEA 2022-06-27 |
| Participants: | |
| Linked BF Score: | 118 |
| Description |
|
| During rollbacks on the config server, we clear the shard registry in the rollback code and, for some reason, also in the observer. This seems to be because, under certain circumstances, the registry might be populated from local data (data that is not yet majority-committed and can therefore be rolled back). We should ensure the registry only ever reads consistent data, which means always reading with majority read concern, without exceptions; then we no longer need to clear the shard registry in the rollback code or in the rollback observer. We should also ensure that all writes performed by add shard / remove shard use majority write concern. One consequence of the current behavior: if a secondary with a warm shard registry goes through a rollback while becoming primary, its shard registry is cleared, and any service that contacts a shard on startup through the non-causally-consistent API fails with a shard not found error. |
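The reasoning above can be illustrated with a toy model (not MongoDB code). `ShardWrite`, `ConfigNode`, `read_local`, `read_majority`, and `cache_survives_rollback` are hypothetical names; the model only covers shard additions and assumes a rollback discards exactly the writes above the majority commit point. It shows that a cache filled by a local-style read can hold a shard that a rollback later removes (hence the need to clear it), while a cache filled by a majority-style read cannot.

```python
from dataclasses import dataclass, field

# Hypothetical toy model of a config server node; none of these names are
# MongoDB server APIs. Only shard additions are modeled.


@dataclass
class ShardWrite:
    op_time: int   # optime at which the write was applied locally
    shard_id: str


@dataclass
class ConfigNode:
    writes: list = field(default_factory=list)  # applied locally, in optime order
    majority_commit_point: int = 0              # highest majority-committed optime

    def read_local(self) -> set:
        """'local'-style read: every applied write, committed or not."""
        return {w.shard_id for w in self.writes}

    def read_majority(self) -> set:
        """'majority'-style read: only writes at or below the commit point."""
        return {w.shard_id for w in self.writes if w.op_time <= self.majority_commit_point}

    def rollback(self) -> None:
        """Discard every write that was never majority-committed."""
        self.writes = [w for w in self.writes if w.op_time <= self.majority_commit_point]


def cache_survives_rollback(cache: set, node: ConfigNode) -> bool:
    """True if every cached shard still exists on the node after a rollback."""
    return cache <= node.read_local()


node = ConfigNode(
    writes=[ShardWrite(1, "shard0"), ShardWrite(2, "shard1"), ShardWrite(3, "shard2")],
    majority_commit_point=2,  # shard2 was added but is not yet majority-committed
)
local_cache = node.read_local()        # shard0, shard1, shard2
majority_cache = node.read_majority()  # shard0, shard1

node.rollback()  # the uncommitted write that added shard2 is rolled back

print("local cache still valid:", cache_survives_rollback(local_cache, node))
print("majority cache still valid:", cache_survives_rollback(majority_cache, node))
```

In this model, only the locally-read cache refers to a shard that no longer exists after the rollback, which is the situation that clearing the registry was guarding against.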
| Comments |
| Comment by Githook User [ 20/Jun/22 ] |
|
Author: Marcos José Grillo Ramirez (marcos.grillo@mongodb.com, username: m4nti5)
Message: |
| Comment by Sergi Mateo Bellido [ 01/Jun/22 ] |
|
I have been thinking about this because of BF-25375. I think the guarantee we want this ticket to provide is that the ShardRegistry only ever contains information that has been majority-committed, on every node (including the CSRS nodes). As part of this ticket, I would remove the ShardRegistry::clearEntries method. |
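A minimal sketch of that guarantee, assuming the registry is only ever refreshed from snapshots read with majority read concern and tagged with the majority-committed optime they were read at. `MajorityOnlyShardRegistry`, `Snapshot`, `install`, `get_shard_no_reload`, and `ShardNotFound` are hypothetical names, not the real ShardRegistry API, and the connection string is made up. Because the cache only moves forward in optime and holds only majority-committed data, a rollback has nothing to invalidate, so the class deliberately has no clear method.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical sketch: class and method names are illustrative and are not
# the real ShardRegistry API.


class ShardNotFound(Exception):
    pass


@dataclass
class Snapshot:
    op_time: int            # majority-committed optime the data was read at
    shards: Dict[str, str]  # shard id -> connection string


class MajorityOnlyShardRegistry:
    """Caches only majority-committed snapshots, so it has no clear method."""

    def __init__(self) -> None:
        self._snapshot: Optional[Snapshot] = None

    def install(self, snapshot: Snapshot) -> bool:
        """Install a snapshot that was read with majority read concern.

        Snapshots that are not newer than the cached one are ignored, so the
        cache only moves forward in optime. Majority-committed data is not
        rolled back, so nothing cached here must be dropped on rollback.
        """
        if self._snapshot is not None and snapshot.op_time <= self._snapshot.op_time:
            return False
        self._snapshot = snapshot
        return True

    def get_shard_no_reload(self, shard_id: str) -> str:
        """Resolve a shard from the cache only, without refreshing."""
        if self._snapshot is None or shard_id not in self._snapshot.shards:
            raise ShardNotFound(shard_id)
        return self._snapshot.shards[shard_id]


registry = MajorityOnlyShardRegistry()
registry.install(Snapshot(op_time=5, shards={"shard0": "rs0/host0:27018"}))
registry.install(Snapshot(op_time=3, shards={}))  # older snapshot, ignored
# A rollback on this node requires no call into the registry.
print(registry.get_shard_no_reload("shard0"))
```

Rejecting older snapshots keeps the cache monotonic, which is what allows a node that steps up after a rollback to keep serving lookups from its warm registry.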