-
Type: Task
-
Resolution: Done
-
Priority: Major - P3
-
None
-
Affects Version/s: None
-
Component/s: Sharding
-
Sharding EMEA 2021-05-31, Sharding EMEA 2021-06-14
The goal of this ticket is to verify that the ShardServerCatalogCacheLoader properly works when the information that we get from the config server has a different epoch than the information that we have locally.
Note that every time the SSCCL asks for information to the config server, it creates a task that contains this information and it is appended to a list of pending tasks. This list of tasks is used for two things:
- In order to fulfill the refresh that comes from the catalog cache, the SSCCL has to compute the results of the refresh. This usually combines some persisted metadata with the metadata of the tasks that haven't been persisted yet.
- Using an external thread, the changes associated to each task are locally persisted on config.cache.collections and config.cache.chunks.*.
So far we know that the following two invariants are incorrect under this scenario.
I see two ways of supporting this scenario:
- When we detect an epoch mismatch, instead of creating one task we create two: the first one represents the drop of the old collection so and the new one is just the re-creation. From a CollectionVersion point of view, it will be: CV_old -> UNSHARDED -> CV_new, so we won't hit the invariants.
- Having just one task and fixing the invariant an other potential places. Several parts of the code are already prepared to deal with this creation-drop-creation scenario just using one task (I don't want to add more pointers here but I can walk you through). I wonder how much work we would need to do.
Finally, lucky for us we can unittest the SSCCL, so I would start creating a unittest for this scenario. We may need a way to start/stop the external thread that consumes tasks.