[SERVER-72323] Improve the first refresh of a stale router by not getting the indexes by default Created: 21/Dec/22 Updated: 21/Feb/23 Resolved: 08/Feb/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Marcos José Grillo Ramirez | Assignee: | Allison Easton |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | PM-2583-Milestone-2 |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Assigned Teams: | Sharding EMEA |
| Sprint: | Sharding EMEA 2023-01-23, Sharding EMEA 2023-02-06, Sharding EMEA 2023-02-20 |
| Description |
|
Currently, a stale router automatically refreshes both the data placement and the index info. We could instead leverage the Shard Versioning Protocol to look up only the data placement info, and look up the indexes only when necessary, which avoids going to the config server twice when the collection has no indexes. The idea is the following: instead of acquiring the data, we peek at the index cache to see whether its time has been advanced, and simply return empty data if the cache holds no data and the time has not been advanced. This way, the first write on a stale router would get the data placement from the config server and then go to the shard without an index version in the request. There, after SERVER-66864, if an index has been created in the shard, the write would throw SSV; with the StaleConfig info we would advance the index cache and retry the write, and this time we would acquire the data from the config server. |
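The proposed flow can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions, not server code: `ConfigServer`, `Shard`, `Router`, the `advanced_to` field, and the integer index versions are all hypothetical names invented for illustration, and the exception class is merely a stand-in for the StaleConfig error mentioned above. In particular, the sketch assumes the router can always tell whether the index cache time has been advanced, by tracking it in a separate field.

```python
from dataclasses import dataclass
from typing import Optional


class StaleConfig(Exception):
    """Hypothetical stand-in for StaleConfig; carries the shard's index version."""

    def __init__(self, wanted_index_version: int):
        super().__init__("stale config")
        self.wanted_index_version = wanted_index_version


@dataclass
class ConfigServer:
    placement: str
    index_version: Optional[int]  # None means the collection has no indexes
    round_trips: int = 0          # counts lookups that reach the config server

    def get_placement(self) -> str:
        self.round_trips += 1
        return self.placement

    def get_index_version(self) -> Optional[int]:
        self.round_trips += 1
        return self.index_version


@dataclass
class Shard:
    index_version: Optional[int]  # None means no index was created on the shard

    def write(self, doc: dict, index_version: Optional[int]) -> str:
        # Reject the write when the shard has indexes the request does not know about.
        if self.index_version is not None and index_version != self.index_version:
            raise StaleConfig(self.index_version)
        return "ok"


class Router:
    def __init__(self, config: ConfigServer, shard: Shard):
        self.config = config
        self.shard = shard
        self.placement: Optional[str] = None
        self.cached_index_version: Optional[int] = None
        self.advanced_to: Optional[int] = None  # assumed to be observable

    def _peek_index_version(self) -> Optional[int]:
        # Nothing cached and time never advanced: return empty data, no lookup.
        if self.advanced_to is None:
            return None
        # Time was advanced past what is cached: fetch from the config server.
        if self.cached_index_version != self.advanced_to:
            self.cached_index_version = self.config.get_index_version()
        return self.cached_index_version

    def write(self, doc: dict) -> str:
        if self.placement is None:
            self.placement = self.config.get_placement()
        for _ in range(2):  # first attempt plus one retry
            try:
                return self.shard.write(doc, self._peek_index_version())
            except StaleConfig as err:
                # Advance the index cache using the info carried by the error.
                self.advanced_to = err.wanted_index_version
        raise RuntimeError("still stale after retry")
```

In this model, a collection without indexes costs a single config server round trip on the first write, while a collection with indexes costs two round trips plus one rejected attempt at the shard.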
| Comments |
| Comment by Allison Easton [ 08/Feb/23 ] |
|
This has turned out not to be easy to fix, since there is no way to tell whether the read-through cache has been advanced when the cache is empty. It is also not simple to handle the most common entry points (the CollectionRoutingInfoTargeter and the router loop), since some commands use the CollectionRoutingInfoTargeter for timeseries namespace conversions and don't use the retry loops in the class. For this reason, and because this performance change only affects the first access to a new collection, we have decided to leave this as is. |