[SERVER-72323] Improve the first refresh of a stale router by not getting the indexes by default
Created: 21/Dec/22  Updated: 21/Feb/23  Resolved: 08/Feb/23

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Improvement
Priority: Major - P3
Reporter: Marcos José Grillo Ramirez
Assignee: Allison Easton
Resolution: Won't Fix
Votes: 0
Labels: PM-2583-Milestone-2
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-66864 Add index version checks to versionin... Closed
is depended on by SERVER-62807 Ensure refresh performance is not aff... Closed
Assigned Teams:
Sharding EMEA
Sprint: Sharding EMEA 2023-01-23, Sharding EMEA 2023-02-06, Sharding EMEA 2023-02-20
Participants:

 Description   

Currently, a stale router automatically refreshes both the data placement and the index info. We could instead leverage the Shard Versioning Protocol to look up only the data placement info and fetch the indexes only when necessary, which avoids going to the config server twice when the collection does not have indexes.

The idea is the following: instead of acquiring the index data, we can peek the index cache to see if the time has been advanced, and simply return empty data if there is no data and the time hasn't been advanced. This way, the first write in a stale router would get the data placement from the config server and then go to the shard without an index version in the request. There, after SERVER-66864, if there is an index created in the shard, the write would throw SSV. With the StaleConfig info we would advance the index cache and retry the write, and this time we would acquire the index data from the config server.
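
The flow described above can be sketched as a small simulation. This is only an illustration, not server code: the classes `ConfigServer`, `Shard`, and `Router`, the `index_time_advanced` flag, and the use of a plain integer as an index version are all hypothetical. In particular, the sketch assumes the router can observe whether the index cache time has been advanced.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class StaleConfig(Exception):
    """Hypothetical stand-in for the StaleConfig error returned by a shard."""

    def __init__(self, wanted_index_version: Optional[int]):
        super().__init__("stale index version")
        self.wanted_index_version = wanted_index_version


@dataclass
class ConfigServer:
    # None means the collection has no indexes (assumption of this sketch).
    index_version: Optional[int] = None
    # Records every round trip made to the config server.
    lookups: List[str] = field(default_factory=list)

    def get_placement(self) -> str:
        self.lookups.append("placement")
        return "shard0"

    def get_indexes(self) -> Optional[int]:
        self.lookups.append("indexes")
        return self.index_version


@dataclass
class Shard:
    index_version: Optional[int] = None

    def write(self, sent_index_version: Optional[int]) -> str:
        # Models the index version check on the shard (SERVER-66864).
        if sent_index_version != self.index_version:
            raise StaleConfig(self.index_version)
        return "ok"


@dataclass
class Router:
    config: ConfigServer
    shard: Shard
    placement: Optional[str] = None
    index_time_advanced: bool = False  # hypothetical, assumed observable
    cached_index_version: Optional[int] = None

    def _peek_indexes(self) -> Optional[int]:
        # Only go to the config server once the cache time was advanced;
        # otherwise return whatever is cached (possibly nothing).
        if self.index_time_advanced:
            self.cached_index_version = self.config.get_indexes()
            self.index_time_advanced = False
        return self.cached_index_version

    def write(self) -> str:
        if self.placement is None:
            self.placement = self.config.get_placement()
        for _ in range(2):  # first attempt plus one retry
            try:
                return self.shard.write(self._peek_indexes())
            except StaleConfig:
                # Advance the index cache using the StaleConfig info.
                self.index_time_advanced = True
        raise RuntimeError("still stale after retry")
```

In this sketch, a collection without indexes costs a single config server lookup (`placement`), while a collection with an index costs one failed write plus a second lookup (`indexes`) on the retry.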



 Comments   
Comment by Allison Easton [ 08/Feb/23 ]

This has turned out not to be easy to fix, since there is no way to tell whether the read-through cache has been advanced if the cache is empty. It also isn't simple to handle the most common entry points (the CollectionRoutingInfoTargeter and the router loop), since some commands use the CollectionRoutingInfoTargeter for timeseries namespace conversions and don't use the retry loops in the class.

For this reason, and because this performance change only occurs on the first access to a new collection, we have decided to leave this as is.
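
The empty-cache ambiguity mentioned above can be illustrated with a toy model. This is a sketch under assumptions, not the server's cache: `ToyReadThroughCache` and its methods are hypothetical, and the model simply assumes that peeking exposes only loaded entries, never the internally tracked time.

```python
from typing import Optional, Tuple


class ToyReadThroughCache:
    """Hypothetical toy cache: an entry exists only after a lookup has run."""

    def __init__(self) -> None:
        self._entry: Optional[Tuple[int, str]] = None  # (time, value) once loaded
        self._time_in_store = 0  # tracked internally, not visible via peek()

    def advance_time_in_store(self, new_time: int) -> None:
        # Marks the cached data as out of date without loading anything.
        self._time_in_store = max(self._time_in_store, new_time)

    def peek(self) -> Optional[Tuple[int, str]]:
        # Returns only what has been loaded; an empty cache yields None.
        return self._entry


fresh = ToyReadThroughCache()
advanced = ToyReadThroughCache()
advanced.advance_time_in_store(7)

# Both caches look identical to a caller that can only peek.
same_view = fresh.peek() == advanced.peek()
```

Under this assumption, a caller cannot distinguish "never advanced" from "advanced but not yet loaded", which is the case the peek-based approach would need to detect.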
