[SERVER-79311] Investigate if LogicalSessionCache refresher and reaper truly need to force refresh the routing info for config.system.sessions Created: 25/Jul/23  Updated: 07/Sep/23  Resolved: 05/Sep/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Cheahuychou Mao Assignee: Cheahuychou Mao
Resolution: Won't Do Votes: 0
Labels: sharding-nyc-subteam3
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Assigned Teams:
Sharding NYC
Sprint: Sharding NYC 2023-08-21, Sharding NYC 2023-09-04
Participants:
Story Points: 3

 Description   

The LogicalSessionCache refresher and reaper currently include a step that checks that the config.system.sessions collection exists (here and here). Under the hood, this check performs a force refresh of the routing info for the collection. On a secondary shardsvr mongod, each routing info refresh makes the primary refresh by running a _flushRoutingTableCacheUpdate command against it, and then waits for the opTime that the command returns. From code inspection, this wait has no timeout, so the time spent after each _flushRoutingTableCacheUpdate command depends on the replication lag. When the lag is large, the refresh takes proportionally long to complete (HELP-48060), and the refresher and reaper can consequently run less frequently than scheduled. It is unclear why such a force refresh is necessary, i.e. why the refresher or reaper does not simply act as an ordinary client and retry its upsert/delete/find commands later if it gets a StaleConfig error.
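To make the trade-off concrete, here is a toy Python model (not MongoDB code) that compares the current "force refresh first" flow with the "retry on StaleConfig" alternative raised above. All names (`SecondaryNode`, `StaleConfig`, the cost figures) are hypothetical stand-ins, and the costs are simulated seconds rather than measurements. It assumes that the refresh cost is one flush round trip plus the full replication lag, matching the untimed wait described above.

```python
from dataclasses import dataclass


class StaleConfig(Exception):
    """Hypothetical stand-in for a StaleConfig error returned by a shard."""


@dataclass
class SecondaryNode:
    replication_lag_s: float      # how far this secondary trails its primary
    routing_is_stale: bool = False
    flush_rtt_s: float = 0.01     # hypothetical round trip for the flush command
    op_cost_s: float = 0.001      # hypothetical cost of one find/upsert/delete

    def force_refresh_routing(self) -> float:
        """Model of the current path: ask the primary to flush its routing
        table, then wait (with no timeout) until this secondary has replicated
        the returned opTime. Returns the simulated seconds spent."""
        self.routing_is_stale = False
        return self.flush_rtt_s + self.replication_lag_s

    def run_op(self) -> float:
        """Run one session-collection command; fails if routing is stale."""
        if self.routing_is_stale:
            raise StaleConfig()
        return self.op_cost_s


def eager_refresh_then_run(node: SecondaryNode) -> float:
    """Current behaviour: always pay for a forced refresh before the command."""
    return node.force_refresh_routing() + node.run_op()


def run_with_retry_on_stale(node: SecondaryNode, max_attempts: int = 3) -> float:
    """Alternative: run the command first and refresh only on StaleConfig."""
    spent = 0.0
    for _ in range(max_attempts):
        try:
            return spent + node.run_op()
        except StaleConfig:
            spent += node.force_refresh_routing()
    raise StaleConfig()


lagging = SecondaryNode(replication_lag_s=30.0)
print(eager_refresh_then_run(lagging))   # pays the 30 s lag on every run
print(run_with_retry_on_stale(SecondaryNode(replication_lag_s=30.0)))  # no lag paid
```

In this model the retry-based flow pays the replication lag only when the cached routing info is actually stale, while the eager flow pays it on every refresher or reaper run.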



 Comments   
Comment by Jason Zhang [ 01/Aug/23 ]

From our discussion, sending a listCollections directly to the primary shard could bypass the wait for replication, but more investigation is needed into whether that is feasible.
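A rough sketch of why asking the primary directly could avoid the wait: in the toy model below (not MongoDB code), an existence check routed through a lagging secondary must first catch up to the primary's latest opTime, while a listCollections-style lookup on the primary answers immediately. The classes, the integer opTimes, and the one-opTime-per-tick apply rate are all hypothetical simplifications.

```python
from dataclasses import dataclass, field


@dataclass
class Primary:
    collections: set = field(default_factory=set)
    last_optime: int = 0  # logical timestamp of the latest write (simplified)

    def record_write(self) -> None:
        self.last_optime += 1

    def create_collection(self, ns: str) -> None:
        self.collections.add(ns)
        self.record_write()

    def list_collections(self, name: str) -> list:
        """Hypothetical listCollections with a name filter."""
        return [ns for ns in self.collections if ns == name]


@dataclass
class Secondary:
    primary: Primary
    applied_optime: int
    apply_rate: int = 1  # opTimes applied per simulated tick

    def wait_for_optime(self, target: int) -> int:
        """Untimed wait until this node has applied `target`; returns ticks waited."""
        ticks = 0
        while self.applied_optime < target:
            self.applied_optime += self.apply_rate
            ticks += 1
        return ticks


def exists_via_secondary_refresh(sec: Secondary, ns: str) -> tuple:
    # The flush on the primary is modelled as returning its latest opTime.
    waited = sec.wait_for_optime(sec.primary.last_optime)
    # Once caught up, the secondary's view matches the primary's.
    return ns in sec.primary.collections, waited


def exists_via_primary_list_collections(primary: Primary, ns: str) -> tuple:
    return bool(primary.list_collections(ns)), 0


p = Primary()
p.create_collection("config.system.sessions")
for _ in range(49):
    p.record_write()
s = Secondary(primary=p, applied_optime=20)
print(exists_via_primary_list_collections(p, "config.system.sessions"))
print(exists_via_secondary_refresh(s, "config.system.sessions"))
```

Whether a real secondary can safely rely on the primary's answer here (for example, with respect to routing metadata consistency) is exactly the open question in the comment; the sketch only illustrates where the waiting cost comes from.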

Generated at Thu Feb 08 06:40:39 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.