[SERVER-84548] Using ShardServerCatalogCacheLoader on configsvr causes excessive WT data handles / memory usage Created: 04/Jan/24  Updated: 08/Feb/24

Status: In Code Review
Project: Core Server
Component/s: None
Affects Version/s: 7.0.0
Fix Version/s: 7.2.1, 7.3 Required, 7.0.6

Type: Bug Priority: Critical - P2
Reporter: Jordi Serra Torrens Assignee: Kshitij Gupta
Resolution: Unresolved Votes: 0
Labels: car-product-sync, cs-subteam1
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Problem/Incident
is caused by SERVER-72489 Decide if catalog shards need ShardSe... Closed
Related
is related to SERVER-84243 Separate Shard-Role Catalog Cache and... In Progress
Assigned Teams:
Cluster Scalability
Operating System: ALL
Backport Requested:
v7.3, v7.2, v7.0
Participants:
Case:
Story Points: 4

 Description   

PM-2290/SERVER-72489 made the configsvr start using the ShardServerCatalogCacheLoader (instead of the ConfigServerCatalogCacheLoader) to refresh its in-memory routing table cache. The ShardServerCatalogCacheLoader persists the cache on internal collections (config.cache.<nss>) – one internal collection for each actual collection.

Some processes on the configsvr, such as the balancer or the shardingIndexConsistencyChecker, periodically refresh and use the routing tables. On deployments with a huge number of collections, this causes increased resource usage, particularly of WT data handles, which are only garbage-collected after 10 minutes of inactivity. This in turn leads to increased memory usage. Given that configsvr instances are typically small, this may trigger OOM failures.
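
The interaction between periodic refreshes and the idle-handle sweep can be illustrated with a small simulation. This is a rough sketch, not WiredTiger's actual sweep logic: it assumes exactly one data handle per persisted cache collection and that every refresh touches every collection. The function name, the refresh periods, and the collection counts are hypothetical; only the 10-minute idle timeout comes from the description above.

```python
IDLE_TIMEOUT_S = 10 * 60  # handles are only closed after 10 minutes of inactivity


def simulate_open_handles(num_collections, refresh_period_s, duration_s):
    """Return how many handles are still open after duration_s seconds.

    Assumptions (hypothetical, for illustration only):
      * one handle per persisted cache collection,
      * every refresh touches every collection,
      * the idle sweep runs once per simulated second.
    """
    last_used = {}  # collection index -> time of last access, in seconds
    for now in range(duration_s + 1):
        if now % refresh_period_s == 0:
            # A periodic refresh re-opens or re-uses every handle.
            for coll in range(num_collections):
                last_used[coll] = now
        # Idle sweep: drop handles that have been unused for the full timeout.
        last_used = {
            coll: t for coll, t in last_used.items() if now - t < IDLE_TIMEOUT_S
        }
    return len(last_used)


if __name__ == "__main__":
    # Refreshing more often than the idle timeout keeps every handle open.
    print(simulate_open_handles(1000, refresh_period_s=30, duration_s=3600))
    # Refreshing less often than the idle timeout lets the sweep reclaim them.
    print(simulate_open_handles(1000, refresh_period_s=1200, duration_s=1900))
```

Under these assumptions, the number of open handles stays equal to the number of collections whenever the refresh period is shorter than the idle timeout, which is the pattern the description attributes to the balancer and the index consistency checker.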

 

To address this issue, we disallow transitioning to an embedded config server and restore always using the ConfigServerCatalogCacheLoader in 7.0 and 7.3, but do not change 8.0.

 

This means disabling the transitionFromDedicatedConfigServer and transitionToDedicatedConfigServer commands.
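
The approach described above can be sketched as a single switch that both selects the loader and gates the transition commands. This is a minimal illustration in Python, not the server's C++ implementation: `NodeConfig`, `transitions_enabled`, `choose_loader`, and `run_transition_command` are hypothetical names, and the assumption that a config server with transitions enabled keeps using the ShardServerCatalogCacheLoader is inferred from the description rather than confirmed by it.

```python
from dataclasses import dataclass
from enum import Enum


class Loader(Enum):
    CSCCL = "ConfigServerCatalogCacheLoader"
    SSCCL = "ShardServerCatalogCacheLoader"


# Command names taken from the ticket description.
TRANSITION_COMMANDS = {
    "transitionFromDedicatedConfigServer",
    "transitionToDedicatedConfigServer",
}


@dataclass(frozen=True)
class NodeConfig:
    is_config_server: bool
    # Hypothetical stand-in for the catalog shard transition feature flag.
    transitions_enabled: bool


def choose_loader(cfg: NodeConfig) -> Loader:
    """Pick the catalog cache loader for a node (illustrative only)."""
    if cfg.is_config_server and not cfg.transitions_enabled:
        # A config server that can never become an embedded config server
        # has no need to persist routing tables in internal collections.
        return Loader.CSCCL
    return Loader.SSCCL


def run_transition_command(cfg: NodeConfig, command: str) -> str:
    """Reject transition commands while the flag is off (illustrative only)."""
    if command not in TRANSITION_COMMANDS:
        raise ValueError(f"not a transition command: {command}")
    if not cfg.transitions_enabled:
        raise PermissionError(f"{command} is disabled on this version")
    return "ok"
```

Tying both decisions to the same flag keeps the two behaviors consistent in this sketch: a node that always uses the CSCCL can never be asked to transition into a role that would require the SSCCL.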



 Comments   
Comment by Githook User [ 08/Feb/24 ]

Author: Kshitij Gupta <kshitij.gupta@mongodb.com> (kshitijng)

Message: SERVER-84548: Re-add transition to catalog shard feature flag and use CSCCL.
Branch: v7.2
https://github.com/mongodb/mongo/commit/5f15dd43c6d190c85dd4ed067499b45cb893f668

Comment by Githook User [ 07/Feb/24 ]

Author: kshitij <kshitij.gupta@mongodb.com> (kshitijng)

Message: SERVER-84548: Use CSCCL for dedicated config server on 7.x. (#18391)

GitOrigin-RevId: d24349ec9e99f3f597ca9441d87ef669af05e755
Branch: v7.0
https://github.com/mongodb/mongo/commit/a0f7d7359dd2d287cce6e18bd489fce518e294fb

Comment by Jordi Serra Torrens [ 04/Jan/24 ]

Possible solutions:

  • Stop using SSCCL for dedicated config servers. This might require a start-up parameter or a more complex run-time switch between SSCCL and CSCCL. Alternatively, disable the PM-2290 feature flag on v7.0, since the feature is effectively disabled there.
  • Make use of the separate shard role and router role catalog caches (SERVER-84243), so that the balancer and index consistency checker use the router-role cache, which is based on the CSCCL. We'd need to backport this to v7.0.
Generated at Thu Feb 08 06:55:20 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.