[DOCS-16394] Investigate changes in SERVER-81133: Speedup logic to persist routing table cache Created: 21/Sep/23  Updated: 05/Feb/24  Resolved: 16/Nov/23

Status: Closed
Project: Documentation
Component/s: manual, Server
Affects Version/s: None
Fix Version/s: 7.2.0-rc0, 7.1.1, 7.0.4, 6.0.13, 5.0.25, Server_Docs_[20240205]

Type: Task Priority: Minor - P4
Reporter: Backlog - Core Eng Program Management Team Assignee: Jeffrey Allen
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
backported by DOCS-16483 [BACKPORT] [v7.0] Speedup logic to pe... Closed
backported by DOCS-16555 [BACKPORT] [v6.0] Speedup logic to pe... Closed
backported by DOCS-16586 [BACKPORT] [v5.0] Speedup logic to pe... Closed
Documented
documents SERVER-81133 Speedup logic to persist routing tabl... Closed
Participants:
Days since reply: 20 weeks ago

 Description   
Original Downstream Change Summary

Introduced a new parameter `persistedChunkCacheUpdateMaxBatchSize` on mongod to control the maximum batch size used for updating the persisted chunk cache on shards.

The parameter defaults to 1000 and can be set both at startup and at runtime.
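
As a rough illustration of the two ways to set the parameter, the sketch below only builds the command document and the command-line arguments. It does not contact a server. The helper names are hypothetical. The command shape follows the general `setParameter` admin command, and the value 500 used in the note below is an arbitrary example, not a recommendation.

```python
def runtime_set_parameter_command(batch_size: int) -> dict:
    """Build the admin command document for changing the parameter at runtime.

    The command name must be the first key; Python dicts keep insertion order.
    """
    return {
        "setParameter": 1,
        "persistedChunkCacheUpdateMaxBatchSize": batch_size,
    }


def startup_arguments(batch_size: int) -> list[str]:
    """Build the mongod command-line arguments for setting the parameter at startup."""
    return [
        "mongod",
        "--setParameter",
        f"persistedChunkCacheUpdateMaxBatchSize={batch_size}",
    ]
```

With a driver such as PyMongo, the runtime document would typically be passed to the admin database on each shard's mongod, for example `client.admin.command(runtime_set_parameter_command(500))`, assuming a connected `client`.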

Description of Linked Ticket

Each shard maintains a persisted version of the collection routing table cache in config.cache.chunks.<nss>. The function used to update this collection is updateShardChunks, which sequentially performs a delete operation followed by an insert operation for every new chunk.

When the number of new chunks to persist is high, this logic can become extremely slow. For instance, with a routing table of 10 million chunks, it can take up to 45 minutes to persist the entire routing table.
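
For scale, the worst-case figures quoted above can be turned into a per-chunk cost. This is plain arithmetic on the numbers in the ticket, not a measurement.

```python
CHUNKS = 10_000_000           # routing table size from the ticket
WORST_CASE_SECONDS = 45 * 60  # "up to 45 min" expressed in seconds

# Throughput implied by the worst case: roughly 3,700 chunks per second.
chunks_per_second = CHUNKS / WORST_CASE_SECONDS

# Time per chunk (one delete plus one insert): roughly 270 microseconds.
microseconds_per_chunk = WORST_CASE_SECONDS / CHUNKS * 1_000_000
```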

This is problematic because, until all the updated chunks are persisted, they considerably slow down all concurrent incremental routing table refreshes. For instance, an incremental refresh for a routing table of 10 million chunks usually takes on the order of 10 milliseconds, but if the routing table still needs to be persisted to disk, the refresh can take up to ~5 seconds.

The goal of this ticket is to speed up the logic that persists new chunks in config.cache.chunks.<nss>. One potential solution could be to perform a batch of deletions followed by a batch of insertions.
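
The difference between the two strategies can be sketched with an in-memory stand-in for config.cache.chunks.<nss>. This is a minimal illustration, not the server implementation: the dict-based store, the chunk fields, keying chunks by their `min` bound, and both function names are assumptions made for the example. Each "operation" counted here stands for one write issued against the collection.

```python
def persist_sequential(store: dict, new_chunks: list[dict]) -> int:
    """One delete followed by one insert per chunk; returns the operation count."""
    operations = 0
    for chunk in new_chunks:
        store.pop(chunk["min"], None)  # delete the stale entry, if any
        operations += 1
        store[chunk["min"]] = chunk    # insert the new chunk
        operations += 1
    return operations


def persist_batched(store: dict, new_chunks: list[dict], max_batch_size: int = 1000) -> int:
    """One batched delete followed by one batched insert per batch of chunks."""
    operations = 0
    for start in range(0, len(new_chunks), max_batch_size):
        batch = new_chunks[start:start + max_batch_size]
        for chunk in batch:            # modeled as a single batched delete
            store.pop(chunk["min"], None)
        operations += 1
        for chunk in batch:            # modeled as a single batched insert
            store[chunk["min"]] = chunk
        operations += 1
    return operations
```

Under these assumptions, persisting 2,500 new chunks takes 5,000 operations sequentially and 6 operations with a batch size of 1000, while both stores end up with the same contents.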

