[SERVER-81133] Speedup logic to persist routing table cache Created: 18/Sep/23  Updated: 30/Jan/24  Resolved: 21/Sep/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 7.0.1, 4.4.24, 6.0.10, 5.0.21, 7.1.0-rc2
Fix Version/s: 7.1.1, 7.2.0-rc0, 7.0.4, 6.0.13, 5.0.25

Type: Improvement Priority: Major - P3
Reporter: Tommaso Tocci Assignee: Tommaso Tocci
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Documented
is documented by DOCS-16394 Investigate changes in SERVER-81133: ... Closed
Duplicate
is duplicated by SERVER-81016 Improve performance of persisted rout... Closed
Problem/Incident
Related
Assigned Teams:
Sharding EMEA
Backwards Compatibility: Fully Compatible
Backport Requested:
v7.1, v7.0, v6.0, v5.0, v4.4
Sprint: Sharding EMEA 2023-10-02
Participants:
Case:
Linked BF Score: 35

 Description   

Each shard maintains a persisted version of the collection routing table cache in config.cache.chunks.<nss>. The function that updates this collection, updateShardChunks, sequentially performs a delete operation followed by an insert operation for every new chunk.

When the number of new chunks to persist is high, this logic can become extremely slow. For instance, with a routing table of 10 million chunks, it can take up to 45 minutes to persist the entire routing table.

This is problematic because, until all the updated chunks are persisted, every concurrent incremental routing table refresh is slowed down considerably. For instance, an incremental refresh for a routing table of 10 million chunks usually takes on the order of 10 milliseconds, but if the routing table still needs to be persisted to disk, the same refresh can take up to ~5 seconds.

The goal of this ticket is to speed up the logic that persists new chunks in config.cache.chunks.<nss>. One potential solution is to perform a batch of deletions followed by a batch of insertions.
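To illustrate the difference between the two strategies, here is a minimal Python sketch. It is not the server's C++ implementation: `FakeChunkCache`, `Chunk`, `persist_sequential`, and `persist_batched` are hypothetical names, the cache is an in-memory dictionary standing in for config.cache.chunks.<nss>, and the delete is assumed to remove cached entries whose min key falls inside the new chunk's range. The sketch only counts how many write commands each strategy issues.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Chunk:
    min_key: int
    max_key: int
    shard: str


class FakeChunkCache:
    """Hypothetical in-memory stand-in for config.cache.chunks.<nss>."""

    def __init__(self) -> None:
        self.docs: Dict[int, Chunk] = {}
        self.commands = 0  # number of write commands issued so far

    def delete_ranges(self, ranges: List[Tuple[int, int]]) -> None:
        # One command carrying one delete statement per range.
        # Assumption: an entry is removed if its min key lies in [lo, hi).
        self.commands += 1
        for lo, hi in ranges:
            for key in [k for k in self.docs if lo <= k < hi]:
                del self.docs[key]

    def insert_many(self, chunks: List[Chunk]) -> None:
        # One command carrying all the documents to insert.
        self.commands += 1
        for chunk in chunks:
            self.docs[chunk.min_key] = chunk


def persist_sequential(cache: FakeChunkCache, new_chunks: List[Chunk]) -> None:
    # One delete command followed by one insert command for every new chunk.
    for chunk in new_chunks:
        cache.delete_ranges([(chunk.min_key, chunk.max_key)])
        cache.insert_many([chunk])


def persist_batched(cache: FakeChunkCache, new_chunks: List[Chunk],
                    batch_size: int = 1000) -> None:
    # For each batch: one command with all deletions, then one with all insertions.
    # Relies on the new chunks not overlapping each other.
    for start in range(0, len(new_chunks), batch_size):
        batch = new_chunks[start:start + batch_size]
        cache.delete_ranges([(c.min_key, c.max_key) for c in batch])
        cache.insert_many(batch)


if __name__ == "__main__":
    new_chunks = [Chunk(i * 5, (i + 1) * 5, "shardB") for i in range(200)]
    for persist in (persist_sequential, persist_batched):
        cache = FakeChunkCache()
        persist(cache, new_chunks)
        print(persist.__name__, "commands:", cache.commands)
```

Both functions leave the cache in the same final state as long as the new chunks do not overlap; the batched variant issues two commands per batch instead of two per chunk.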



 Comments   
Comment by Githook User [ 14/Jan/24 ]

Author:

{'name': 'Tommaso Tocci', 'email': 'tommaso.tocci@mongodb.com', 'username': 'toto-dev'}

Message: SERVER-81133 Speedup logic to persist routing table cache

(cherry picked from commit 5a8d10b236e54bd03cae15d453cdc90c466d6168)
(cherry picked from commit 7db3a426f1af829be41bd3d8033049f2ca8deea3)
(cherry picked from commit ae850b2403e0e7b41062d626d3d0158c96f81cfe)

GitOrigin-RevId: 8266e4685572c669de36c2139af8949ff7bb9c90
Branch: v5.0
https://github.com/mongodb/mongo/commit/7060b169e891499168909f54f246e9a8e9a59c48

Comment by Githook User [ 19/Dec/23 ]

Author:

{'name': 'Tommaso Tocci', 'email': 'tommaso.tocci@mongodb.com', 'username': 'toto-dev'}

Message: SERVER-81133 Speedup logic to persist routing table cache

(cherry picked from commit 5a8d10b236e54bd03cae15d453cdc90c466d6168)
(cherry picked from commit 7db3a426f1af829be41bd3d8033049f2ca8deea3)

GitOrigin-RevId: ae850b2403e0e7b41062d626d3d0158c96f81cfe
Branch: v6.0
https://github.com/mongodb/mongo/commit/8bf71fc4d088b59d45f927f2936d8c60ce134cb0

Comment by Githook User [ 07/Nov/23 ]

Author:

{'name': 'Tommaso Tocci', 'email': 'tommaso.tocci@mongodb.com', 'username': 'toto-dev'}

Message: SERVER-81133 Speedup logic to persist routing table cache

(cherry picked from commit 5a8d10b236e54bd03cae15d453cdc90c466d6168)
Branch: v7.0
https://github.com/mongodb/mongo/commit/9f835bf70eb528583a5f9542144cb4f083e8ed97

Comment by Githook User [ 06/Nov/23 ]

Author:

{'name': 'Tommaso Tocci', 'email': 'tommaso.tocci@mongodb.com', 'username': 'toto-dev'}

Message: SERVER-81133 Speedup logic to persist routing table cache

(cherry picked from commit 5a8d10b236e54bd03cae15d453cdc90c466d6168)
Branch: v7.1
https://github.com/mongodb/mongo/commit/e685ee8f9abe164ec0baf5fad1cc3c09103d6e15

Comment by Githook User [ 20/Sep/23 ]

Author:

{'name': 'Tommaso Tocci', 'email': 'tommaso.tocci@mongodb.com', 'username': 'toto-dev'}

Message: SERVER-81133 Speedup logic to persist routing table cache
Branch: master
https://github.com/mongodb/mongo/commit/5a8d10b236e54bd03cae15d453cdc90c466d6168

Comment by Tommaso Tocci [ 20/Sep/23 ]

I've performed some testing on the X64 workstation and these are the results:

  • Without batching:
    • 1M chunks: ~195 seconds
    • 2 chunks: between 1 and 2 milliseconds
  • With batching (1000):
    • 1M chunks: ~48 seconds
    • 2 chunks: less than 1 millisecond
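
For reference, the 1M-chunk figures above work out to roughly a 4x speedup. The short Python snippet below only redoes that arithmetic; the helper names `speedup` and `throughput` are made up for illustration, and the inputs are the approximate timings quoted in this comment.

```python
def speedup(before_s: float, after_s: float) -> float:
    """Ratio of elapsed time before and after the change."""
    return before_s / after_s


def throughput(chunks: int, seconds: float) -> float:
    """Chunks persisted per second."""
    return chunks / seconds


if __name__ == "__main__":
    chunks = 1_000_000
    without_batching_s = 195.0  # ~195 seconds, from the results above
    with_batching_s = 48.0      # ~48 seconds, from the results above

    print(f"speedup: {speedup(without_batching_s, with_batching_s):.1f}x")
    print(f"without batching: {throughput(chunks, without_batching_s):,.0f} chunks/s")
    print(f"with batching:    {throughput(chunks, with_batching_s):,.0f} chunks/s")
```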
Generated at Thu Feb 08 06:45:35 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.