[SERVER-57667] Improve processing speed for resharding's collection cloning pipeline Created: 12/Jun/21  Updated: 29/Oct/23  Resolved: 02/Aug/21

Status: Closed
Project: Core Server
Component/s: Query Language
Affects Version/s: None
Fix Version/s: 5.0.3, 5.1.0-rc0

Type: Improvement Priority: Major - P3
Reporter: Max Hirschhorn Assignee: Joshua Lapacik (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
is depended on by SERVER-53351 Add resharding fuzzer task with step-... Closed
Problem/Incident
Related
related to SERVER-67529 Resharding silently skips documents w... Closed
related to SERVER-58983 Stub out unit tests enough to re-enab... Closed
related to SERVER-57668 Cache chunk bounds as an array in res... Closed
is related to SERVER-57483 Results from $lookup stage are not ca... Closed
Backwards Compatibility: Minor Change
Backport Requested:
v5.0
Sprint: Query Optimization 2021-07-12, Query Optimization 2021-07-26, Query Optimization 2021-08-09
Participants:
Linked BF Score: 170

 Description   

SERVER-57483 partially addressed the regression of $lookup results not being cached by special-casing the config.cache.chunks collection, but there is still room for significant improvement. Using ChunkManager to binary search the temporary resharding collection's chunk ranges, rather than using DocumentSourceSequentialDocumentCache to scan linearly through the same chunk ranges, was found to give a >10x speedup in the overall collection cloning runtime.

This ticket represents the work to add a custom aggregation stage or expression or otherwise optimize resharding's collection cloning pipeline to achieve the observed >10x speedup.
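
For illustration only, the difference between the two lookup strategies described above can be sketched as follows. This is not the server's implementation: the Chunk type, the integer shard keys, and both helper functions are hypothetical stand-ins (real chunk bounds are BSON shard key documents, and the actual lookup goes through ChunkManager). The sketch assumes chunks are contiguous, non-overlapping, and sorted by their lower bound.

```python
from bisect import bisect_right
from typing import List, NamedTuple


class Chunk(NamedTuple):
    """Hypothetical, simplified chunk: integer bounds instead of BSON keys."""
    min_key: int  # inclusive lower bound
    max_key: int  # exclusive upper bound
    shard: str    # recipient shard that owns this range


def find_chunk_linear(chunks: List[Chunk], key: int) -> Chunk:
    """Scan every chunk range in order until one contains the key: O(n)."""
    for chunk in chunks:
        if chunk.min_key <= key < chunk.max_key:
            return chunk
    raise KeyError(key)


def find_chunk_binary(chunks: List[Chunk], min_keys: List[int], key: int) -> Chunk:
    """Binary search over the sorted lower bounds: O(log n)."""
    # Index of the last chunk whose lower bound is <= key.
    idx = bisect_right(min_keys, key) - 1
    if idx < 0 or key >= chunks[idx].max_key:
        raise KeyError(key)
    return chunks[idx]


# 100 contiguous chunks of width 10, spread round-robin over three shards.
chunks = [Chunk(lo, lo + 10, f"shard{(lo // 10) % 3}") for lo in range(0, 1000, 10)]
# Lower bounds are extracted once and reused for every document lookup.
min_keys = [c.min_key for c in chunks]

# Both strategies agree on the owning chunk; only the cost per lookup differs.
assert find_chunk_binary(chunks, min_keys, 257) == find_chunk_linear(chunks, 257)
```

Because a lookup of this kind runs once per cloned document, the per-lookup cost is multiplied by the collection size, which is consistent with the large overall difference reported above when the number of chunk ranges is high.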



 Comments   
Comment by Githook User [ 02/Aug/21 ]

Author: jlap199 <80741223+jlap199@users.noreply.github.com>

Message: SERVER-57667 Speed up resharding's collection cloning using internal stage

Uses the ChunkManager to perform a binary search over the config.cache.chunks
collection instead of performing a lookup resulting in a significant speed
up for collection cloning.
Branch: master
https://github.com/mongodb/mongo/commit/eae9041c1c1320a15a6b13f3f1d4770a2b96e085

Comment by Githook User [ 02/Aug/21 ]

Author: jlap199 <80741223+jlap199@users.noreply.github.com>

Message: SERVER-57667 Speed up resharding's collection cloning using internal stage

Uses the ChunkManager to perform a binary search over the config.cache.chunks
collection instead of performing a lookup resulting in a significant speed
up for collection cloning.
Branch: v5.0
https://github.com/mongodb/mongo/commit/535947a6794c98d5f564bae04b37761188c53543
