[SERVER-58361] Reduce access to SSCCL's persisted collection metadata Created: 08/Jul/21  Updated: 17/Sep/21  Resolved: 17/Sep/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Antonio Fuschetto Assignee: Kaloian Manassiev
Resolution: Won't Do Votes: 0
Labels: sharding-wfbf-day
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File SSCCL's activities on system.cache.collections.png    
Sprint: Sharding EMEA 2021-07-26, Sharding EMEA 2021-08-09, Sharding EMEA 2021-08-23
Participants:

 Description   

The dropChunksAndDeleteCollectionsEntry function of the Shard Server Catalog Cache Loader (SSCCL) module requires some information about the collection (i.e., its UUID and supporting long name status) in order to drop the related chunks, so it calls the readShardCollectionsEntry function to read the collection entry from disk.

The idea is to pass this information to the dropChunksAndDeleteCollectionsEntry function as arguments, avoiding an additional, expensive disk access.
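A minimal sketch of the proposed signature change, using hypothetical simplified types (CollectionInfo, UUID) in place of the real SSCCL ones; the real functions take an OperationContext and namespace types, which are elided here:

{code:cpp}
#include <string>

// Hypothetical, simplified stand-ins for the real SSCCL types.
struct UUID {
    std::string bytes;
};

struct CollectionInfo {
    UUID uuid;                // needed to locate the chunks to drop
    bool supportingLongName;  // needed to resolve the chunks namespace
};

// Current shape (simplified): the function re-reads the collection entry
// from disk before dropping the chunks.
void dropChunksAndDeleteCollectionsEntry(const std::string& nss) {
    // CollectionInfo info = readShardCollectionsEntry(nss);  // extra disk read
    // ... drop chunks using info.uuid / info.supportingLongName ...
    // ... delete the collections entry for nss ...
    (void)nss;
}

// Proposed shape: the caller, which in most paths has already read the
// entry, passes the needed fields in, avoiding the second disk access.
void dropChunksAndDeleteCollectionsEntry(const std::string& nss,
                                         const CollectionInfo& info) {
    // ... drop chunks using info.uuid / info.supportingLongName ...
    // ... delete the collections entry for nss ...
    (void)nss;
    (void)info;
}
{code}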

This comes from SERVER-34632's code review.



 Comments   
Comment by Antonio Fuschetto [ 30/Aug/21 ]

The following diagram (attached as SSCCL's activities on system.cache.collections.png) shows all read and write operations on the persisted collection metadata (i.e., system.cache.collections) performed by the SSCCL's getChunksSince function:

This analysis highlights (again) that the number of read operations could only be optimized for the corner cases. It also highlights a similar situation for the write operations, where it might make sense to investigate the logic around the refreshing flag a bit more (it appears that the collection is marked as refreshing until the write operations on the chunks are completed as well).
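For reference, a minimal sketch of the write pattern described above, with hypothetical helper names (markCollectionAsRefreshing, writeChangedChunks, unsetRefreshingFlag) standing in for the real internal functions:

{code:cpp}
#include <iostream>
#include <string>
#include <vector>

struct Chunk {};

// Hypothetical stand-ins for the persisted-metadata helpers; the real
// module performs these writes through its own internal functions.
void markCollectionAsRefreshing(const std::string& nss) {
    std::cout << "set refreshing=true on the entry for " << nss << "\n";
}

void writeChangedChunks(const std::string& nss, const std::vector<Chunk>& chunks) {
    std::cout << "write " << chunks.size() << " chunk(s) for " << nss << "\n";
}

void unsetRefreshingFlag(const std::string& nss) {
    std::cout << "set refreshing=false on the entry for " << nss << "\n";
}

// Simplified view of the persist path: the collection entry stays flagged
// as "refreshing" for the whole duration of the chunk writes, which is the
// behavior questioned above.
void persistCollectionAndChangedChunksSketch(const std::string& nss,
                                             const std::vector<Chunk>& chunks) {
    markCollectionAsRefreshing(nss);  // write #1 on system.cache.collections
    writeChangedChunks(nss, chunks);  // writes on the chunks metadata
    unsetRefreshingFlag(nss);         // write #2 on system.cache.collections
}
{code}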

kaloian.manassiev, please let me know if you see any areas for improvement that I am ignoring or if this task can be rejected.

Comment by Antonio Fuschetto [ 27/Aug/21 ]

The dropChunksAndDeleteCollectionsEntry function, which reads the metadata from disk, is invoked from three call sites, falling into two contexts:

  1. By the getPersistedMaxChunkVersionAndLastestSupportingLongName function, when the cached collection is in refreshing status (corner case). In this scenario, the persisted metadata has already been read and could potentially be reused. This happens in the parent thread.
  2. By the _updatePersistedCollAndChunksMetadata and persistCollectionAndChangedChunks functions, which are called in mutual exclusion on a different thread (a worker thread of a pool). In this scenario it is crucial to read the metadata from disk rather than from the task, because the information captured in the task (e.g., the supportLongName status) may have become stale between the time the task was enqueued and the time it runs (see the sketch below).
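A minimal sketch of the staleness hazard in point 2, using hypothetical simplified types (Task, CollState) and a stand-in readShardCollectionsEntrySketch function, not the real SSCCL implementation:

{code:cpp}
#include <string>

// Hypothetical, simplified model: a task snapshots the collection state at
// enqueue time, but the worker thread runs it later.
struct CollState {
    bool supportLongName;
};

struct Task {
    std::string nss;
    CollState snapshotAtEnqueue;  // may be stale by the time the task runs
};

// Stand-in for the real on-disk read; always reflects the current state.
CollState readShardCollectionsEntrySketch(const std::string& nss) {
    (void)nss;
    return CollState{/*supportLongName=*/true};
}

void runTask(const Task& task) {
    // Wrong: task.snapshotAtEnqueue.supportLongName may have flipped while
    // the task sat in the queue.
    //
    // Right: re-read the entry from disk so the drop targets the correct
    // chunks namespace.
    CollState current = readShardCollectionsEntrySketch(task.nss);
    (void)current;  // drop chunks using `current`, not the snapshot
}
{code}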

In conclusion, since the dropChunksAndDeleteCollectionsEntry function is not invoked multiple times on the same thread, after a deeper analysis I don't see great value in refactoring the existing implementation. The risk is to further complicate an already complicated module, while the actual improvement would apply only to point 1 above, which is a corner case.
