[SERVER-80436] Reduce read-write contention on (query stats) partitions Created: 25/Aug/23  Updated: 04/Jan/24

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: William Qian Assignee: Backlog - Query Integration
Resolution: Unresolved Votes: 0
Labels: former-pm-2885, qi-query-stats, query-skunkworks
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File image.png    
Issue Links:
Depends
Assigned Teams:
Query Integration
Participants:

 Description   

$queryStats is an aggregation stage that performs a read-only scan of the query stats cache, which is implemented as a set of 16MB partitions. Even at a generous cache size of 300MB (the expected size on Atlas clusters), this results in 19 partitions (300MB / 16MB = 18.75, rounded up).

If many $queryStats aggregations run at once, each taking the lock on every partition in turn, the entire cache becomes a bottleneck for concurrent find commands, each of which performs a short read-update-write on one specific partition.
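To make the contention concrete, here is a minimal Python model, assuming a cache split into partitions that each pair a hash map with its own lock. All names (`QueryStatsEntry`, `PartitionedQueryStatsCache`, `record_execution`, `scan_holding_locks`) are hypothetical and are not the server's actual C++ classes. The find path locks exactly one partition for a brief read-update-write, while the naive scan holds each partition's lock for as long as it takes to visit every entry in it.

```python
import threading
from dataclasses import dataclass


@dataclass
class QueryStatsEntry:
    """Hypothetical per-query-shape metrics (a real entry stores much more)."""
    exec_count: int = 0
    total_exec_micros: int = 0


class PartitionedQueryStatsCache:
    """Hypothetical model: one dict and one lock per partition."""

    def __init__(self, num_partitions):
        self._locks = [threading.Lock() for _ in range(num_partitions)]
        self._partitions = [{} for _ in range(num_partitions)]

    def _index(self, key):
        # Each key always maps to the same single partition.
        return hash(key) % len(self._partitions)

    def record_execution(self, key, micros):
        """find path: a short read-update-write that locks exactly one partition."""
        i = self._index(key)
        with self._locks[i]:
            entry = self._partitions[i].setdefault(key, QueryStatsEntry())  # read
            entry.exec_count += 1                                           # update
            entry.total_exec_micros += micros                               # write

    def scan_holding_locks(self, visit):
        """Naive $queryStats path: each partition stays locked while all of its
        entries are visited, so any find that maps to it must wait."""
        for lock, partition in zip(self._locks, self._partitions):
            with lock:
                for key, entry in partition.items():
                    visit(key, entry)
```

In this model, a find whose key hashes to the partition currently being scanned blocks until that partition's visit finishes; as more scans run concurrently, the chance that a given find's partition is locked by some scan grows.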

Currently, this is mitigated by minimizing how long each $queryStats aggregation holds a partition lock: it copies the entire partition out under the lock, effectively creating a read-only view of the partition, and then reads from that copy. This still requires copying a 16MB partition under a lock, however, and the impact can be very visible (see the attached image, which shows severe performance degradation with 32 concurrent $queryStats aggregations).
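The copy-out mitigation can be sketched the same way. This is a hypothetical Python model (the `Partition` class and `scan_with_snapshots` function are illustrative, not server code): the lock is held only while the partition is copied, and the visitor then reads the private snapshot with the lock released. The copy itself is still proportional to the partition's size, which is the cost that remains under the lock.

```python
import copy
import threading


class Partition:
    """Hypothetical partition: a dict of entries guarded by its own lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.entries = {}


def scan_with_snapshots(partitions, visit):
    """Copy-out scan: hold each partition's lock only while copying it,
    then visit the private snapshot with the lock released."""
    for partition in partitions:
        with partition.lock:
            # The only work done under the lock, but it is O(partition size).
            snapshot = copy.deepcopy(partition.entries)
        for key, entry in snapshot.items():
            # Writers to this partition are no longer blocked while we visit.
            visit(key, entry)
```

Deep-copying keeps the snapshot consistent even if a concurrent find mutates an entry in place after the lock is released.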

Although we do not expect many concurrent $queryStats aggregations at the moment, this may change in the future. If it does, we will want to consider further ways to reduce contention between the long-running read-only $queryStats aggregations and the short-running read-update-write find commands.

Some basic ideas to possibly consider:

  1. Finer-grained locking for the hash tables (to allow for more partitions) can decrease contention between readers and writers. Since $queryStats aggregation throughput is not particularly time-sensitive but find command throughput is, it is reasonable to accept some performance loss in the long-running read-only scans in exchange for performance gains in the short-running read-update-writes.
  2. Lock-free data structures could also help here; for example, lock-free queues could be used in the LRU implementation.
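For the first idea, a rough back-of-the-envelope model shows the trade-off. Assuming the cache budget is spread evenly across partitions (a simplification; the helper names below are hypothetical, and 1MB is taken as 1024 * 1024 bytes), raising the partition count does not reduce the total bytes a copy-out scan copies, but it shrinks how much is copied while any single lock is held, which is what a blocked find command waits on.

```python
MB = 1024 * 1024  # assumption: 1MB = 1024 * 1024 bytes


def ceil_div(a, b):
    """Integer ceiling division, avoiding floating-point rounding."""
    return (a + b - 1) // b


def partitions_needed(cache_bytes, partition_bytes):
    """Number of fixed-size partitions needed to cover the cache budget."""
    return ceil_div(cache_bytes, partition_bytes)


def max_bytes_copied_per_lock(cache_bytes, num_partitions):
    """Largest amount a copy-out scan copies while holding any single
    partition lock, assuming entries are spread evenly across partitions."""
    return ceil_div(cache_bytes, num_partitions)


if __name__ == "__main__":
    cache = 300 * MB
    print("16MB partitions needed:", partitions_needed(cache, 16 * MB))
    for n in (19, 64, 256):
        per_lock = max_bytes_copied_per_lock(cache, n)
        # Total bytes copied per scan stay ~cache; only the per-lock hold shrinks.
        print(f"{n:>4} partitions: ~{per_lock / MB:.2f}MB copied per lock hold")
```

The scan pays for the extra lock acquisitions, which is the read-side cost that the first idea proposes to accept.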


 Comments   
Comment by Charlie Swanson [ 04/Oct/23 ]

Flagging for scheduling and removing from PM-2885. My recommendation would be to close this as "Won't Do." It is a good idea that can only help, but I don't see this as a big problem for query stats except on multi-socket machines (which have their own problems).

There are rumors from william.qian@mongodb.com of some skunkworks ideas that could bring this back to life if they prove promising.

Generated at Thu Feb 08 06:43:32 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.