[SERVER-39295] Use readOnce: true for $sample cursors
Created: 31/Jan/19
Updated: 06/Dec/22

Status: Blocked
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: 4.1.7
Fix Version/s: None

Type: Improvement
Priority: Major - P3
Reporter: Luke Prochazka
Assignee: Backlog - Query Execution
Resolution: Unresolved
Votes: 1
Labels: qexec-team
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-47118 Investigate the behavior of read_once... Backlog
depends on SERVER-47055 Investigate performance regression in... Closed
depends on SERVER-36068 Expose a user-accessible cursor optio... Closed
Related
Assigned Teams:
Query Execution
Sprint: Query 2020-03-23
Participants:

 Description   

This is a feature request to enhance the behaviour of the $sample aggregation stage by having the plan optimizer apply the WiredTiger “readOnce: true” cursor option (SERVER-36068) to the cursors that $sample uses.

The intent is that $sample does not cache its result set, or is at least less likely to. A sample is, by definition, unlikely to be reused by subsequent samples, so caching it has no benefit and only adds unwanted cache pressure and workload contention.



 Comments   
Comment by Ruoxin Xu [ 25/Mar/20 ]

Back to open. Requires performance investigation for readOnce cursors (see SERVER-47055 / SERVER-47118).

Comment by Eric Milkie [ 24/Mar/20 ]

Just for clarification, the readOnce flag does not bypass the in-memory cache for the pages read. Instead, it adjusts the score on such pages so that they are evicted from cache sooner than the typical LRU algorithm would evict them. Because $sample and random cursors potentially read multiple pages per cursor advance, it's not clear that the benefit of evicting such pages sooner would outweigh the performance hit that such cursor reads would experience.
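
The distinction between skipping the cache and being evicted sooner can be illustrated with a toy model. The sketch below is not WiredTiger's eviction algorithm: `ToyCache`, its `read_once` parameter, and `surviving_hot_pages` are hypothetical names, and it assumes a plain LRU list in which a read-once page is placed at the eviction end instead of the most-recently-used end. It only shows that such a page still enters the cache but is the first to leave it.

```python
from collections import OrderedDict


class ToyCache:
    """Toy LRU page cache. Illustrative only, NOT WiredTiger's eviction logic."""

    def __init__(self, capacity):
        self.capacity = capacity
        # Ordered from next-to-evict (front) to most recently used (back).
        self.pages = OrderedDict()

    def read(self, page_id, read_once=False):
        """Read a page through the cache; return True on a cache hit."""
        if page_id in self.pages:
            if not read_once:
                # A normal read refreshes the page's recency.
                self.pages.move_to_end(page_id)
            return True
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)  # evict the front page
        self.pages[page_id] = True
        if read_once:
            # Assumed read-once handling: the page is cached, but it is
            # placed first in line for eviction.
            self.pages.move_to_end(page_id, last=False)
        return False


def surviving_hot_pages(read_once):
    """Warm three hot pages, scan ten cold ones, report which hot pages remain."""
    cache = ToyCache(capacity=4)
    hot = ["h1", "h2", "h3"]
    for page in hot:
        cache.read(page)
    for i in range(10):
        cache.read(f"s{i}", read_once=read_once)
    return [page for page in hot if page in cache.pages]


print(surviving_hot_pages(read_once=False))
print(surviving_hot_pages(read_once=True))
```

In this toy model, a normal scan pushes the hot pages out, while a read-once scan leaves them in place. The model ignores the cost to the scanning cursor itself and the multiple pages read per cursor advance, which are the concerns raised in this ticket.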

Comment by Eric Milkie [ 24/Mar/20 ]

Unfortunately, after some discussion with Execution and Storage Engines team members, I believe we'll have to abandon this change to use readOnce. The feature is too unstable for us to be comfortable that it will improve performance, and in some cases we have seen that it greatly hurts performance for the scanning cursor itself with no corresponding performance increase for other readers and writers. The WiredTiger storage engine cache management system is very complex, so it is hard to anticipate what the repercussions of using readOnce cursors will be for all the workloads we are interested in.
