[SERVER-43878] Hint to perform a storage-order document scan Created: 08/Oct/19  Updated: 29/Mar/23

Status: Backlog
Project: Core Server
Component/s: Querying
Affects Version/s: 3.6.12
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: John Lilley Assignee: Backlog - Storage Engines Team
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Storage Engines
Participants:
Case:
Story Points: 13

 Description   

On rotating media and low-IOPS managed storage, we find that a low-selectivity query into a table performs very poorly when data is uncached, essentially using a random read for every six or so documents (I've measured this at about 1000 documents/second on an HDD with 6ms access time).  In a rather lengthy support case with MongoDB, they explain that even COLLSCAN does not proceed in storage order, and there appears to be no way to hint the query engine into performing a truly sequential read. 

However, while I agree that IXSCANs will not be performant for your situation of returning 1/3 of a collection from an environment with low random IOPS and high serial throughput, I am not sure $natural will be faster if you are using WiredTiger. As Chris said above, given WT can place new documents randomly into any free space in a collections data file, and file blocks can be randomly placed on drives, $natural order is by no means equivalent to disk order.

 While the hint($natural:1) can steer a query from IXSCAN to COLLSCAN, this doesn't really make a difference.  At least, when we try this using the Java driver's find(...).sort("$natural",1) equivalent, it makes no difference in performance.

Thus, it would be desirable to have a hint that explicitly requests "storage order collection scan", to leverage the much higher sequential throughput on some storage media.



 Comments   
Comment by John Lilley [ 05/Oct/20 ]

I agree with Keith, I would prefer to have MongoDB just figure it out and choose the optimal access pattern.  I originally filed this issue after noticing that queries which utilized a sequential-scan plan, or which returned a lot of documents, did not seem to exploit the much faster sequential access pattern typical of managed storage and spinning media.  Oddly, an index build does use fast sequential access, so it seems possible.

However, I don't think that triggering this behavior needs to be a hint of "expects to return a lot of documents".  Wouldn't any query plan that results in a sequential collection scan potentially benefit from this optimization given the right combination of cache memory, sequential access performance, and random access performance?  Why not have MongoDB automatically tune its access pattern based on observed storage performance, something like another facet of the query optimizer, but instead a "collection scan optimizer"?

Comment by Keith Smith [ 05/Oct/20 ]

I'd suggest that an application-provided hint should describe the application's needs or expectations, rather than directing the storage system's implementation.  I.e., perhaps what is relevant here is that the application expects a query to return a large fraction of the documents in a collection and it doesn't care what order they are returned in.  This would let MongoDB and/or WiredTiger determine the best way to optimize for the what the application is going to do, possibly taking into account things that aren't visible to the application (e.g., other system activity, cache pressure, parallelism in the underlying storage system, etc.)

Comment by John Lilley [ 11/Oct/19 ]

Thanks.  One other thing I have observed: when a new index is created on a populated collection, I can watch the disk I/O profile and that process definitely uses a more efficient sequential scan.  For example, on a given table with slow SSD backing it, the index-creation scan might show 100MB/sec but the table query might show 15MB/sec.

Comment by Danny Hatcher (Inactive) [ 11/Oct/19 ]

Thank you for opening this ticket. I'll pass this along to the appropriate team to consider.

Generated at Thu Feb 08 05:04:22 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.