[SERVER-40755] Expose statistics which indicate how many collection scans have executed Created: 20/Apr/19  Updated: 29/Oct/23  Resolved: 19/Jul/19

Status: Closed
Project: Core Server
Component/s: Diagnostics, Querying
Affects Version/s: None
Fix Version/s: 4.3.1

Type: Improvement Priority: Major - P3
Reporter: Pawel Terlecki Assignee: Sam Mercier
Resolution: Fixed Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Documented
is documented by DOCS-13668 Investigate changes in SERVER-40755: ... Closed
Duplicate
is duplicated by SERVER-2058 introduce a metric that shows collect... Closed
Backwards Compatibility: Fully Compatible
Sprint: Query 2019-06-03, Query 2019-06-17, Query 2019-07-01, Query 2019-07-15, Query 2019-07-29
Participants:

 Description   

It is unclear how prevalent data scans are. Having statistics on collection and index scans would allow us to decide whether improvements in this area are critical for overall performance.



 Comments   
Comment by Githook User [ 12/Jul/19 ]

Author:

{'name': 'samontea', 'username': 'samontea', 'email': 'merciers.merciers@gmail.com'}

Message: SERVER-40755 Expose statistics which indicate how many collection scans have executed
Branch: master
https://github.com/mongodb/mongo/commit/a8a8fabb17e9700aab633a67b24fe6147290bb92

Comment by Githook User [ 10/Jul/19 ]

Author:

{'name': 'Xiangyu Yao', 'email': 'xiangyu.yao@mongodb.com', 'username': 'xy24'}

Message: Revert "SERVER-40755 Expose statistics which indicate how many collection scans have executed"

This reverts commit a4ef14ef41f0700ef07e5b57b0345d2396a44604.
Branch: master
https://github.com/mongodb/mongo/commit/cb3b6c8b2a28190560906db4d78ef833eec44425

Comment by Githook User [ 10/Jul/19 ]

Author:

{'name': 'samontea', 'email': 'merciers.merciers@gmail.com', 'username': 'samontea'}

Message: SERVER-40755 Expose statistics which indicate how many collection scans have executed
Branch: master
https://github.com/mongodb/mongo/commit/a4ef14ef41f0700ef07e5b57b0345d2396a44604

Comment by Asya Kamsky [ 01/Jun/19 ]

LGTM

Comment by David Storch [ 22/May/19 ]

bruce.lucas, the reasoning was two-fold:

  1. We don't want the global counter to be obscured by scans done by the replication system for oplog tailing, or by change streams. Counting collection scans on other capped collections is similarly uninteresting.
  2. Having the data at a per-collection granularity could help us understand which kinds of collections applications run COLLSCAN plans against in the wild today.

That said, we could totally add a collection scan counter to serverStatus in addition to $collStats. I'd like to keep any changes around scanned and scannedObjects out of scope for this ticket, since those stats don't directly tell you whether there are collection scans happening or not. The "scanned objects" could be due to a large index scan which requires many documents to be fetched.
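To illustrate the first point, here is a toy model in plain JavaScript. It is not the server's implementation: the class name ScanCounters, its methods, and the scan counts are all made up for illustration. It only shows how a single global total can be dominated by oplog scans while a per-collection breakdown keeps the application's scans visible.

```javascript
// Toy model only (hypothetical names, made-up counts); not server code.
class ScanCounters {
  constructor() {
    // namespace -> BigInt count of executed collection scan plans
    this.perCollection = new Map();
  }

  // Record one executed collection scan plan against a namespace.
  recordCollectionScan(ns) {
    const current = this.perCollection.get(ns) ?? 0n;
    this.perCollection.set(ns, current + 1n);
  }

  // Sum across all namespaces, i.e. what one server-wide counter would show.
  globalTotal() {
    let total = 0n;
    for (const count of this.perCollection.values()) {
      total += count;
    }
    return total;
  }
}

const counters = new ScanCounters();

// Hypothetical workload: many oplog scans, one application scan.
for (let i = 0; i < 1000; i++) {
  counters.recordCollectionScan("local.oplog.rs");
}
counters.recordCollectionScan("test.c");

console.log(counters.globalTotal());               // 1001n
console.log(counters.perCollection.get("test.c")); // 1n
```

In this sketch the global total alone says little about test.c, which is the concern behind keeping the counter per collection.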

Comment by Bruce Lucas (Inactive) [ 21/May/19 ]

david.storch, I'm wondering why we would have scanned and scannedObjects at the serverStatus level but collectionScans at the collection level. Would it make sense to have all three (scanned, scannedObjects, and collectionScans) both per-collection and globally?

Comment by David Storch [ 21/May/19 ]

After discussing with pawel.terlecki, I propose adding a new option to $collStats called queryExecStats. This would cause a new section of statistics to be returned, also called queryExecStats. This document would contain a field called collectionScans, a per-collection 64-bit counter which is incremented whenever a collection scan plan is executed over that collection. It would look something like this:

MongoDB Enterprise > db.c.aggregate([{$collStats: {queryExecStats: {}}}]).pretty();
{
	"ns" : "test.c",
	"host" : "storchbox:27017",
	"localTime" : ISODate("2019-05-21T17:31:30.014Z"),
	"queryExecStats" : {
		"collectionScans" : NumberLong(x)
	}
}

bruce.lucas kelsey.schubert asya pawel.terlecki does this plan sound ok to you? If so please respond with an LGTM. In the meantime, I am moving this ticket back to our triage queue so it can be considered for scheduling.
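As a minimal sketch of how a client might consume the proposed output, the plain JavaScript below builds the pipeline and extracts the counter from one result document. The helper names queryExecStatsPipeline and collectionScansFrom are hypothetical, the sample value 3 is made up, and the document shape is only the one proposed in this comment; the shape that finally ships may differ. In the shell the counter would be a NumberLong rather than a plain number.

```javascript
// Hypothetical helpers; the document shape follows the proposal above.

// Build the aggregation pipeline that requests the proposed section.
function queryExecStatsPipeline() {
  return [{ $collStats: { queryExecStats: {} } }];
}

// Extract the counter from one $collStats result document.
// Returns null when the section or the field is absent, for example on a
// server that does not support the option.
function collectionScansFrom(statsDoc) {
  const section = statsDoc && statsDoc.queryExecStats;
  if (!section || section.collectionScans === undefined) {
    return null;
  }
  return section.collectionScans;
}

// Sample document in the proposed shape; the value 3 is illustrative only.
const sample = {
  ns: "test.c",
  host: "storchbox:27017",
  queryExecStats: { collectionScans: 3 },
};

console.log(collectionScansFrom(sample));           // 3
console.log(collectionScansFrom({ ns: "test.c" })); // null
```

Against a live server, the input document would come from something like db.c.aggregate(queryExecStatsPipeline()).toArray()[0].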

Comment by Bruce Lucas (Inactive) [ 23/Apr/19 ]

serverStatus already exposes metrics.queryExecutor.scanned and metrics.queryExecutor.scannedObjects, which can answer the general question. This information could possibly be added to collStats and indexStats if more detail would be useful, although identifying the source of the scans usually requires the queries themselves, which we typically get from the mongod logs.
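Since these serverStatus counters are cumulative, one way to use them is to compare two snapshots. The sketch below is plain JavaScript with a hypothetical helper, queryExecutorDelta, and made-up snapshot values; it assumes the counters have already been converted to plain numbers.

```javascript
// Hypothetical helper: growth of the cumulative counters between two
// serverStatus snapshots. Assumes the counters are plain numbers.
function queryExecutorDelta(before, after) {
  const b = before.metrics.queryExecutor;
  const a = after.metrics.queryExecutor;
  return {
    scanned: a.scanned - b.scanned,
    scannedObjects: a.scannedObjects - b.scannedObjects,
  };
}

// Made-up snapshots containing only the fields used here.
const before = { metrics: { queryExecutor: { scanned: 100, scannedObjects: 250 } } };
const after = { metrics: { queryExecutor: { scanned: 130, scannedObjects: 900 } } };

const delta = queryExecutorDelta(before, after);
console.log(delta.scanned);        // 30
console.log(delta.scannedObjects); // 650
```

In the shell the snapshots would come from two db.serverStatus() calls taken some time apart. A large scannedObjects delta does not by itself show which collections or queries were responsible.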

Comment by Pawel Terlecki [ 21/Apr/19 ]

pasette, part of this ticket is to figure out the best place for these statistics and how much detail should be logged. If we want to collect metrics per collection or index, collStats and indexStats would be a good place, but serverStatus would still show some aggregated stats, as it does for other WT events.

In addition to the internal use, this could be used for troubleshooting. Collections that are often scanned probably need some indexing and should be investigated further.

Comment by Daniel Pasette (Inactive) [ 21/Apr/19 ]

Do you mean including them in serverStatus, collStats or indexStats?
