[SERVER-50286] Indicate if operation is reading from a view in the log file Created: 13/Aug/20  Updated: 29/Oct/23  Resolved: 12/May/21

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: 5.0.0-rc0

Type: Improvement Priority: Major - P3
Reporter: Chris Harris Assignee: Irina Yatsenko (Inactive)
Resolution: Fixed Votes: 0
Labels: neweng, read-only-views
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Documented
is documented by DOCS-14452 Investigate changes in SERVER-50286: ... Closed
Related
is related to SERVER-58137 Missing [resolvedViews] field for ins... Backlog
Backwards Compatibility: Fully Compatible
Sprint: Query Execution 2021-04-19, Query Execution 2021-05-03, Query Execution 2021-05-17
Participants:

 Description   

Currently there is no indication in the log file for slow query entries that the target of the operation is a view.  This can be very confusing since, for example, the operation could report using an index for a field that is seemingly not used or have a sort when none is requested directly.  

We should indicate when the target entity is a view.  Such an indication could range from a simple boolean (eg isView:1) to something more complex like reporting the final generated pipeline (as is done when explain is used on such queries).  



 Comments   
Comment by Githook User [ 12/May/21 ]

Author:

{'name': 'Irina Yatsenko', 'email': 'irina.yatsenko@mongodb.com', 'username': 'IrinaYatsenko'}

Message: SERVER-50286 Indicate in logs whether an operation involves a view
Branch: master
https://github.com/mongodb/mongo/commit/265ba00b96161463bc2a6d8640098d3ac4e4b70a

Comment by Irina Yatsenko (Inactive) [ 10/May/21 ]

After the last round of CR feedback, the logging has been changed to include the resolved pipeline and would look similar to this:

	"resolvedViews":
	[
	  {
	    "viewNamespace":"toys.invpercustomer",
	    "dependencyChain":["invpercustomer","customers"],
	    "resolvedPipeline":
            [
              {"$lookup":{"from":"types","localField":"mainpref","foreignField":"_id","as":"inv"}},
              {"$project":{"name":1,"count":"$inv.count"}},
              {"$unwind":{"path":"$count"}}
            ]
	  },
	  {
	    "viewNamespace":"toys.types",
	    "dependencyChain":["types","catalog"],
	    "resolvedPipeline":
	    [
	      {"$group":{"_id":"$type","count":{"$sum":1}}},
	      {"$sort":{"count":-1}}
	    ]
	  }
	],

Comment by Irina Yatsenko (Inactive) [ 07/May/21 ]

After the CR, settled on the following format for the new property:

"resolvedViews":"db.x_view->s_view->coll;db.y_view->coll;" 

Note: in sharded environment, the situation is more complicated. The view resolution is done on the shard that contains the collection, and it might get logged if the operation that encompasses the resolution is slower than slowms. It will have errName set to "CommandOnShardedViewNotSupportedOnMongod". On mongos side, "resolvedViews" won't be set in the slow query log entry but it will contain a separate record in the log for "CommandOnShardedViewNotSupportedOnMongod" with the views base collection and the resolved pipeline.

Comment by Chris Harris [ 29/Apr/21 ]

The proposed syntax LGTM.  

Regarding duplicating entries, it is not clear to me if that listing them multiple times would provide any particular value.  Perhaps there is an argument to be had that it would further enhance the capabilities for related tooling that attempted to consume these log entries, but I am having a difficult time constructing an example of what that additional functionality would be.  So I am in favor of de-duplicating the entries, especially if there is the possibility that not doing so could result in large log lines in some situations along with truncation of other parts of the entry.

From a support perspective, the knowledge that the operation is on a view will probably result in us gathering explain output as the next step in almost all situations.  That additional output will provide us with a more complete view (no pun intended) of the overall aggregation that was executed as well as the collection(s) involved.  There is limited analysis and recommendations that we can make based only on the additional knowledge of what collections/views are involved from the log entry.  The information that that potentially could be captured in the log file which would be more actionable is the actual aggregation pipeline that is executed.  However that is (far) outside the scope of this ticket and not a change that we are requesting at this time.  We could potentially create another ticket to consider that in the future, but that's not specifically necessary either.  

Comment by Irina Yatsenko (Inactive) [ 29/Apr/21 ]

Realized, that we can add to the log all resolved views. The new property in the logs will look similar to the one below, which was generated for a query against invpercustomer view with a lookup on types view: 

"resolvedViews":"toys.invpercustomer->toys.customers;toys.types->toys.catalog;" 

For the cases when the same view is used multiple times (directly or inside a dependency chain), is it desirable to log it multiple times or to remove the duplication?

Comment by Kyle Suarez [ 28/Apr/21 ]

irina.yatsenko, this sounds fine to me, with a slight preference to maybe change the formatting to look something like

"resolvedViews":"toys.toysbytypens -> toys.types -> toys.catalog",

Ultimately though, I'll defer to christopher.harris as the final judge on what would be most valuable to the support team.

Comment by Irina Yatsenko (Inactive) [ 28/Apr/21 ]

christopher.harris, kyle.suarez, what do you think about adding the highlighted field to the attributes of a "slow query" log entry?

1
{
2
 "t":\{"$date":"2021-04-28T20:39:58.041+00:00"},
3
 "s":"I",
4
 "c":"COMMAND",
5
 "id":51803,
6
 "ctx":"conn1",
7
 "msg":"Slow query",
8
 "attr":{
9
  "type":"command",
10
  "ns":"toys.toysbytypens",
11
  "appName":"MongoDB Shell",
12
  "command":\{...},
13
  "planSummary":"COLLSCAN",
14
  "resolvedViews":"toys.toysbytypens,toys.types,toys.catalog,",
15
  "keysExamined":0,
16
  "docsExamined":21,
17
  "hasSortStage":true,
18
  "usedDisk":true,
19
  "cursorExhausted":true,
20
<etc.>
21
}

The above would be logged for db.toysbytypens.find({}) query against toysbytypens view that is based on types view that is based on catalog collection.

Limitations:

  • there is no correlation between hasSortStage (or indexes) and resolvedViews fields, e.g. cannot say which of the views triggered sorting
  • if a pipeline for any of the views in the dependency chain contain $lookup or other branching stages that in turn depend on views – these views won't be included into "resolvedViews" list

Would this be useful/sufficient to improve the experience? 

Comment by Kyle Suarez [ 31/Mar/21 ]

In the case of a view on a view on a view, etc., should we log the entire chain of views? Or only name the "final" resolved collection namespace? If we go with the entire chain, it sounds like the most natural place to put this is right after the view resolution code that traverses the views DAG.

Generated at Thu Feb 08 05:22:16 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.