Details
-
New Feature
-
Resolution: Won't Fix
-
Major - P3
-
None
-
None
-
None
Description
When queries are slow, it would be helpful to have additional tools for learning what is making them slow. In the spirit of .explain() and somewhat in the style of "trace route" for an IP packet, we should add an addSpecial-style option for queries that asks the servers (mongos and mongod) to timestamp every easily marked step in the processing of a request. From the mongo shell, this might be a .trace() option added to a query the same way we add .explain() today. Drivers might provide this feature as an additional attribute as appropriate.
The goal is to be able to explain what a slow query was doing while it was being processed. It may be expensive or impossible to learn this in complete detail, but where there are points in the processing of a request where timestamping is feasible and would not significantly slow down non-traced queries we should append timestamp data to traced requests and be able to return that data either along with the response or as an analog of getLastError; something like getTimingTrace might return the trace data for the previous request if the previous request was marked as "trace".
The following imaginary scenario may be unworkable, but here's an example what this might look like:
A request took 20 seconds, and when we repeated it, it took 22 seconds. We try it a third time and add .trace() to the end. It takes 18 seconds, and doing a db.getTimingTrace() immediately afterward we get this output:
{
|
note : "all times in ms",
|
timing : [
|
{ parse : 0 },
|
{ acquire-read-lock : 13143 },
|
{ cursor-select : 3147 },
|
{ assemble-response : 1721 }
|
]
|
}
|
Our conclusion might be that this was never going to be a fast query, but we lost 13 out of 18 seconds waiting for a lock. This tells us that asking for better indexes is only going to help a little; until we address the acquire-read-lock time, we aren't going to be able to do much to speed this up.
A different test might tell a different story:
{
|
note : "all times in ms",
|
timing : [
|
{ parse : 0 },
|
{ acquire-read-lock : 0 },
|
{ cursor-select : 431 },
|
{ assemble-response : 2 }
|
]
|
}
|
In this case we might be inclined to add .hint() to get the cursor-select time to go away.
The above example are totally made up, but the desire to find out anything that we can about "why is this query taking xx seconds?" is real. If something like this, reconstituted to be feasible and informative, could be added, it would make efforts to address performance easier and more successful.