[SERVER-13419] Plan summary string in slow query log is wrong if fell back to a non-blocking solution Created: 31/Mar/14  Updated: 06/Dec/22  Resolved: 05/Aug/19

Status: Closed
Project: Core Server
Component/s: Diagnostics, Querying
Affects Version/s: 2.6.0-rc2
Fix Version/s: None

Type: Bug Priority: Minor - P4
Reporter: David Storch Assignee: Backlog - Query Team (Inactive)
Resolution: Won't Fix Votes: 0
Labels: query-44-grooming
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-17862 Unify stats reporting for log/profile... Closed
is related to SERVER-15205 Query engine should write to QLOG whe... Closed
Assigned Teams:
Query
Operating System: ALL
Participants:

 Description   

The plan summary string is produced before or soon after newRunQuery(...) begins to a run a plan. However, the plan being used may change halfway through query execution. Specifically, if a blocking plan hits its memory limit, then we fall back on a non-blocking solution (the backup solution), but the slow query line will still contain the plan summary for the blocking plan.

The following script will repro:

var t = db.stale_plan_summary;
t.drop();
 
// Slow query lines get logged at logLevel 1.
db.adminCommand({setParameter: 1, logLevel: 1});
 
var big = new Array( 1000000 ).toString()
t.save( {b:0} );
for(var i = 0; i < 40; ++i) {
    t.save( {a:0,x:big} );
}
 
t.ensureIndex({a: 1});
t.ensureIndex({b: 1});
 
// Uses a blocking plan to sort on 'a'.
t.find( {b: 0} ).sort( {a: 1} ).itcount();
 
// Insert lots of data so that the next invocation of the blocking
// plan will run out of memory.
for(var i = 0; i < 40; ++i) {
    t.save( {b:0,x:big} );
}
 
// Falls back on an index scan over 'a', but the slow query log
// still has planSummary: IXSCAN { b: 1.0 }
t.find( {b: 0} ).sort( {a: 1} ).itcount();



 Comments   
Comment by Craig Homa [ 05/Aug/19 ]

Closing because the backup plan path is very difficult to exercise in recent versions, so this bug is extremely rare. As opposed to fixing it, we should just remove the backup plan code since it is of questionable utility. david.storch will file a ticket to track that work.

Comment by David Storch [ 27/Jun/16 ]

My reading of the code is that this issue can still theoretically occur on the 3.0, 3.2, and master branches. However, it is definitely the case that the particular repro script in this ticket no longer reproduces the issue. It does not reproduce on recent branches due to CachedPlanStage re-planning, which was introduced under SERVER-15225 and first enabled by default in 3.0.7 under SERVER-20139. The code changes for CachedPlanStage re-planning involved removing the non-blocking plan fallback logic from the CachedPlanStage, but the fallback logic still exists for the MultiPlanStage.

That said, I think it may be difficult (perhaps impossible?) to exercise the MultiPlanStage fallback. Therefore, I suspect this issue to occur either rarely or not at all on versions >= 3.0.7. Let's keep this ticket open pending investigation of whether the MultiPlanStage fallback logic can ever be triggered. If it cannot be triggered, then it should be deleted. For what it's worth, our code coverage analysis indicates that the MultiPlanStage fallback code path is not being exercised by our test suites, which makes me suspect that it can be safely removed.

Generated at Thu Feb 08 03:31:40 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.