[SERVER-22133] COUNT_SCAN plans are not generated properly from the plan cache Created: 11/Jan/16  Updated: 25/Jun/18  Resolved: 29/Jan/16

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 3.0.8
Fix Version/s: 3.3.2

Type: Bug Priority: Critical - P2
Reporter: Alex Paransky Assignee: David Storch
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Duplicate
is duplicated by SERVER-29931 Query plan is different for '$gte' an... Closed
is duplicated by SERVER-23675 Count with filter is considerable slo... Closed
Related
related to SERVER-22407 7% performance regression in count fo... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v3.2
Sprint: Query F (02/01/16)
Participants:
Case:

 Description   

We are using WiredTiger on MongoDB 3.0.8. The server in question is a SECONDARY that has been set up to support our read-only/query clients.

We have observed that after heavy insert activity on the MASTER (which is then replicated to the SECONDARY), queries that used to run fairly fast become unusable, with response times of over a minute.

Here is an example of the inconsistencies we are seeing. At this point the problem has already occurred and query performance has degraded.

The following query runs fairly fast:

db.getCollection('events-qos-loadstart').count({"vd.ple.vd.pid": "57de9139-cc7e-4b5c-51fd-aaa8517028f0","vd.ple.vd.acc": "EY"})
Jan 11 18:25:28 ec2-52-23-180-212 mongod.27000[29049]: [conn11450] command cdc.$cmd command: count { count: "events-qos-loadstart", query: { vd.ple.vd.pid: "57de9139-cc7e-4b5c-51fd-aaa8517028f0", vd.ple.vd.acc: "EY" }, fields: {} } planSummary: COUNT_SCAN { vd.ple.vd.acc: 1, vd.ple.vd.pid: 1, vd.pec: 1, vd.ts: 1 } keyUpdates:0 writeConflicts:0 numYields:1817 reslen:44 locks:{ Global: { acquireCount: { r: 3636 }, acquireWaitCount: { r: 37 }, timeAcquiringMicros: { r: 134261 } }, Database: { acquireCount: { r: 1818 } }, Collection: { acquireCount: { r: 1818 } } } 218ms
Count: 232691

A similar query with just one additional predicate takes dramatically longer to run:

db.getCollection('events-qos-loadstart').count({"vd.ple.vd.pid": "57de9139-cc7e-4b5c-51fd-aaa8517028f0","vd.ple.vd.acc": "EY", "vd.pec": {"$gte": 2}})
Jan 11 18:32:33 ec2-52-23-180-212 mongod.27000[29049]: [conn11450] command cdc.$cmd command: count { count: "events-qos-loadstart", query: { vd.ple.vd.pid: "57de9139-cc7e-4b5c-51fd-aaa8517028f0", vd.ple.vd.acc: "EY", vd.pec: { $gte: 2.0 } }, fields: {} } planSummary: IXSCAN { vd.ple.vd.acc: 1, vd.ple.vd.pid: 1, vd.pec: 1, vd.ts: 1 } keyUpdates:0 writeConflicts:0 numYields:3444 reslen:44 locks:{ Global: { acquireCount: { r: 6890 }, acquireWaitCount: { r: 2074 }, timeAcquiringMicros: { r: 29318135 } }, Database: { acquireCount: { r: 3445 } }, Collection: { acquireCount: { r: 3445 } } } 85835ms
Count: 231445

Here are the indexes which are defined on this collection:

[
	{
		"v" : 1,
		"key" : {
			"_id" : 1
		},
		"name" : "_id_",
		"ns" : "cdc.events-qos-loadstart"
	},
	{
		"v" : 1,
		"key" : {
			"vd.ple.vd.acc" : 1,
			"vd.ts" : 1,
			"vd.pec" : 1
		},
		"name" : "vd.ple.vd.acc_1_vd.ts_1_vd.pec_1",
		"ns" : "cdc.events-qos-loadstart"
	},
	{
		"v" : 1,
		"key" : {
			"vd.ts" : 1,
			"vd.pec" : 1
		},
		"name" : "vd.ts_1_vd.pec_1",
		"ns" : "cdc.events-qos-loadstart"
	},
	{
		"v" : 1,
		"key" : {
			"vd.ple.vd.acc" : 1,
			"vd.ple.vd.pid" : 1,
			"vd.pec" : 1,
			"vd.ts" : 1
		},
		"name" : "vd.ple.vd.acc_1_vd.ple.vd.pid_1_vd.pec_1_vd.ts_1",
		"ns" : "cdc.events-qos-loadstart"
	}
]

Note that all of the queried fields are present in an index.

If I restart the server, both of these queries will return in milliseconds.
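
For reference, the collection's plan cache can also be inspected and cleared from the mongo shell; this is only a sketch using the standard PlanCache helpers, with the query shape copied from the slow example above, in case a full restart can be avoided:

// Inspect any cached plans for the slow query shape (empty projection and sort).
db.getCollection('events-qos-loadstart').getPlanCache().getPlansByQuery(
    {"vd.ple.vd.pid": "57de9139-cc7e-4b5c-51fd-aaa8517028f0",
     "vd.ple.vd.acc": "EY",
     "vd.pec": {"$gte": 2}}, {}, {})

// Drop all cached plans for the collection.
db.getCollection('events-qos-loadstart').getPlanCache().clear()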



 Comments   
Comment by David Storch [ 19/Jan/17 ]

Hi royrez@microsoft.com,

We have reviewed the change and decided not to backport, in order to avoid any risk of destabilizing the count path in 3.2. If this fix is critical for your application, please upgrade to version 3.4, which contains the fix.

I don't think there is an easy workaround, but if we find one we will describe it on this ticket.

Best,
Dave

Comment by Roy Reznik [ 22/Nov/16 ]

Dave,
Is there any workaround for the 3.2 branch?
I'm seeing queries that, according to "explain", should run with the COUNT_SCAN plan but in practice run with the IXSCAN plan, which makes them VERY slow.
It also looks like a minor change, maybe worth backporting?

Thanks.

Comment by Githook User [ 29/Jan/16 ]

Author:

{u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'}

Message: SERVER-22133 generate COUNT_SCAN plans properly from the plan cache
Branch: master
https://github.com/mongodb/mongo/commit/dac2f722f8aaec2917dce98862d58abde2605f7a

Comment by Alex Paransky [ 19/Jan/16 ]

Dave, you are absolutely correct. SERVER-22093 is in fact what I am seeing with the simple aggregation query examples and explain plans that I provided.

I reviewed the comments in SERVER-22093 and it seems that as written that ticket addresses a sub-case of a bigger issue alluded to by @redbeard0531 in https://jira.mongodb.org/browse/SERVER-22093?focusedCommentId=1129757&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-1129757.

The real solution is for aggregation/find to realize that all the fields needed for future stages/projection are in fact already in an index and avoid the FETCH stage entirely or perform a more specific FETCH_INDEX stage (which would only retrieve data already loaded with the index). This would cover the more limited case of count as well as provide an optimization for all the aggregations which I am doing where the index contains all the fields.

So, fixing COUNT is actually a limited subset of a "skip-fetch-if-all-fields-are-already-covered-by-index" issue in the plan generation pipeline. This, I think, is what the essence of a new ticket should be (perhaps in place of SERVER-22093).

Thanks.

Comment by David Storch [ 19/Jan/16 ]

Hi apara,

Great question. The .find() operation takes over 80 seconds because it cannot be covered without a user-supplied projection excluding all non-indexed fields. However, you are absolutely right that there is no reason your count operation using the .aggregate() command cannot be covered. The aggregation system does try to be smart about analyzing which fields it needs to compute its results, projecting out all other fields. In this particular case it identifies that it does not need any fields from the user documents and generates a special "no fields needed" projection. However, the underlying query machinery does not understand this special projection, resulting in a non-covered query plan. This is a known bug described by SERVER-22093. I suspect that, due to this known issue, your aggregation is fetching documents from the collection, causing it to be much slower than the covered COUNT_SCAN plan. If you would like to definitively confirm that this is the case, please post the output of explaining your aggregation pipeline.
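
For illustration only, here is a sketch on a toy collection (not your data) of what a covered find looks like: when the filter and projection are confined to indexed fields and _id is excluded, explain shows an IXSCAN with no FETCH stage and totalDocsExamined: 0.

// Toy covered-query example (hypothetical collection name).
db.covered_demo.drop();
db.covered_demo.insert({a: 1, b: 2, c: 3});
db.covered_demo.ensureIndex({a: 1, b: 1});
// The projection keeps only indexed fields and excludes _id, so no FETCH is needed.
db.covered_demo.find({a: 1}, {_id: 0, b: 1}).explain("executionStats");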

So, to summarize, there are basically three issues at play here:

  1. COUNT_SCAN plans are generated incorrectly from the plan cache. This is the originally reported issue, and it is the one that will be addressed by a code fix under this ticket.
  2. Aggregation cannot generate covered plans for pipelines which are pure count operations. This issue is described in SERVER-22093.
  3. Count operations expressed as an aggregation pipeline cannot use COUNT_SCAN. As I mentioned before, this is more of a feature request than a bug. To my knowledge, there is no ticket describing this work. I can file one on your behalf if you like.

Best,
Dave

Comment by Alex Paransky [ 16/Jan/16 ]

Thanks, David. The real question is: if all my data is in an index and COUNT_SCAN can match all of it in under 200 ms, why does it take over 80 seconds for IXSCAN to do the same thing? Again, we are not using any data that is not covered by the index, so the aggregation is running only on fields that are ALL covered by the index.

We are only talking about a data subset of about 232,691 documents.

200 milliseconds vs. 80 seconds is a significant performance difference.

Comment by David Storch [ 15/Jan/16 ]

Hi apara,

Good news. I think I have been able to develop a simple repro script which demonstrates what is happening:

// Set up data and indexes.
db.c.drop();
for (var i = 1; i <= 6; i++) { db.c.insert({a: 1, b: 1, c: i}); }
db.c.ensureIndex({a: 1, b: 1, c: 1, d: 1});
db.c.ensureIndex({a: 1, z: 1, c: 1});
 
// Enable logLevel 1 so that "planSummary" strings appear in the logs.
db.setLogLevel(1);
 
// Logs should show a COUNT_SCAN plan.
db.c.count({a: 1, b: 1, c: {$gt: 4}});
 
// Create a plan cache entry using the same predicate.
db.c.find({a: 1, b: 1, c: {$gt: 4}}).itcount();
 
// This time IXSCAN will show up as the planSummary, but it should be COUNT_SCAN.
db.c.count({a: 1, b: 1, c: {$gt: 4}});

The problem has to do with how we use the plan cache to construct the execution plans for count queries. The fast count operations which you observe are using a special COUNT_SCAN plan, which is an optimized fast-path for answering count operations. The fast path is covered, which means that it can execute by reading indexes only, without ever fetching data from the collection. The fix will allow count operations coming from the plan cache to properly make use of a COUNT_SCAN plan. However, note that aggregations cannot use COUNT_SCAN, and that a fix for this ticket will not allow use of COUNT_SCAN for aggregation operations. While this would be a good way to optimize aggregation pipelines like yours, the work to make that happen is complex (closer to a new feature than a bug fix).
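
Continuing the repro above, clearing the collection's plan cache removes the cached entry created by the find(), after which the next count should once again show COUNT_SCAN in the logs (a sketch using the standard PlanCache helper):

// Drop the cached plan created by the find() above.
db.c.getPlanCache().clear();
 
// The planSummary logged for this count should be COUNT_SCAN again.
db.c.count({a: 1, b: 1, c: {$gt: 4}});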

I think we have enough information from you, so I am going to move this ticket out of the "debugging with submitter" state. Please continue to watch for updates as work on this progresses.

Best,
Dave

Comment by David Storch [ 15/Jan/16 ]

Hi apara,

Apologies for the delayed response; things have been busy here. I'm planning on looking into this more today. It sounds like from the information you've already provided we have some leads, but if I need any more information from you I'll definitely ping you on this ticket. Thanks for your willingness to help us track this down!

Best,
Dave

Comment by Alex Paransky [ 13/Jan/16 ]

David, let me know how I can help with debugging this issue. We are currently setting up another SECONDARY replica so that we can stop/restart the instance in question without affecting our users. We can set up a time, or I can do some more tests.

Comment by Alex Paransky [ 12/Jan/16 ]

Here is another example:

Jan 12 16:20:33 ec2-52-23-180-212 mongod.27000[29049]: [conn14780] command cdc.$cmd command: aggregate { aggregate: "events-qos-loadstart", pipeline: [ { $match: { vd.ple.vd.acc: "EY", vd.ple.vd.pid: "313c0296-5469-59f7-7cbe-5b818a2e657c", vd.pec: { $gte: 2.0 } } }, { $group: { _id: {}, count: { $sum: 1.0 } } } ] } keyUpdates:0 writeConflicts:0 numYields:5174 reslen:83 locks:{ Global: { acquireCount: { r: 10354 }, acquireWaitCount: { r: 1563 }, timeAcquiringMicros: { r: 11693027 } }, Database: { acquireCount: { r: 5177 } }, Collection: { acquireCount: { r: 5177 } } } 98170ms
 
Jan 12 16:22:59 ec2-52-23-180-212 mongod.27000[29049]: [conn14780] command cdc.$cmd command: count { count: "events-qos-loadstart", query: { vd.ple.vd.pid: "313c0296-5469-59f7-7cbe-5b818a2e657c", vd.ple.vd.acc: "EY", vd.pec: { $gte: 2.0 } }, fields: {} } planSummary: IXSCAN { vd.ple.vd.acc: 1, vd.ple.vd.pid: 1, vd.pec: 1, vd.ts: 1 } keyUpdates:0 writeConflicts:0 numYields:4493 reslen:44 locks:{ Global: { acquireCount: { r: 8988 }, acquireWaitCount: { r: 1195 }, timeAcquiringMicros: { r: 10605847 } }, Database: { acquireCount: { r: 4494 } }, Collection: { acquireCount: { r: 4494 } } } 81287ms
 
CLEAR PLAN
 
Jan 12 16:23:24 ec2-52-23-180-212 mongod.27000[29049]: [conn14780] command cdc.$cmd command: count { count: "events-qos-loadstart", query: { vd.ple.vd.pid: "313c0296-5469-59f7-7cbe-5b818a2e657c", vd.ple.vd.acc: "EY", vd.pec: { $gte: 2.0 } }, fields: {} } planSummary: COUNT_SCAN { vd.ple.vd.acc: 1, vd.ple.vd.pid: 1, vd.pec: 1, vd.ts: 1 } keyUpdates:0 writeConflicts:0 numYields:3058 reslen:44 locks:{ Global: { acquireCount: { r: 6118 } }, Database: { acquireCount: { r: 3059 } }, Collection: { acquireCount: { r: 3059 } } } 136ms

Comment by Alex Paransky [ 12/Jan/16 ]

At this point I can reproduce the slow find issue on demand simply by running an aggregation. After the aggregation runs, the find command goes back to running slowly.

Comment by Alex Paransky [ 12/Jan/16 ]

Here is the REAL problem that I am trying to solve.

When I execute this command:

db.runCommand(
{
    "aggregate": "events-qos-loadstart",
    "pipeline": [{
        "$match": {
                    "vd.ple.vd.acc": "EY",
                    "vd.ple.vd.pid": "313c0296-5469-59f7-7cbe-5b818a2e657c",
                    "vd.pec": {"$gte": 2}
        }
    }, 
	{
        "$group": {
            "_id": {
                "acc" : "$vd.ple.vd.acc",
            },
            "rR8Rr": {
                "$sum": 1
            }
        }
    }]
}
);

It takes quite a while to run (over a minute!):

Jan 12 15:58:35 ec2-52-23-180-212 mongod.27000[29049]: [conn14780] command cdc.$cmd command: aggregate { aggregate: "events-qos-loadstart", pipeline: [ { $match: { vd.ple.vd.acc: "EY", vd.ple.vd.pid: "313c0296-5469-59f7-7cbe-5b818a2e657c", vd.pec: { $gte: 2.0 } } }, { $group: { _id: { acc: "$vd.ple.vd.acc" }, rR8Rr: { $sum: 1.0 } } } ] } keyUpdates:0 writeConflicts:0 numYields:4490 reslen:95 locks:{ Global: { acquireCount: { r: 9098 }, acquireWaitCount: { r: 1043 }, timeAcquiringMicros: { r: 7052888 } }, Database: { acquireCount: { r: 4549 } }, Collection: { acquireCount: { r: 4549 } } } 78725ms

If I execute the find command right after this, it takes about the same amount of time to run:

db.getCollection('events-qos-loadstart').find({"vd.ple.vd.pid": "313c0296-5469-59f7-7cbe-5b818a2e657c","vd.ple.vd.acc": "EY", "vd.pec": {"$gte": 2}}).size()
 
Jan 12 16:01:57 ec2-52-23-180-212 mongod.27000[29049]: [conn14780] command cdc.$cmd command: count { count: "events-qos-loadstart", query: { vd.ple.vd.pid: "313c0296-5469-59f7-7cbe-5b818a2e657c", vd.ple.vd.acc: "EY", vd.pec: { $gte: 2.0 } }, fields: {} } planSummary: IXSCAN { vd.ple.vd.acc: 1, vd.ple.vd.pid: 1, vd.pec: 1, vd.ts: 1 } keyUpdates:0 writeConflicts:0 numYields:4544 reslen:44 locks:{ Global: { acquireCount: { r: 9090 }, acquireWaitCount: { r: 1304 }, timeAcquiringMicros: { r: 10582063 } }, Database: { acquireCount: { r: 4545 } }, Collection: { acquireCount: { r: 4545 } } } 80866ms

So you could say that it is simply taking a long time to get through the index. However, if I clear the plans, the find runs fast:

Jan 12 16:02:49 ec2-52-23-180-212 mongod.27000[29049]: [conn14780] command cdc.$cmd command: count { count: "events-qos-loadstart", query: { vd.ple.vd.pid: "313c0296-5469-59f7-7cbe-5b818a2e657c", vd.ple.vd.acc: "EY", vd.pec: { $gte: 2.0 } }, fields: {} } planSummary: COUNT_SCAN { vd.ple.vd.acc: 1, vd.ple.vd.pid: 1, vd.pec: 1, vd.ts: 1 } keyUpdates:0 writeConflicts:0 numYields:3058 reslen:44 locks:{ Global: { acquireCount: { r: 6118 }, acquireWaitCount: { r: 42 }, timeAcquiringMicros: { r: 92598 } }, Database: { acquireCount: { r: 3059 } }, Collection: { acquireCount: { r: 3059 } } } 228ms

The aggregate command, however, still runs very slowly. After the aggregate command executes, the find goes back to running slowly until the plans are cleared.

It may sound like we simply have a lot of data and the find just takes a long time to scan the index, but the count does the "same" scan after the plan cache is cleared and executes in under half a second.

Also, it seems that the aggregation creates the cached plans, and once those plans are created things run slowly.

Why?

Comment by Alex Paransky [ 12/Jan/16 ]

It seems that after the plans are created, the query that used to run fast now runs slowly...

Comment by Alex Paransky [ 12/Jan/16 ]

I re-executed the following code after clearing out the plans:

db.getCollection('events-qos-loadstart').getPlanCache().getPlansByQuery({"vd.ple.vd.pid": "57de9139-cc7e-4b5c-51fd-aaa8517028f0","vd.ple.vd.acc": "EY"},{},{})
db.getCollection('events-qos-loadstart').getPlanCache().getPlansByQuery({"vd.ple.vd.pid": "57de9139-cc7e-4b5c-51fd-aaa8517028f0","vd.ple.vd.acc": "EY", "vd.pec": {"$gte": 2}},{},{})

Neither produced any results.

Comment by Alex Paransky [ 12/Jan/16 ]

So, I ran this query and it was FAST:

db.getCollection('events-qos-loadstart').count({"vd.ple.vd.pid": "57de9139-cc7e-4b5c-51fd-aaa8517028f0","vd.ple.vd.acc": "EY"})

Then, I ran this query and it was SLOW:

db.getCollection('events-qos-loadstart').count({"vd.ple.vd.pid": "57de9139-cc7e-4b5c-51fd-aaa8517028f0","vd.ple.vd.acc": "EY", "vd.pec": {"$gte": 2}})

I looked at the server log to see the slow query logger return:

Jan 12 14:50:26 ec2-52-23-180-212 mongod.27000[29049]: [conn14780] command cdc.$cmd command: count { count: "events-qos-loadstart", query: { vd.ple.vd.pid: "57de9139-cc7e-4b5c-51fd-aaa8517028f0", vd.ple.vd.acc: "EY", vd.pec: { $gte: 2.0 } }, fields: {} } planSummary: IXSCAN { vd.ple.vd.acc: 1, vd.ple.vd.pid: 1, vd.pec: 1, vd.ts: 1 } keyUpdates:0 writeConflicts:0 numYields:2737 reslen:44 locks:{ Global: { acquireCount: { r: 5476 }, acquireWaitCount: { r: 651 }, timeAcquiringMicros: { r: 6638046 } }, Database: { acquireCount: { r: 2738 } }, Collection: { acquireCount: { r: 2738 } } } 45219ms

As you have suggested, I executed the following for the first query that ran fast:

db.getCollection('events-qos-loadstart').getPlanCache().getPlansByQuery({"vd.ple.vd.pid": "57de9139-cc7e-4b5c-51fd-aaa8517028f0","vd.ple.vd.acc": "EY"},{},{})

This did not produce anything:

[ ]

Then I executed the same for the query that ran slow:

db.getCollection('events-qos-loadstart').getPlanCache().getPlansByQuery({"vd.ple.vd.pid": "57de9139-cc7e-4b5c-51fd-aaa8517028f0","vd.ple.vd.acc": "EY", "vd.pec": {"$gte": 2}},{},{})

This resulted in the following output:

[
	{
		"details" : {
			"solution" : "(index-tagged expression tree: tree=Node\n---Leaf { vd.ple.vd.acc: 1, vd.ple.vd.pid: 1, vd.pec: 1, vd.ts: 1 }, pos: 0\n---Leaf { vd.ple.vd.acc: 1, vd.ple.vd.pid: 1, vd.pec: 1, vd.ts: 1 }, pos: 1\n---Leaf { vd.ple.vd.acc: 1, vd.ple.vd.pid: 1, vd.pec: 1, vd.ts: 1 }, pos: 2\n)"
		},
		"reason" : {
			"score" : 2.0003,
			"stats" : {
				"stage" : "FETCH",
				"nReturned" : 101,
				"executionTimeMillisEstimate" : 0,
				"works" : 101,
				"advanced" : 101,
				"needTime" : 0,
				"needFetch" : 0,
				"saveState" : 1,
				"restoreState" : 1,
				"isEOF" : 0,
				"invalidates" : 0,
				"docsExamined" : 101,
				"alreadyHasObj" : 0,
				"inputStage" : {
					"stage" : "IXSCAN",
					"nReturned" : 101,
					"executionTimeMillisEstimate" : 0,
					"works" : 101,
					"advanced" : 101,
					"needTime" : 0,
					"needFetch" : 0,
					"saveState" : 1,
					"restoreState" : 1,
					"isEOF" : 0,
					"invalidates" : 0,
					"keyPattern" : {
						"vd.ple.vd.acc" : 1,
						"vd.ple.vd.pid" : 1,
						"vd.pec" : 1,
						"vd.ts" : 1
					},
					"indexName" : "vd.ple.vd.acc_1_vd.ple.vd.pid_1_vd.pec_1_vd.ts_1",
					"isMultiKey" : false,
					"direction" : "forward",
					"indexBounds" : {
						"vd.ple.vd.acc" : [
							"[\"EY\", \"EY\"]"
						],
						"vd.ple.vd.pid" : [
							"[\"57de9139-cc7e-4b5c-51fd-aaa8517028f0\", \"57de9139-cc7e-4b5c-51fd-aaa8517028f0\"]"
						],
						"vd.pec" : [
							"[2, inf.0]"
						],
						"vd.ts" : [
							"[MinKey, MaxKey]"
						]
					},
					"keysExamined" : 101,
					"dupsTested" : 0,
					"dupsDropped" : 0,
					"seenInvalidated" : 0,
					"matchTested" : 0
				}
			}
		},
		"feedback" : {
			"nfeedback" : 20,
			"averageScore" : NaN,
			"stdDevScore" : NaN,
			"scores" : [
				{
					"score" : NaN
				},
				{
					"score" : 1.9996768236380424
				},
				{
					"score" : NaN
				},
				{
					"score" : 1.9996768236380424
				},
				{
					"score" : NaN
				},
				{
					"score" : NaN
				},
				{
					"score" : 1.9996768236380424
				},
				{
					"score" : 1.9996768236380424
				},
				{
					"score" : NaN
				},
				{
					"score" : 1.9996768236380424
				},
				{
					"score" : NaN
				},
				{
					"score" : 1.9996768236380424
				},
				{
					"score" : NaN
				},
				{
					"score" : NaN
				},
				{
					"score" : 1.9996768236380424
				},
				{
					"score" : 1.9996768236380424
				},
				{
					"score" : NaN
				},
				{
					"score" : 1.9996768236380424
				},
				{
					"score" : NaN
				},
				{
					"score" : NaN
				}
			]
		},
		"filterSet" : false
	},
	{
		"details" : {
			"solution" : "(index-tagged expression tree: tree=Node\n---Leaf { vd.ple.vd.acc: 1, vd.ts: 1, vd.pec: 1 }, pos: 0\n---Leaf \n---Leaf { vd.ple.vd.acc: 1, vd.ts: 1, vd.pec: 1 }, pos: 2\n)"
		},
		"reason" : {
			"score" : 1.0003,
			"stats" : {
				"stage" : "FETCH",
				"filter" : {
					"vd.ple.vd.pid" : {
						"$eq" : "57de9139-cc7e-4b5c-51fd-aaa8517028f0"
					}
				},
				"nReturned" : 0,
				"executionTimeMillisEstimate" : 0,
				"works" : 101,
				"advanced" : 0,
				"needTime" : 101,
				"needFetch" : 0,
				"saveState" : 1,
				"restoreState" : 1,
				"isEOF" : 0,
				"invalidates" : 0,
				"docsExamined" : 91,
				"alreadyHasObj" : 0,
				"inputStage" : {
					"stage" : "IXSCAN",
					"nReturned" : 91,
					"executionTimeMillisEstimate" : 0,
					"works" : 101,
					"advanced" : 91,
					"needTime" : 10,
					"needFetch" : 0,
					"saveState" : 1,
					"restoreState" : 1,
					"isEOF" : 0,
					"invalidates" : 0,
					"keyPattern" : {
						"vd.ple.vd.acc" : 1,
						"vd.ts" : 1,
						"vd.pec" : 1
					},
					"indexName" : "vd.ple.vd.acc_1_vd.ts_1_vd.pec_1",
					"isMultiKey" : false,
					"direction" : "forward",
					"indexBounds" : {
						"vd.ple.vd.acc" : [
							"[\"EY\", \"EY\"]"
						],
						"vd.ts" : [
							"[MinKey, MaxKey]"
						],
						"vd.pec" : [
							"[2, inf.0]"
						]
					},
					"keysExamined" : 101,
					"dupsTested" : 0,
					"dupsDropped" : 0,
					"seenInvalidated" : 0,
					"matchTested" : 0
				}
			}
		},
		"feedback" : {
			
		},
		"filterSet" : false
	}
]

Then, as you suggested, I cleared the plans:

db.getCollection('events-qos-loadstart').getPlanCache().clear();

Suddenly, both of the queries execute FAST.

So, clearing out the plans seems to have worked.

Comment by Alex Paransky [ 12/Jan/16 ]

Thomas, note that the two queries differ only by:

"vd.pec": {"$gte": 2}

Comment by Kelsey Schubert [ 11/Jan/16 ]

Hi apara,

Thank you for the report. We would like to determine if this issue is related to plan caching.

When you are experiencing these slow queries, can you please take the following steps? I have provided example commands that match the queries you posted previously.

  1. Reconfirm that explain for the query selects COUNT_SCAN and that the query executes quickly:

    db.getCollection('events-qos-loadstart').explain().count({"vd.ple.vd.pid": "57de9139-cc7e-4b5c-51fd-aaa8517028f0","vd.ple.vd.acc": "EY"})

  2. Execute PlanCache.getPlansByQuery with the same query and an empty projection and sort. Please post the output in the ticket:

    db.getCollection('events-qos-loadstart').getPlanCache().getPlansByQuery({"vd.ple.vd.pid": "57de9139-cc7e-4b5c-51fd-aaa8517028f0","vd.ple.vd.acc": "EY"}, {}, {})

If the plan cache shows an IXSCAN plan rather than a COUNT_SCAN, execute PlanCache.clear() and confirm whether the query speed improves:

db.getCollection('events-qos-loadstart').getPlanCache().clear()

db.getCollection('events-qos-loadstart').count({"vd.ple.vd.pid": "57de9139-cc7e-4b5c-51fd-aaa8517028f0","vd.ple.vd.acc": "EY"})

Regards,
Thomas

Comment by Alex Paransky [ 11/Jan/16 ]

Here is the CloudManager group that this server is in:
https://cloud.mongodb.com/v2/5582d749e4b0b632f89d47c9#deployment/topology

Comment by Alex Paransky [ 11/Jan/16 ]

More details can be found here: https://groups.google.com/forum/#!topic/mongodb-user/ojgOflygt5g
