[SERVER-33511] Same Query Shape with different regex can give suboptimal index selection Created: 27/Feb/18  Updated: 27/Apr/22  Resolved: 01/Mar/18

Status: Closed
Project: Core Server
Component/s: Performance, Querying
Affects Version/s: 3.4.13
Fix Version/s: None

Type: Improvement Priority: Minor - P4
Reporter: Oliver Butterfield Assignee: Chris Harris
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-32452 Replanning may not occur when a plan ... Closed
Related
related to SERVER-66015 Auto-parameterization works incorrect... Closed
related to SERVER-33678 Make regex indexability a factor of q... Closed
Participants:

 Description   

Hi there

We have a collection with a large number of documents and indexes. When we run the following query:

db.getCollection("mycollection").find({"account_id": 123,"term": /^mohamed ali/ }).count()

we see that it uses this index:

	{
		"v" : 2,
		"key" : {
			"account_id" : 1,
			"term" : 1,
			"created_at" : -1
		},
		"name" : "_account_id__term__created_at",
		"ns" : "search_service.mycollection",
		"sparse" : false,
		"background" : true
	},

This is fine. It leads to the following query plan being cached:

	{
		"query" : {
			"account_id" : 123,
			"term" : /^mohamed ali/
		},
		"sort" : {
 
		},
		"projection" : {
 
		}
	}

This is fine as well.

But as soon as we change the regex for one where we aren't using the caret to symbolise "begins with", the performance is drastically reduced:

db.getCollection("mycollection").find({"account_id": 123,"term": /mohamed ali/ }).count()

The issue here is that (for whatever reason) it is not using the existing cached query plan for this shape, and ends up picking a poorer choice of index - the two queries with the different regexes still have the same shape. Moreover, once it has done this, this new (poor) choice is cached and will then be used for the /^mohamed ali/ queries leading to much worse performance than they had originally.

Essentially, the upshot is that, following a flush of the query plan cache, you can get great performance on the "begins with" version of the query, until you do a single search without the caret, and from then on, the begins-with version will also perform badly.

Is this expected behaviour?

Many thanks



 Comments   
Comment by Oliver Butterfield [ 08/Mar/18 ]

Hi Dave

Many thanks for following up on this!

Comment by David Storch [ 07/Mar/18 ]

Hi oliverbutterfield,

We've opened SERVER-33678 to track the work which would make whether or not a regex can be indexed significant in the query shape. Although the more general problem here is SERVER-32542, this particular regex-related manifestation should be fixed by SERVER-33678.

Best,
Dave

Comment by Chris Harris [ 01/Mar/18 ]

Hi Oliver,

Thank you for opening this ticket with detailed information about your observations and investigation.

Currently the pattern applied to a regex query is not a distinguishing factor when it comes to query shapes. As a result, you are observing a manifestation of SERVER-32452. That ticket describes a situation where a plan is cached in such a way that it is difficult to reach the threshold needed to trigger eviction and replanning. The increased amount of work required to identify documents that match the unanchored regex would result in the same behavior.

As such, we will proceed to close this ticket out as a duplicate of SERVER-32452. We will also open a new ticket to consider incorporating aspects of the regex pattern as distinguishing factors for query shapes and will link it to this one.

Best,
Chris

Comment by Oliver Butterfield [ 27/Feb/18 ]

Apologies, my query pasting was a bit messed up. Hopefully it should be clear that instead of

db.getCollection(my collection'')

it should read

db.getCollection('mycollection')

Generated at Thu Feb 08 04:33:38 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.