Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-33511

Same Query Shape with different regex can give suboptimal index selection

    XMLWordPrintable

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor - P4
    • Resolution: Duplicate
    • Affects Version/s: 3.4.13
    • Fix Version/s: None
    • Component/s: Performance, Querying
    • Labels:
      None

      Description

      Hi there

      We have a collection with a large number of documents and indexes. When we run the following query:

      db.getCollection("mycollection").find({"account_id": 123,"term": /^mohamed ali/ }).count()
      

      we see that it uses this index:

      	{
      		"v" : 2,
      		"key" : {
      			"account_id" : 1,
      			"term" : 1,
      			"created_at" : -1
      		},
      		"name" : "_account_id__term__created_at",
      		"ns" : "search_service.mycollection",
      		"sparse" : false,
      		"background" : true
      	},
      

      This is fine. It leads to the following query plan being cached:

      	{
      		"query" : {
      			"account_id" : 123,
      			"term" : /^mohamed ali/
      		},
      		"sort" : {
       
      		},
      		"projection" : {
       
      		}
      	}
      

      This is fine as well.

      But as soon as we change the regex for one where we aren't using the caret to symbolise "begins with", the performance is drastically reduced:

      db.getCollection("mycollection").find({"account_id": 123,"term": /mohamed ali/ }).count()
      

      The issue here is that (for whatever reason) it is not using the existing cached query plan for this shape, and ends up picking a poorer choice of index - the two queries with the different regexes still have the same shape. Moreover, once it has done this, this new (poor) choice is cached and will then be used for the /^mohamed ali/ queries leading to much worse performance than they had originally.

      Essentially, the upshot is that, following a flush of the query plan cache, you can get great performance on the "begins with" version of the query, until you do a single search without the caret, and from then on, the begins-with version will also perform badly.

      Is this expected behaviour?

      Many thanks

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              christopher.harris Christopher Harris
              Reporter:
              oliverbutterfield Oliver Butterfield
              Participants:
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved: