Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-9938

Inefficient index boundary selected when using character class regex

    • Type: Icon: Improvement Improvement
    • Resolution: Unresolved
    • Priority: Icon: Minor - P4 Minor - P4
    • None
    • Affects Version/s: 2.4.4
    • Component/s: Querying
    • Query Optimization

      Hi,

      It seems that when you use a character class regex in a find operation, it results in a full index scan even when the character class is anchored.

      When the field is an array, it can result in a huge performance hit as each document is accessed multiple times for each indexed array element.

      Note how we have 8 scanned objects in the following example:

      db.foo.save({ "_id" : 1, "keywords" : [  "a" ] })
      db.foo.save({ "_id" : 2, "keywords" : [  "b" ] })
      db.foo.save({ "_id" : 3, "keywords" : [  "c" ] })
      db.foo.save({ "_id" : 4, "keywords" : [  "a",  "b" ] })
      db.foo.save({ "_id" : 5, "keywords" : [  "a",  "b",  "c" ] })
      db.foo.ensureIndex({ keywords:1 })
      
      > db.foo.find({ keywords:/^[bc]/ }).explain()
      {
      	"cursor" : "BtreeCursor keywords_1 multi",
      	"isMultiKey" : true,
      	"n" : 4,
      	"nscannedObjects" : 8,
      	"nscanned" : 8,
      	"nscannedObjectsAllPlans" : 8,
      	"nscannedAllPlans" : 8,
      	"scanAndOrder" : false,
      	"indexOnly" : false,
      	"nYields" : 0,
      	"nChunkSkips" : 0,
      	"millis" : 0,
      	"indexBounds" : {
      		"keywords" : [
      			[
      				"",
      				{
      
      				}
      			],
      			[
      				/^[bc]/,
      				/^[bc]/
      			]
      		]
      	},
      	"server" : "Jeffs-MacBook-Air.local:27017"
      }
      

      The workaround seems to be to specify each element of the character class individually:

      > db.foo.find({ keywords:{ $in:[ /^b/, /^c/ ] }}).explain()
      {
      	"cursor" : "BtreeCursor keywords_1 multi",
      	"isMultiKey" : true,
      	"n" : 4,
      	"nscannedObjects" : 5,
      	"nscanned" : 5,
      	"nscannedObjectsAllPlans" : 5,
      	"nscannedAllPlans" : 5,
      	"scanAndOrder" : false,
      	"indexOnly" : false,
      	"nYields" : 0,
      	"nChunkSkips" : 0,
      	"millis" : 0,
      	"indexBounds" : {
      		"keywords" : [
      			[
      				"b",
      				"d"
      			],
      			[
      				/^b/,
      				/^b/
      			],
      			[
      				/^c/,
      				/^c/
      			]
      		]
      	},
      	"server" : "Jeffs-MacBook-Air.local:27017"
      }
      

            Assignee:
            backlog-query-optimization [DO NOT USE] Backlog - Query Optimization
            Reporter:
            jlee Jeff lee
            Votes:
            3 Vote for this issue
            Watchers:
            12 Start watching this issue

              Created:
              Updated: