Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-59341

Query planner choses collscan over covered index scan for huge collection

    • Type: Icon: Bug Bug
    • Resolution: Unresolved
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: 5.0.2
    • Component/s: None
    • None
    • Query Optimization
    • ALL
    • Hide
      db.x.insert({meta:1, control:{min:{time:1},max:{time:1}}})
      db.x.insert({meta:2, control:{min:{time:1},max:{time:1}}})
      db.x.createIndex({meta:1,"control.min.time":1,"control.max.time":1})
      db.x.find({},{"meta":1, "control.max.time":1,_id:0},).hint({meta:1,"control.min.time":1,"control.max.time":1}).explain()
      

      Without the hint, the plan is a collection scan:

      rs:PRIMARY> db.x.find({},{"meta":1, "control.max.time":1,_id:0},).explain().queryPlanner
      {
      	"namespace" : "test.x",
      	"indexFilterSet" : false,
      	"parsedQuery" : {
      		
      	},
      	"queryHash" : "CCF36A8A",
      	"planCacheKey" : "C36EB5A6",
      	"maxIndexedOrSolutionsReached" : false,
      	"maxIndexedAndSolutionsReached" : false,
      	"maxScansToExplodeReached" : false,
      	"winningPlan" : {
      		"queryPlan" : {
      			"stage" : "PROJECTION_DEFAULT",
      			"planNodeId" : 2,
      			"transformBy" : {
      				"meta" : true,
      				"control" : {
      					"max" : {
      						"time" : true
      					}
      				},
      				"_id" : false
      			},
      			"inputStage" : {
      				"stage" : "COLLSCAN",
      				"planNodeId" : 1,
      				"filter" : {
      					
      				},
      				"direction" : "forward"
      			}
      		},
      		"slotBasedPlan" : {
      			"slots" : "$$RESULT=s13 $$RID=s5 env: { s2 = Timestamp(1628870938, 1) (CLUSTER_TIME), s1 = TimeZoneDatabase(Australia/NSW...US/Arizona) (timeZoneDB), s3 = 1628870945358 (NOW) }",
      			"stages" : "[2] traverse s13 s12 s4 [s5] {} {} \nfrom \n    [1] scan s4 s5 none none none none [] @\"f4710229-f7b9-46e0-ac36-260c26f8ab26\" true false \nin \n    [2] cfilter {isObject (s4)} \n    [2] mkbson s12 s4 [meta] keep [control = s11] true false \n    [2] traverse s11 s10 s6 {} {} \n    from \n        [2] project [s6 = getField (s4, \"control\")] \n        [2] limit 1 \n        [2] coscan \n    in \n        [2] cfilter {isObject (s6)} \n        [2] mkbson s10 s6 [] keep [max = s9] true false \n        [2] traverse s9 s8 s7 {} {} \n        from \n            [2] project [s7 = getField (s6, \"max\")] \n            [2] limit 1 \n            [2] coscan \n        in \n            [2] cfilter {isObject (s7)} \n            [2] mkbson s8 s7 [time] keep [] true false \n            [2] limit 1 \n            [2] coscan \n        \n    \n"
      		}
      	},
      	"rejectedPlans" : [ ]
      }
      
      Show
      db.x.insert({meta:1, control:{min:{time:1},max:{time:1}}}) db.x.insert({meta:2, control:{min:{time:1},max:{time:1}}}) db.x.createIndex({meta:1,"control.min.time":1,"control.max.time":1}) db.x.find({},{"meta":1, "control.max.time":1,_id:0},).hint({meta:1,"control.min.time":1,"control.max.time":1}).explain() Without the hint, the plan is a collection scan: rs:PRIMARY> db.x.find({},{"meta":1, "control.max.time":1,_id:0},).explain().queryPlanner { "namespace" : "test.x", "indexFilterSet" : false, "parsedQuery" : { }, "queryHash" : "CCF36A8A", "planCacheKey" : "C36EB5A6", "maxIndexedOrSolutionsReached" : false, "maxIndexedAndSolutionsReached" : false, "maxScansToExplodeReached" : false, "winningPlan" : { "queryPlan" : { "stage" : "PROJECTION_DEFAULT", "planNodeId" : 2, "transformBy" : { "meta" : true, "control" : { "max" : { "time" : true } }, "_id" : false }, "inputStage" : { "stage" : "COLLSCAN", "planNodeId" : 1, "filter" : { }, "direction" : "forward" } }, "slotBasedPlan" : { "slots" : "$$RESULT=s13 $$RID=s5 env: { s2 = Timestamp(1628870938, 1) (CLUSTER_TIME), s1 = TimeZoneDatabase(Australia/NSW...US/Arizona) (timeZoneDB), s3 = 1628870945358 (NOW) }", "stages" : "[2] traverse s13 s12 s4 [s5] {} {} \nfrom \n [1] scan s4 s5 none none none none [] @\"f4710229-f7b9-46e0-ac36-260c26f8ab26\" true false \nin \n [2] cfilter {isObject (s4)} \n [2] mkbson s12 s4 [meta] keep [control = s11] true false \n [2] traverse s11 s10 s6 {} {} \n from \n [2] project [s6 = getField (s4, \"control\")] \n [2] limit 1 \n [2] coscan \n in \n [2] cfilter {isObject (s6)} \n [2] mkbson s10 s6 [] keep [max = s9] true false \n [2] traverse s9 s8 s7 {} {} \n from \n [2] project [s7 = getField (s6, \"max\")] \n [2] limit 1 \n [2] coscan \n in \n [2] cfilter {isObject (s7)} \n [2] mkbson s8 s7 [time] keep [] true false \n [2] limit 1 \n [2] coscan \n \n \n" } }, "rejectedPlans" : [ ] }

      On a collection of 4.3 GB, with an index of 14.8 MB, the query planner chooses a collection scan over a covered index scan.

      > db.system.buckets.point_data.find({},{"meta":1, "control.max.time":1,_id:0},).hint({meta:1,"control.min.time":1,"control.max.time":1}).itcount()}}
      

      This query takes 3.4s, but without the hint it take 40s.

      The collection here is the system.buckets.point_data collection for a time-series collection, but the issue is not specific to time-series.

      I understand that often a collection scan can be faster, but in this case the 290x size difference between the index and the collection size should have been decisive. Anyway, it doesn't look like the index scan was considered at all here.

            Assignee:
            backlog-query-optimization [DO NOT USE] Backlog - Query Optimization
            Reporter:
            geert.bosch@mongodb.com Geert Bosch
            Votes:
            0 Vote for this issue
            Watchers:
            10 Start watching this issue

              Created:
              Updated: