Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-59341

Query planner choses collscan over covered index scan for huge collection

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major - P3
    • Resolution: Unresolved
    • Affects Version/s: 5.0.2
    • Fix Version/s: Backlog
    • Component/s: None
    • Labels:
      None
    • Operating System:
      ALL
    • Steps To Reproduce:
      Hide

      db.x.insert({meta:1, control:{min:{time:1},max:{time:1}}})
      db.x.insert({meta:2, control:{min:{time:1},max:{time:1}}})
      db.x.createIndex({meta:1,"control.min.time":1,"control.max.time":1})
      db.x.find({},{"meta":1, "control.max.time":1,_id:0},).hint({meta:1,"control.min.time":1,"control.max.time":1}).explain()
      

      Without the hint, the plan is a collection scan:

      rs:PRIMARY> db.x.find({},{"meta":1, "control.max.time":1,_id:0},).explain().queryPlanner
      {
      	"namespace" : "test.x",
      	"indexFilterSet" : false,
      	"parsedQuery" : {
      		
      	},
      	"queryHash" : "CCF36A8A",
      	"planCacheKey" : "C36EB5A6",
      	"maxIndexedOrSolutionsReached" : false,
      	"maxIndexedAndSolutionsReached" : false,
      	"maxScansToExplodeReached" : false,
      	"winningPlan" : {
      		"queryPlan" : {
      			"stage" : "PROJECTION_DEFAULT",
      			"planNodeId" : 2,
      			"transformBy" : {
      				"meta" : true,
      				"control" : {
      					"max" : {
      						"time" : true
      					}
      				},
      				"_id" : false
      			},
      			"inputStage" : {
      				"stage" : "COLLSCAN",
      				"planNodeId" : 1,
      				"filter" : {
      					
      				},
      				"direction" : "forward"
      			}
      		},
      		"slotBasedPlan" : {
      			"slots" : "$$RESULT=s13 $$RID=s5 env: { s2 = Timestamp(1628870938, 1) (CLUSTER_TIME), s1 = TimeZoneDatabase(Australia/NSW...US/Arizona) (timeZoneDB), s3 = 1628870945358 (NOW) }",
      			"stages" : "[2] traverse s13 s12 s4 [s5] {} {} \nfrom \n    [1] scan s4 s5 none none none none [] @\"f4710229-f7b9-46e0-ac36-260c26f8ab26\" true false \nin \n    [2] cfilter {isObject (s4)} \n    [2] mkbson s12 s4 [meta] keep [control = s11] true false \n    [2] traverse s11 s10 s6 {} {} \n    from \n        [2] project [s6 = getField (s4, \"control\")] \n        [2] limit 1 \n        [2] coscan \n    in \n        [2] cfilter {isObject (s6)} \n        [2] mkbson s10 s6 [] keep [max = s9] true false \n        [2] traverse s9 s8 s7 {} {} \n        from \n            [2] project [s7 = getField (s6, \"max\")] \n            [2] limit 1 \n            [2] coscan \n        in \n            [2] cfilter {isObject (s7)} \n            [2] mkbson s8 s7 [time] keep [] true false \n            [2] limit 1 \n            [2] coscan \n        \n    \n"
      		}
      	},
      	"rejectedPlans" : [ ]
      }
      

      Show
      db.x.insert({meta:1, control:{min:{time:1},max:{time:1}}}) db.x.insert({meta:2, control:{min:{time:1},max:{time:1}}}) db.x.createIndex({meta:1,"control.min.time":1,"control.max.time":1}) db.x.find({},{"meta":1, "control.max.time":1,_id:0},).hint({meta:1,"control.min.time":1,"control.max.time":1}).explain() Without the hint, the plan is a collection scan: rs:PRIMARY> db.x.find({},{"meta":1, "control.max.time":1,_id:0},).explain().queryPlanner { "namespace" : "test.x", "indexFilterSet" : false, "parsedQuery" : { }, "queryHash" : "CCF36A8A", "planCacheKey" : "C36EB5A6", "maxIndexedOrSolutionsReached" : false, "maxIndexedAndSolutionsReached" : false, "maxScansToExplodeReached" : false, "winningPlan" : { "queryPlan" : { "stage" : "PROJECTION_DEFAULT", "planNodeId" : 2, "transformBy" : { "meta" : true, "control" : { "max" : { "time" : true } }, "_id" : false }, "inputStage" : { "stage" : "COLLSCAN", "planNodeId" : 1, "filter" : { }, "direction" : "forward" } }, "slotBasedPlan" : { "slots" : "$$RESULT=s13 $$RID=s5 env: { s2 = Timestamp(1628870938, 1) (CLUSTER_TIME), s1 = TimeZoneDatabase(Australia/NSW...US/Arizona) (timeZoneDB), s3 = 1628870945358 (NOW) }", "stages" : "[2] traverse s13 s12 s4 [s5] {} {} \nfrom \n [1] scan s4 s5 none none none none [] @\"f4710229-f7b9-46e0-ac36-260c26f8ab26\" true false \nin \n [2] cfilter {isObject (s4)} \n [2] mkbson s12 s4 [meta] keep [control = s11] true false \n [2] traverse s11 s10 s6 {} {} \n from \n [2] project [s6 = getField (s4, \"control\")] \n [2] limit 1 \n [2] coscan \n in \n [2] cfilter {isObject (s6)} \n [2] mkbson s10 s6 [] keep [max = s9] true false \n [2] traverse s9 s8 s7 {} {} \n from \n [2] project [s7 = getField (s6, \"max\")] \n [2] limit 1 \n [2] coscan \n in \n [2] cfilter {isObject (s7)} \n [2] mkbson s8 s7 [time] keep [] true false \n [2] limit 1 \n [2] coscan \n \n \n" } }, "rejectedPlans" : [ ] }

      Description

      On a collection of 4.3 GB, with an index of 14.8 MB, the query planner chooses a collection scan over a covered index scan.

      > db.system.buckets.point_data.find({},{"meta":1, "control.max.time":1,_id:0},).hint({meta:1,"control.min.time":1,"control.max.time":1}).itcount()}}
      

      This query takes 3.4s, but without the hint it take 40s.

      The collection here is the system.buckets.point_data collection for a time-series collection, but the issue is not specific to time-series.

      I understand that often a collection scan can be faster, but in this case the 290x size difference between the index and the collection size should have been decisive. Anyway, it doesn't look like the index scan was considered at all here.

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              backlog-query-optimization Backlog - Query Optimization
              Reporter:
              geert.bosch Geert Bosch
              Participants:
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

                Dates

                Created:
                Updated: