Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-16384

2d compound index results in a wider object scan for certain field name orderings

    • Type: Icon: Bug Bug
    • Resolution: Won't Fix
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: 2.6.5
    • Component/s: Geo, Querying
    • None
    • Fully Compatible
    • ALL
    • Hide
      1. Generate data - 20K large objects and related 2d compound index
        function generateData(count, uniqueAppIds) {
        	appIds = []
        	for (i = 0; i < uniqueAppIds; ++i) {
        		appIds[i] = ObjectId();
        	}
        
        	dummyx = new Array(10000);
        	dummyx = dummyx.join("ABCDEFGHIJKLMNOPQRSTUVWXYZ");
        
        
        	for (i = 0; i < count; i++) {
        		doc = {
        			extra_s: dummyx,
        			xid: appIds[Math.round(Math.random() * 9)],
        			pos: [Math.round(Math.random() * 360) - 180, Math.round(Math.random() * 180) - 90]
        		};
        		db.places.insert(doc);
        	}
        
        	/* Ensure Indices */
        	db.places.ensureIndex({pos: "2d", xid: 1});
        }
        
        function generateData(20000, 100)
        
      2. drop vm cache
        # on MacOS this would be 
        sudo purge
        
      3. Run the query that is expected to work fine
        db.places.find({pos: { $nearSphere: [ 0, 0 ], $maxDistance: 125}, xid: ObjectId("547dbbe1290b9685990ca000"), xix: 1}).explain()
        
        {
        	"cursor" : "GeoSearchCursor",
        	"isMultiKey" : false,
        	"n" : 0,
        	"nscannedObjects" : 0,
        	"nscanned" : 20000,
        	"nscannedObjectsAllPlans" : 0,
        	"nscannedAllPlans" : 20000,
        	"scanAndOrder" : false,
        	"indexOnly" : false,
        	"nYields" : 0,
        	"nChunkSkips" : 0,
        	"millis" : 88,
        	"indexBounds" : {
        		
        	},
        	"server" : "aks-osx.local:37018",
        	"filterSet" : false
        }
        
      4. Drop VM cache
        sudo purge
        
      5. Run query that does wider object scan than intended
        db.places.find({pos: { $nearSphere: [ 0, 0 ], $maxDistance: 125}, xid: ObjectId("547dbbe1290b9685990ca000"), cix: 1}).explain()
        
        {
        	"cursor" : "GeoSearchCursor",
        	"isMultiKey" : false,
        	"n" : 0,
        	"nscannedObjects" : 20000,
        	"nscanned" : 20000,
        	"nscannedObjectsAllPlans" : 20000,
        	"nscannedAllPlans" : 20000,
        	"scanAndOrder" : false,
        	"indexOnly" : false,
        	"nYields" : 0,
        	"nChunkSkips" : 0,
        	"millis" : 13024,
        	"indexBounds" : {
        		
        	},
        	"server" : "aks-osx.local:37018",
        	"filterSet" : false
        }
        
      Show
      Generate data - 20K large objects and related 2d compound index function generateData(count, uniqueAppIds) { appIds = [] for (i = 0; i < uniqueAppIds; ++i) { appIds[i] = ObjectId(); } dummyx = new Array(10000); dummyx = dummyx.join( "ABCDEFGHIJKLMNOPQRSTUVWXYZ" ); for (i = 0; i < count; i++) { doc = { extra_s: dummyx, xid: appIds[ Math .round( Math .random() * 9)], pos: [ Math .round( Math .random() * 360) - 180, Math .round( Math .random() * 180) - 90] }; db.places.insert(doc); } /* Ensure Indices */ db.places.ensureIndex({pos: "2d" , xid: 1}); } function generateData(20000, 100) drop vm cache # on MacOS this would be sudo purge Run the query that is expected to work fine db.places.find({pos: { $nearSphere: [ 0, 0 ], $maxDistance: 125}, xid: ObjectId( "547dbbe1290b9685990ca000" ), xix: 1}).explain() { "cursor" : "GeoSearchCursor", "isMultiKey" : false, "n" : 0, "nscannedObjects" : 0, "nscanned" : 20000, "nscannedObjectsAllPlans" : 0, "nscannedAllPlans" : 20000, "scanAndOrder" : false, "indexOnly" : false, "nYields" : 0, "nChunkSkips" : 0, "millis" : 88, "indexBounds" : { }, "server" : "aks-osx.local:37018", "filterSet" : false } Drop VM cache sudo purge Run query that does wider object scan than intended db.places.find({pos: { $nearSphere: [ 0, 0 ], $maxDistance: 125}, xid: ObjectId( "547dbbe1290b9685990ca000" ), cix: 1}).explain() { "cursor" : "GeoSearchCursor", "isMultiKey" : false, "n" : 0, "nscannedObjects" : 20000, "nscanned" : 20000, "nscannedObjectsAllPlans" : 20000, "nscannedAllPlans" : 20000, "scanAndOrder" : false, "indexOnly" : false, "nYields" : 0, "nChunkSkips" : 0, "millis" : 13024, "indexBounds" : { }, "server" : "aks-osx.local:37018", "filterSet" : false }

      When using a compound 2d index, the query framework results in wider collection scan than intended for certain field name ordering in the query filters.

      As seen in the repro output if there is an additional field in the query filter with field name that is lexicographically lower than the non-geo field that is part of the compound 2d index, it will result in a wider object scan (all objects will be loaded after applying the Geo-only filter).

      There are couple of alternatives like using 2dsphere index, however for people moving from 2.4 to 2.6, this is a surprising change that works well in 2.4 (and works well again in 2.8)

            Assignee:
            Unassigned Unassigned
            Reporter:
            anil.kumar Anil Kumar (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

              Created:
              Updated:
              Resolved: