-
Type: Improvement
-
Resolution: Unresolved
-
Priority: Minor - P4
-
None
-
Affects Version/s: None
-
Component/s: None
-
Query Optimization
-
(copied to CRM)
MongoS is not able to do proper shard pruning in the case the shard key filter in the find command uses a rooted regex. The following illustrates how a suboptimal shard pruning can be achieved. In the case of a rooted regext MongoS should be able to prune the shards more aggressively. Similar like a $GTE would achieve in this case.
- Loading data and splitting the data manually:
sh.enableSharding("test") use test sh.shardCollection("test.b", {_id:1}) db.b.insertOne({_id:"aa1", a:1}) db.b.insertOne({_id:"bb1", a:1}) db.b.insertOne({_id:"cc1", a:1}) sh.splitAt( "test.b", { "_id": "b" } ) sh.splitAt( "test.b", { "_id": "c" } )
2. Then redistributing the data to all 3 shards, and checking the data distribution:
> db.adminCommand( { moveChunk : "test.b" , find : {"_id" : "aa1"} , to : "shard01" }) > db.adminCommand( { moveChunk : "test.b" , find : {"_id" : "bb1"} , to : "shard02" }) > sh.status() collections: { 'test.b': { shardKey: { _id: 1 }, unique: false, balancing: true, chunkMetadata: [ { shard: 'shard01', nChunks: 1 }, { shard: 'shard02', nChunks: 1 }, { shard: 'shard03', nChunks: 1 } ], chunks: [ { min: { _id: MinKey() }, max: { _id: 'b' }, 'on shard': 'shard01', 'last modified': Timestamp({ t: 2, i: 0 }) }, { min: { _id: 'b' }, max: { _id: 'c' }, 'on shard': 'shard02', 'last modified': Timestamp({ t: 3, i: 0 }) }, { min: { _id: 'c' }, max: { _id: MaxKey() }, 'on shard': 'shard03', 'last modified': Timestamp({ t: 3, i: 1 }) } ], tags: [] } }
- And now the first query plan execution:
> db.b.find({_id: {$regex: "^aa"}}).explain() { queryPlanner: { mongosPlannerVersion: 1, winningPlan: { stage: 'SHARD_MERGE', shards: [ { shardName: 'shard01', connectionString: 'shard01/localhost:27019,localhost:27020,localhost:27021', serverInfo: { host: 'W-G93DVT3', port: 27020, version: '6.0.16', gitVersion: '1bbe71e91a41b097b19d036dee47b861b3f27181' }, namespace: 'test.b', indexFilterSet: false, parsedQuery: { _id: { '$regex': '^aa' } }, queryHash: '79357726', planCacheKey: '7581FD14', maxIndexedOrSolutionsReached: false, maxIndexedAndSolutionsReached: false, maxScansToExplodeReached: false, winningPlan: { stage: 'FETCH', inputStage: { stage: 'SHARDING_FILTER', inputStage: { stage: 'IXSCAN', keyPattern: { _id: 1 }, indexName: '_id_', isMultiKey: false, multiKeyPaths: { _id: [] }, isUnique: true, isSparse: false, isPartial: false, indexVersion: 2, direction: 'forward', indexBounds: { _id: [ '["aa", "ab")', '[/^aa/, /^aa/]' ] } } } }, rejectedPlans: [] }, { shardName: 'shard03', connectionString: 'shard03/localhost:27025,localhost:27026,localhost:27027', serverInfo: { host: 'W-G93DVT3', port: 27025, version: '6.0.16', gitVersion: '1bbe71e91a41b097b19d036dee47b861b3f27181' }, namespace: 'test.b', indexFilterSet: false, parsedQuery: { _id: { '$regex': '^aa' } }, queryHash: '79357726', planCacheKey: '7581FD14', maxIndexedOrSolutionsReached: false, maxIndexedAndSolutionsReached: false, maxScansToExplodeReached: false, winningPlan: { stage: 'FETCH', inputStage: { stage: 'SHARDING_FILTER', inputStage: { stage: 'IXSCAN', keyPattern: { _id: 1 }, indexName: '_id_', isMultiKey: false, multiKeyPaths: { _id: [] }, isUnique: true, isSparse: false, isPartial: false, indexVersion: 2, direction: 'forward', indexBounds: { _id: [ '["aa", "ab")', '[/^aa/, /^aa/]' ] } } } }, rejectedPlans: [] } ] } }, serverInfo: { host: 'W-G93DVT3', port: 27017, version: '6.0.16', gitVersion: '1bbe71e91a41b097b19d036dee47b861b3f27181' }, serverParameters: { internalQueryFacetBufferSizeBytes: 104857600, internalQueryFacetMaxOutputDocSizeBytes: 104857600, internalLookupStageIntermediateDocumentMaxSizeBytes: 104857600, internalDocumentSourceGroupMaxMemoryBytes: 104857600, internalQueryMaxBlockingSortMemoryUsageBytes: 104857600, internalQueryProhibitBlockingMergeOnMongoS: 0, internalQueryMaxAddToSetBytes: 104857600, internalDocumentSourceSetWindowFieldsMaxMemoryBytes: 104857600 }, command: { find: 'b', filter: { _id: { '$regex': '^aa' } }, lsid: { id: UUID('11bc9e47-4860-41e4-adcb-ce98d9dd4a79') }, '$clusterTime': { clusterTime: Timestamp({ t: 1720616947, i: 1 }), signature: { hash: Binary.createFromBase64('SbIf7jFRvkwTtPDorRtDz4m1RDo=', 0), keyId: Long('7389660269795868697') } }, '$db': 'test' }, ok: 1, '$clusterTime': { clusterTime: Timestamp({ t: 1720616950, i: 1 }), signature: { hash: Binary.createFromBase64('jryXI+XkgoA9ZCg1D6PSZlhC7Vs=', 0), keyId: Long('7389660269795868697') } }, operationTime: Timestamp({ t: 1720616947, i: 1 }) }
- And the second query plan execution:
> db.b.find({_id: {$regex: "^aaddd"}}).explain() { queryPlanner: { mongosPlannerVersion: 1, winningPlan: { stage: 'SHARD_MERGE', shards: [ { shardName: 'shard01', connectionString: 'shard01/localhost:27019,localhost:27020,localhost:27021', serverInfo: { host: 'W-G93DVT3', port: 27020, version: '6.0.16', gitVersion: '1bbe71e91a41b097b19d036dee47b861b3f27181' }, namespace: 'test.b', indexFilterSet: false, parsedQuery: { _id: { '$regex': '^aaddd' } }, queryHash: '79357726', planCacheKey: '7581FD14', maxIndexedOrSolutionsReached: false,
This optimization should only be done for strongly typed fields that disallow regular expression as values. As illustrated in the case above the _id field is the current only known field that would disallow it. Yet enabling this optimization for _id based sharding still provides value as there are enough usecases.