-
Type:
Improvement
-
Resolution: Unresolved
-
Priority:
Minor - P4
-
None
-
Affects Version/s: None
-
Component/s: None
-
Query Optimization
-
(copied to CRM)
-
None
-
None
-
None
-
None
-
None
-
None
-
None
MongoS is not able to do proper shard pruning in the case the shard key filter in the find command uses a rooted regex. The following illustrates how a suboptimal shard pruning can be achieved. In the case of a rooted regext MongoS should be able to prune the shards more aggressively. Similar like a $GTE would achieve in this case.
- Loading data and splitting the data manually:
sh.enableSharding("test") use test sh.shardCollection("test.b", {_id:1}) db.b.insertOne({_id:"aa1", a:1}) db.b.insertOne({_id:"bb1", a:1}) db.b.insertOne({_id:"cc1", a:1}) sh.splitAt( "test.b", { "_id": "b" } ) sh.splitAt( "test.b", { "_id": "c" } )
2. Then redistributing the data to all 3 shards, and checking the data distribution:
> db.adminCommand( { moveChunk : "test.b" , find : {"_id" : "aa1"} , to : "shard01" })
> db.adminCommand( { moveChunk : "test.b" , find : {"_id" : "bb1"} , to : "shard02" })
> sh.status()
collections: {
'test.b': {
shardKey: { _id: 1 },
unique: false,
balancing: true,
chunkMetadata: [
{ shard: 'shard01', nChunks: 1 },
{ shard: 'shard02', nChunks: 1 },
{ shard: 'shard03', nChunks: 1 }
],
chunks: [
{ min: { _id: MinKey() }, max: { _id: 'b' }, 'on shard': 'shard01', 'last modified': Timestamp({ t: 2, i: 0 }) },
{ min: { _id: 'b' }, max: { _id: 'c' }, 'on shard': 'shard02', 'last modified': Timestamp({ t: 3, i: 0 }) },
{ min: { _id: 'c' }, max: { _id: MaxKey() }, 'on shard': 'shard03', 'last modified': Timestamp({ t: 3, i: 1 }) }
],
tags: []
}
}
- And now the first query plan execution:
> db.b.find({_id: {$regex: "^aa"}}).explain()
{
queryPlanner: {
mongosPlannerVersion: 1,
winningPlan: {
stage: 'SHARD_MERGE',
shards: [
{
shardName: 'shard01',
connectionString: 'shard01/localhost:27019,localhost:27020,localhost:27021',
serverInfo: {
host: 'W-G93DVT3',
port: 27020,
version: '6.0.16',
gitVersion: '1bbe71e91a41b097b19d036dee47b861b3f27181'
},
namespace: 'test.b',
indexFilterSet: false,
parsedQuery: { _id: { '$regex': '^aa' } },
queryHash: '79357726',
planCacheKey: '7581FD14',
maxIndexedOrSolutionsReached: false,
maxIndexedAndSolutionsReached: false,
maxScansToExplodeReached: false,
winningPlan: {
stage: 'FETCH',
inputStage: {
stage: 'SHARDING_FILTER',
inputStage: {
stage: 'IXSCAN',
keyPattern: { _id: 1 },
indexName: '_id_',
isMultiKey: false,
multiKeyPaths: { _id: [] },
isUnique: true,
isSparse: false,
isPartial: false,
indexVersion: 2,
direction: 'forward',
indexBounds: { _id: [ '["aa", "ab")', '[/^aa/, /^aa/]' ] }
}
}
},
rejectedPlans: []
},
{
shardName: 'shard03',
connectionString: 'shard03/localhost:27025,localhost:27026,localhost:27027',
serverInfo: {
host: 'W-G93DVT3',
port: 27025,
version: '6.0.16',
gitVersion: '1bbe71e91a41b097b19d036dee47b861b3f27181'
},
namespace: 'test.b',
indexFilterSet: false,
parsedQuery: { _id: { '$regex': '^aa' } },
queryHash: '79357726',
planCacheKey: '7581FD14',
maxIndexedOrSolutionsReached: false,
maxIndexedAndSolutionsReached: false,
maxScansToExplodeReached: false,
winningPlan: {
stage: 'FETCH',
inputStage: {
stage: 'SHARDING_FILTER',
inputStage: {
stage: 'IXSCAN',
keyPattern: { _id: 1 },
indexName: '_id_',
isMultiKey: false,
multiKeyPaths: { _id: [] },
isUnique: true,
isSparse: false,
isPartial: false,
indexVersion: 2,
direction: 'forward',
indexBounds: { _id: [ '["aa", "ab")', '[/^aa/, /^aa/]' ] }
}
}
},
rejectedPlans: []
}
]
}
},
serverInfo: {
host: 'W-G93DVT3',
port: 27017,
version: '6.0.16',
gitVersion: '1bbe71e91a41b097b19d036dee47b861b3f27181'
},
serverParameters: {
internalQueryFacetBufferSizeBytes: 104857600,
internalQueryFacetMaxOutputDocSizeBytes: 104857600,
internalLookupStageIntermediateDocumentMaxSizeBytes: 104857600,
internalDocumentSourceGroupMaxMemoryBytes: 104857600,
internalQueryMaxBlockingSortMemoryUsageBytes: 104857600,
internalQueryProhibitBlockingMergeOnMongoS: 0,
internalQueryMaxAddToSetBytes: 104857600,
internalDocumentSourceSetWindowFieldsMaxMemoryBytes: 104857600
},
command: {
find: 'b',
filter: { _id: { '$regex': '^aa' } },
lsid: { id: UUID('11bc9e47-4860-41e4-adcb-ce98d9dd4a79') },
'$clusterTime': {
clusterTime: Timestamp({ t: 1720616947, i: 1 }),
signature: {
hash: Binary.createFromBase64('SbIf7jFRvkwTtPDorRtDz4m1RDo=', 0),
keyId: Long('7389660269795868697')
}
},
'$db': 'test'
},
ok: 1,
'$clusterTime': {
clusterTime: Timestamp({ t: 1720616950, i: 1 }),
signature: {
hash: Binary.createFromBase64('jryXI+XkgoA9ZCg1D6PSZlhC7Vs=', 0),
keyId: Long('7389660269795868697')
}
},
operationTime: Timestamp({ t: 1720616947, i: 1 })
}
- And the second query plan execution:
> db.b.find({_id: {$regex: "^aaddd"}}).explain()
{
queryPlanner: {
mongosPlannerVersion: 1,
winningPlan: {
stage: 'SHARD_MERGE',
shards: [
{
shardName: 'shard01',
connectionString: 'shard01/localhost:27019,localhost:27020,localhost:27021',
serverInfo: {
host: 'W-G93DVT3',
port: 27020,
version: '6.0.16',
gitVersion: '1bbe71e91a41b097b19d036dee47b861b3f27181'
},
namespace: 'test.b',
indexFilterSet: false,
parsedQuery: { _id: { '$regex': '^aaddd' } },
queryHash: '79357726',
planCacheKey: '7581FD14',
maxIndexedOrSolutionsReached: false,
This optimization should only be done for strongly typed fields that disallow regular expression as values. As illustrated in the case above the _id field is the current only known field that would disallow it. Yet enabling this optimization for _id based sharding still provides value as there are enough usecases.