-
Type:
Bug
-
Resolution: Unresolved
-
Priority:
Critical - P2
-
None
-
Affects Version/s: None
-
Component/s: None
-
None
-
Query Optimization
-
ALL
-
None
-
None
-
None
-
None
-
None
-
None
-
None
In the query below, the cardinality estimate for the top-level join is larger than the actual cardinality by two orders of magnitude:
pipeline = EJSON.deserialize(
[{
"$match": {
"p_comment": {
"$regex": "^ r"
}
}
},
{
"$lookup": {
"from": "lineitem",
"localField": "p_partkey",
"foreignField": "l_partkey",
"as": "lineitem", }
},
{
"$unwind": "$lineitem"
},
{
"$lookup": {
"from": "partsupp",
"localField": "p_partkey",
"foreignField": "ps_partkey",
"as": "partsupp",
}
},
{
"$unwind": "$partsupp"
}
]
);
nterprise subjoin_cardinality> db.part.aggregate(pipeline).toArray().length;
24344
Enterprise subjoin_cardinality> db.part.aggregate(pipeline).explain().queryPlanner.winningPlan.queryPlan.cardinalityEstimate;
1695352.9477173113
Enterprise subjoin_cardinality> db.part.aggregate(pipeline).explain().queryPlanner.winningPlan.queryPlan;
{
stage: 'HASH_JOIN_EMBEDDING',
planNodeId: 6,
costEstimate: 1405.4084090308088,
cardinalityEstimate: 1695352.9477173113,
estimatesMetadata: { ceSource: 'Sampling' },
leftEmbeddingField: 'none',
rightEmbeddingField: 'partsupp',
joinPredicates: [ 'p_partkey = ps_partkey', 'lineitem.l_partkey = ps_partkey' ],
inputStages: [
{
stage: 'INDEXED_NESTED_LOOP_JOIN_EMBEDDING',
planNodeId: 4,
costEstimate: 183.26585501196288,
cardinalityEstimate: 5405.148,
estimatesMetadata: { ceSource: 'Metadata' },
leftEmbeddingField: 'none',
rightEmbeddingField: 'lineitem',
joinPredicates: [ 'p_partkey = l_partkey' ],
inputStages: [
{
stage: 'COLLSCAN',
planNodeId: 1,
costEstimate: 20.949809859962873,
cardinalityEstimate: 180,
estimatesMetadata: { ceSource: 'Code' },
filter: { p_comment: { '$regex': '^ r' } },
nss: 'subjoin_cardinality.part',
direction: 'forward'
},
{
stage: 'FETCH',
planNodeId: 3,
nss: 'subjoin_cardinality.lineitem',
inputStage: {
stage: 'INDEX_PROBE_NODE',
planNodeId: 2,
nss: 'subjoin_cardinality.lineitem',
keyPattern: { l_partkey: 1 },
indexName: 'l_partkey_1',
isMultiKey: false,
isUnique: false,
isSparse: false,
isPartial: false,
indexVersion: 2
}
}
]
},
{
stage: 'COLLSCAN',
planNodeId: 5,
costEstimate: 132.31859943985148,
cardinalityEstimate: 80000,
estimatesMetadata: { ceSource: 'Code' },
filter: {},
nss: 'subjoin_cardinality.partsupp',
direction: 'forward'
}
]
}
The estimate for the two-table join between part and lineitem is reasonably correct.
Git revision a5de0b02d614d7d5bb756bb71ee082ddca8849a1