|
Currently the decision to use LookupStrategy::kHashJoin for a $lookup in SBE is based entirely on the stats of the foreign collection. This means that if the foreign collection has stats and is small enough, HashJoin is chosen even if the local collection contains only one document. In edge cases like this, LookupStrategy::kIndexedLoopJoin or LookupStrategy::kNestedLoopJoin would be faster, because building the hash table costs more than a handful of direct lookups.
The join strategy selection should be improved to veto HashJoin and use INLJ or NLJ instead if stats are available for the local collection AND they show that it has a very small number of documents (maybe < 1,000? Some experimentation is needed to find a reasonable cutoff point). If no stats are available for the local collection, the selection should continue to assume that the hash table will pay for itself and choose HashJoin, as that will be the more common case in practice.
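The proposed rule can be sketched as follows. This is only an illustration of the decision described above, not the server's actual code: the enum is a stand-in that reuses the strategy names from this ticket, while `chooseLookupStrategy`, its parameters, and `kSmallLocalCollectionDocs` are hypothetical names, and the 1,000-document cutoff is the unverified placeholder value mentioned above. The sketch also assumes that the fallback is INLJ when the foreign collection has a usable index and NLJ otherwise.

```cpp
#include <cstdint>
#include <optional>

// Stand-in enum for illustration; only the strategies named in this ticket.
enum class LookupStrategy { kHashJoin, kIndexedLoopJoin, kNestedLoopJoin };

// Hypothetical cutoff. The right value needs experimentation.
constexpr std::int64_t kSmallLocalCollectionDocs = 1000;

// Hypothetical helper.
//   foreignQualifiesForHashJoin: result of the existing check on the foreign
//       collection's stats (assumed to be computed elsewhere).
//   foreignHasUsableIndex: assumed flag deciding between INLJ and NLJ.
//   localDocCount: document count of the local collection, or std::nullopt
//       when no stats are available for it.
LookupStrategy chooseLookupStrategy(bool foreignQualifiesForHashJoin,
                                    bool foreignHasUsableIndex,
                                    std::optional<std::int64_t> localDocCount) {
    const LookupStrategy loopJoin = foreignHasUsableIndex
        ? LookupStrategy::kIndexedLoopJoin
        : LookupStrategy::kNestedLoopJoin;

    // The foreign-side check stays as it is today.
    if (!foreignQualifiesForHashJoin) {
        return loopJoin;
    }

    // New veto: only when local stats exist AND show a very small collection.
    if (localDocCount && *localDocCount < kSmallLocalCollectionDocs) {
        return loopJoin;
    }

    // No local stats, or a large enough local collection: assume the hash
    // table pays for itself.
    return LookupStrategy::kHashJoin;
}
```

With this shape, a missing local document count never triggers the veto, which keeps today's behavior for the common case where stats are unavailable.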
|