While implementing a feature to handle CSV-like input of the form:
We naively implemented it with the following $match condition:
After seeing that this approach performed and scaled poorly, we tried two alternatives (both inside an aggregation pipeline):
- One with $in:
- One with $setIsSubset:
We found that once the sets became large enough, the $in approach was actually slower, and did not even have the same complexity as the $setIsSubset one.
We then noticed that $setIsSubset uses a std::unordered_set, whereas $in uses a plain std::set.
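To make the container difference concrete, here is a minimal C++ sketch of the two lookup strategies. It is not MongoDB's code: the helper names `countHitsOrdered` and `countHitsHashed` are hypothetical, and plain `int` values stand in for whatever value type the server actually compares. The only thing it relies on is the standard library guarantee that a `std::set` lookup takes O(log n) comparisons, while a `std::unordered_set` lookup is O(1) on average.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <unordered_set>
#include <vector>

// Hypothetical helper (not MongoDB code): counts how many probes are present
// in an ordered set. std::set is a balanced tree, so each lookup costs
// O(log n) comparisons, giving O(m log n) for m probes.
std::size_t countHitsOrdered(const std::set<int>& haystack,
                             const std::vector<int>& probes) {
    std::size_t hits = 0;
    for (int p : probes) {
        if (haystack.count(p) != 0) {
            ++hits;
        }
    }
    return hits;
}

// Hypothetical helper (not MongoDB code): same membership test against a
// hash set. Each lookup is O(1) on average, giving O(m) for m probes
// (worst case degrades if many keys collide).
std::size_t countHitsHashed(const std::unordered_set<int>& haystack,
                            const std::vector<int>& probes) {
    std::size_t hits = 0;
    for (int p : probes) {
        if (haystack.count(p) != 0) {
            ++hits;
        }
    }
    return hits;
}
```

Both helpers return the same answer for the same data. Only the per-lookup cost differs, which would be consistent with the gap we measured growing as the sets got larger.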
Is there a reason why $in uses a std::set rather than a std::unordered_set?