Details
- Bug
- Resolution: Done
- Major - P3
- None
- None
- ALL
Description
While implementing a feature to handle CSV-like input of the form:
A,B,C // header
1,2,3
4,5,6
etc...
We naively implemented it with the following $match condition:
$or: [
  { A: 1, B: 2, C: 3 },
  { A: 4, B: 5, C: 6 },
  etc...
]
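For illustration, the naive filter above could be generated from the CSV text roughly as follows. This is only a minimal sketch: `parseCsvRows` and `buildOrFilter` are hypothetical helper names (not part of any MongoDB API), and the parsing assumes plain comma-separated numeric values with no quoting.

```javascript
// Hypothetical helper: turn "A,B,C\n1,2,3\n4,5,6" into [{ A: 1, B: 2, C: 3 }, ...].
// Assumes plain comma-separated numeric values (no quoting or escaping).
function parseCsvRows(text) {
  const [headerLine, ...dataLines] = text.trim().split(/\r?\n/);
  const fields = headerLine.split(",").map((name) => name.trim());
  return dataLines
    .filter((line) => line.trim() !== "")
    .map((line) => {
      const values = line.split(",").map((value) => Number(value.trim()));
      return Object.fromEntries(fields.map((name, i) => [name, values[i]]));
    });
}

// Hypothetical helper: one equality clause per CSV row, combined with $or.
function buildOrFilter(rows) {
  return { $or: rows.map((row) => ({ ...row })) };
}

const rows = parseCsvRows("A,B,C\n1,2,3\n4,5,6");
const orFilter = buildOrFilter(rows);
console.log(JSON.stringify(orFilter));
```

The resulting object would be passed as the body of a `$match` stage. Note that the filter grows by one clause per CSV row.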
After seeing the poor performance and scalability of this approach, we tried two alternatives (both inside an aggregation pipeline):
- One with $in:
$project: {
  computed_obj: { "1": "$A", "2": "$B", "3": "$C" }
},
$match: {
  computed_obj: {
    $in: [
      { "1": 1, "2": 2, "3": 3 },
      { "1": 3, "2": 4, "3": 5 },
      etc...
    ]
  }
}
- One with $setIsSubset:
$project: {
  condition_value: {
    $setIsSubset: [
      {
        $map: {
          input: [null],
          as: "var__",
          in: { "1": "$A", "2": "$B", "3": "$C" }
        }
      },
      [
        { "1": 1, "2": 2, "3": 3 },
        { "1": 3, "2": 4, "3": 5 },
        etc...
      ]
    ]
  }
},
$match: { condition_value: true }
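Both alternatives above can be generated from the same list of rows. The sketch below shows one way this might look. The helper names (`toPositional`, `buildInStages`, `buildSetIsSubsetStages`) and the sample rows are assumptions made for illustration; only the stage shapes come from the snippets above.

```javascript
// Field order is fixed so that every generated object uses the same key order.
const FIELDS = ["A", "B", "C"];

// Hypothetical helper: re-key a row as { "1": ..., "2": ..., "3": ... }.
function toPositional(row) {
  return Object.fromEntries(FIELDS.map((name, i) => [String(i + 1), row[name]]));
}

// Same shape, but with field paths ("$A", ...) so it is computed per document.
const computedObj = Object.fromEntries(
  FIELDS.map((name, i) => [String(i + 1), "$" + name])
);

// Alternative 1: project the computed object, then match it with $in.
function buildInStages(rows) {
  return [
    { $project: { computed_obj: computedObj } },
    { $match: { computed_obj: { $in: rows.map(toPositional) } } },
  ];
}

// Alternative 2: wrap the computed object in a one-element array via $map,
// then test that array against the candidate list with $setIsSubset.
function buildSetIsSubsetStages(rows) {
  return [
    {
      $project: {
        condition_value: {
          $setIsSubset: [
            { $map: { input: [null], as: "var__", in: computedObj } },
            rows.map(toPositional),
          ],
        },
      },
    },
    { $match: { condition_value: true } },
  ];
}

// Sample rows, assumed for illustration.
const sampleRows = [
  { A: 1, B: 2, C: 3 },
  { A: 4, B: 5, C: 6 },
];
const inStages = buildInStages(sampleRows);
const subsetStages = buildSetIsSubsetStages(sampleRows);
console.log(JSON.stringify(inStages));
console.log(JSON.stringify(subsetStages));
```

Either array of stages would be passed to an aggregation call on the target collection. Both carry the same candidate list, so any difference in run time comes from how the server evaluates `$in` versus `$setIsSubset`.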
We found that once the sets become large enough, the $in approach is slower than the $setIsSubset one, and does not even appear to have the same complexity.
We then noticed that $setIsSubset uses a std::unordered_set, whereas $in uses a plain std::set.
Is there a reason why $in uses a std::set rather than a std::unordered_set?
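To make the complexity difference concrete, the sketch below contrasts an ordered lookup with a hashed one in plain JavaScript. It is an analogy only, not the server's code. Binary search over a sorted array stands in for the O(log n) comparisons of a tree-based std::set, and a JavaScript `Set` (a hash table in mainstream engines) stands in for the expected O(1) lookup of a std::unordered_set. The names `keyOf`, `makeOrderedLookup` and `makeHashedLookup` are hypothetical.

```javascript
// Canonical string key for a positional row such as { "1": 1, "2": 2, "3": 3 }.
// Assumption: numeric values only, so a comma-joined key is unambiguous.
function keyOf(row) {
  return [row["1"], row["2"], row["3"]].join(",");
}

// Ordered lookup: binary search over sorted keys, O(log n) comparisons per probe.
function makeOrderedLookup(rows) {
  const sorted = rows.map(keyOf).sort();
  return (row) => {
    const target = keyOf(row);
    let lo = 0;
    let hi = sorted.length - 1;
    let comparisons = 0;
    while (lo <= hi) {
      const mid = (lo + hi) >> 1;
      comparisons += 1;
      if (sorted[mid] === target) return { found: true, comparisons };
      if (sorted[mid] < target) lo = mid + 1;
      else hi = mid - 1;
    }
    return { found: false, comparisons };
  };
}

// Hashed lookup: membership test on a Set of keys, expected O(1) per probe.
function makeHashedLookup(rows) {
  const keys = new Set(rows.map(keyOf));
  return (row) => keys.has(keyOf(row));
}

// 1024 candidate rows, generated for illustration.
const candidates = [];
for (let i = 0; i < 1024; i += 1) {
  candidates.push({ "1": i, "2": i + 1, "3": i + 2 });
}
const orderedLookup = makeOrderedLookup(candidates);
const hashedLookup = makeHashedLookup(candidates);

const probe = { "1": 3, "2": 4, "3": 5 };
const missing = { "1": -1, "2": 0, "3": 1 };
console.log(orderedLookup(probe), hashedLookup(probe));
console.log(orderedLookup(missing), hashedLookup(missing));
```

With n candidates and N input documents, the ordered structure costs on the order of N log n key comparisons in total, while the hashed one costs roughly N hash computations. That gap widens as the candidate set grows, which is consistent with the behaviour described above.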
Issue Links
- related to: SERVER-18733 Streamline set cache optimization for set operations (Backlog)