[SERVER-18734] The match $in operator is using a ValueSet(std::set) Created: 29/May/15 Updated: 04/Nov/15 Resolved: 04/Nov/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Aggregation Framework |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Antoine Hom | Assignee: | Charlie Swanson |
| Resolution: | Done | Votes: | 1 |
| Labels: | optimization | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Operating System: | ALL | ||||||||
| Participants: | |||||||||
| Description |
|
While implementing a feature to handle CSV like input of the form:
We naively implemented it with the following $match condition:
After seeing bad performances/scalability of this approach we tried two alternatives (these are in an aggregation pipeline):
We found that when starting to have big enough sets the $in approach was in fact slower and not even the same complexity than the $setIsSubset one. Is there a reason why $in is using a std::set over an std::unordered_set? |
| Comments |
| Comment by Charlie Swanson [ 04/Nov/15 ] |
|
It appears this ticket got lost for a while, sorry about that! I think the difference between std::set and std::unordered_set is a bit of a red herring. After inspecting, it looks like this comes down to the internal representation of the documents. During the aggregation pipeline, we convert the BSON into an internal representation (A Document), more akin to a hash map. This representation is much faster to work with than BSON. During the $match stage, we use the same logic as a $match during a query, which processes BSON, so we have to convert the document to BSON in order to do the $match, hence, the slowdown. I'm closing this ticket, as it would be a non-trivial amount of work to support matching on Documents, and there is a workaround (as you mentioned). Feel free to reopen if you disagree with this decision. |
| Comment by Antoine Hom [ 05/Jun/15 ] |
|
Hello Ramon, Sorry If I was not clear, this ticket explains the process that I went through to find two potential issues:
|
| Comment by Ramon Fernandez Marina [ 01/Jun/15 ] |
|
antoine.hom@amadeus.com, if I understand correctly SERVER-18733 is a superset of this ticket (no pun intended). If this is not the case, can you please elaborate what's different about this ticket and how a resolution of SERVER-18733 (which is tentatively scheduled for the current development cycle) may fail to address this ticket? Thanks, |