  1. Core Server
  2. SERVER-18734

The match $in operator is using a ValueSet(std::set)

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Works as Designed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Aggregation Framework
    • Labels:
    • Operating System:
      ALL

      Description

      While implementing a feature to handle CSV-like input of the form:

      A,B,C // header
      1,2,3
      4,5,6
      etc...
      

      We naively implemented it with the following $match condition:

      $or: [
          { A: 1, B: 2, C: 3},
          { A: 4, B: 5, C: 6},
          etc...
      ]
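
As a minimal sketch of how such a filter might be generated from the input, assuming a header row, comma separators, no quoting, and numeric cells (`buildOrMatch` is a hypothetical helper name, not part of any MongoDB driver):

```javascript
// Hypothetical helper: turn CSV-like text into a $match stage with one
// $or clause per data row. Assumes no quoted fields and numeric values.
function buildOrMatch(csvText) {
  const lines = csvText
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);

  // The first non-empty line is treated as the header (field names).
  const header = lines[0].split(",").map((name) => name.trim());

  // Every following line becomes one { A: ..., B: ..., C: ... } clause.
  const clauses = lines.slice(1).map((line) => {
    const cells = line.split(",").map((cell) => Number(cell.trim()));
    const clause = {};
    header.forEach((name, i) => {
      clause[name] = cells[i];
    });
    return clause;
  });

  return { $match: { $or: clauses } };
}

const orStage = buildOrMatch("A,B,C\n1,2,3\n4,5,6");
console.log(JSON.stringify(orStage));
```

The number of $or clauses grows linearly with the number of input rows, which is what motivated the alternatives described next.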
      

      After seeing the poor performance and scalability of this approach, we tried two alternatives (both in an aggregation pipeline):

      • One with $in:

      $project: {
          computed_obj: { "1": "$A", "2": "$B", "3": "$C" }
      },
      $match: {
          computed_obj: { 
              $in: [
                  { "1": 1, "2": 2, "3": 3 },
                  { "1": 4, "2": 5, "3": 6 },
                  etc...
              ]
          }
      }
      

      • One with $setIsSubset:

      $project: {
          condition_value: {
              $setIsSubset: [
                  {
                      $map: {
                          input: [null], 
                          as: "var__", 
                          in: { "1": "$A", "2": "$B", "3": "$C" }
                      }
                  }, 
                  [
                     {"1": 1, "2": 2, "3": 3},
                     {"1": 4, "2": 5, "3": 6},
                     etc...
                  ]
              ]
          }
      }, 
      $match: { condition_value: true }
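
For illustration, the $in variant above could be assembled programmatically along these lines. This is only a sketch: `buildInPipeline` is a hypothetical helper, and the positional keys "1", "2", "3" simply mirror the samples in this report.

```javascript
// Hypothetical helper: build the two-stage $project + $match/$in pipeline
// from a list of field names and a list of value rows.
function buildInPipeline(header, rows) {
  // Map each column to a positional key: { "1": "$A", "2": "$B", ... }.
  const computed = {};
  header.forEach((name, i) => {
    computed[String(i + 1)] = "$" + name;
  });

  // Each row becomes a candidate document using the same positional keys,
  // so both sides of the comparison have the same field order.
  const candidates = rows.map((row) => {
    const doc = {};
    row.forEach((value, i) => {
      doc[String(i + 1)] = value;
    });
    return doc;
  });

  return [
    { $project: { computed_obj: computed } },
    { $match: { computed_obj: { $in: candidates } } },
  ];
}

const inPipeline = buildInPipeline(["A", "B", "C"], [[1, 2, 3], [4, 5, 6]]);
console.log(JSON.stringify(inPipeline, null, 2));
```

With this shape, the per-document work is a single membership test against the candidate list, so its cost depends on how the server stores and searches that list.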
      

      We found that once the sets became large enough, the $in approach was slower than the $setIsSubset one, and its running time did not even appear to grow with the same complexity.
      We then noticed that $setIsSubset is using a std::unordered_set whereas $in is using a simple std::set.

      Is there a reason why $in uses a std::set rather than a std::unordered_set?
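
To make the complexity difference concrete, here is a rough analogy in JavaScript (not the server's C++ code, and not a benchmark). An ordered container is modelled as a binary search over a sorted array, which needs on the order of log2(n) comparisons per lookup, while a hash-based container is modelled with the built-in `Set`, whose lookups take constant time on average. The function name and the sample data are assumptions made for this sketch.

```javascript
// Membership test on a sorted array, counting the comparison steps.
// This stands in for a lookup in an ordered (tree-based) set.
function sortedContains(sortedKeys, key) {
  let lo = 0;
  let hi = sortedKeys.length - 1;
  let comparisons = 0;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    comparisons++;
    if (sortedKeys[mid] === key) {
      return { found: true, comparisons };
    }
    if (sortedKeys[mid] < key) {
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return { found: false, comparisons };
}

// Sample data: 65536 even numbers, already sorted.
const n = 1 << 16;
const sortedKeys = Array.from({ length: n }, (_, i) => i * 2);
const hashedKeys = new Set(sortedKeys);

// Look up one present key and one absent key in both structures.
const present = sortedContains(sortedKeys, 2 * (n - 1));
const absent = sortedContains(sortedKeys, 12345);
console.log(present.found, present.comparisons);
console.log(absent.found, absent.comparisons);
console.log(hashedKeys.has(2 * (n - 1)), hashedKeys.has(12345));
```

In the case described in this report, the keys are whole documents rather than integers, so each comparison step in an ordered lookup is itself a field-by-field document comparison.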

              People

              Assignee:
              charlie.swanson Charlie Swanson
              Reporter:
              antoine.hom@amadeus.com Antoine Hom
