Add distance expressions for image feature comparison

    • Type: New Feature
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Query Optimization
    • 3

      This ticket tracks the work contained in Pull Request #1291.

      We added these expressions:

      '$cossim', '$chi2', '$euclidean', '$squared_euclidean', '$manhattan'

      They allow comparing long vectors (image features) stored as arrays or BSON,
      which is useful for finding the most similar images in a dataset. Usage is as follows:

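      // `vector` is the query feature vector, assumed to be defined in the shell beforehand.
      db.test_speed.aggregate([
          {
              "$project": {
                  "id": "$id",
                  "other_id": "$other_id",
                  // Cosine similarity between the query vector and the stored `vector` field.
                  "distance": {"$cossim": [vector, "$vector"]}
              }
          },
          // Higher cosine similarity means more similar, so sort in descending order.
          {"$sort": {"distance": -1}},
          // Keep the 20 most similar documents.
          {"$limit": 20}
      ])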
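
      For reference, the five measures can be sketched in plain JavaScript as below.
      This is a minimal sketch using the textbook definitions, not the code from the
      pull request. The function names, the handling of zero vectors, and the chi-squared
      convention (no 1/2 factor, empty bins skipped) are assumptions and may differ
      from what the expressions actually compute.

```javascript
// Reference definitions only (assumed, not taken from the pull request).
// Every function expects two numeric arrays of equal length.

function checkLengths(a, b) {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same length");
  }
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Higher means more similar.
function cossim(a, b) {
  checkLengths(a, b);
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  // Returning 0 for a zero vector is an assumption made for this sketch.
  return denom === 0 ? 0 : dot / denom;
}

// Chi-squared distance, assuming sum((a_i - b_i)^2 / (a_i + b_i)).
// Bins where a_i + b_i is 0 are skipped (assumption).
function chi2(a, b) {
  checkLengths(a, b);
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const s = a[i] + b[i];
    if (s !== 0) {
      const d = a[i] - b[i];
      sum += (d * d) / s;
    }
  }
  return sum;
}

// Squared Euclidean distance: sum((a_i - b_i)^2).
function squaredEuclidean(a, b) {
  checkLengths(a, b);
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Euclidean distance: square root of the squared Euclidean distance.
function euclidean(a, b) {
  return Math.sqrt(squaredEuclidean(a, b));
}

// Manhattan distance: sum(|a_i - b_i|).
function manhattan(a, b) {
  checkLengths(a, b);
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += Math.abs(a[i] - b[i]);
  }
  return sum;
}
```

      Under these definitions, '$cossim' is a similarity (sort descending, as in the
      pipeline above), while the other four are distances, so a "most similar" query
      would presumably sort them in ascending order.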
      

      In addition, implementations using AVX2 and AVX-512 are included in this pull request.

            Assignee:
            [DO NOT USE] Backlog - Query Optimization
            Reporter:
            Kelsey Schubert
            Votes:
            1
            Watchers:
            19

              Created:
              Updated: