[SERVER-71036] Consider resolving variables inside $search before forwarding to mongot Created: 02/Nov/22  Updated: 26/Dec/23

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Jacob Evans Assignee: Backlog - Query Integration
Resolution: Unresolved Votes: 4
Labels: qi-search
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-60800 Allow $search in $lookup/$unionWith Closed
Assigned Teams:
Query Integration
Participants:

 Description   

Currently, if a variable occurs in a $search stage, we do nothing to interpret it and send it to Lucene as a literal string. We should discuss transforming

{
  $search: {
    index: 'default',
    text: {
      query: '$$name',
      path: 'name'
    }
  }
} 

to

{
  $search: {
    index: 'default',
    text: {
      query: 'Jacob',
      path: 'name'
    }
  }
}
 

for each document. We would also need to track the variable usage within the stage to ensure correctness.

Note that there are no plans to evaluate field references, so single-dollar-sign-prefixed names would not be resolved. '$name' would not result in a per-document value but would instead continue to produce the literal string '$name'.
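
The proposed rewrite can be sketched in plain JavaScript. This is only an illustration of the idea, not server code: the helper name resolveSearchVariables, the shape of the vars map, the dotted-path handling, and the error message are all assumptions made for this example.

```javascript
// Hypothetical helper (not part of the server or mongot): walks a $search
// spec and replaces "$$var" strings with values from a variables map.
function resolveSearchVariables(spec, vars) {
  if (typeof spec === "string") {
    // Only double-dollar strings are treated as variable references;
    // "$name" is a field reference and stays a literal string.
    if (!spec.startsWith("$$")) return spec;
    const [name, ...path] = spec.slice(2).split(".");
    if (!Object.prototype.hasOwnProperty.call(vars, name)) {
      throw new Error(`Use of undefined variable: ${name}`);
    }
    // Walk any dotted suffix, e.g. "$$location.coordinates".
    return path.reduce(
      (value, key) => (value == null ? undefined : value[key]),
      vars[name]
    );
  }
  if (Array.isArray(spec)) {
    return spec.map((item) => resolveSearchVariables(item, vars));
  }
  if (spec !== null && typeof spec === "object") {
    return Object.fromEntries(
      Object.entries(spec).map(([key, value]) => [
        key,
        resolveSearchVariables(value, vars),
      ])
    );
  }
  return spec; // numbers, booleans, null pass through unchanged
}

const stage = {
  $search: { index: "default", text: { query: "$$name", path: "name" } },
};
const resolved = resolveSearchVariables(stage, { name: "Jacob" });
console.log(JSON.stringify(resolved));
// {"$search":{"index":"default","text":{"query":"Jacob","path":"name"}}}
```

In this sketch the substitution would run once per input document, with vars rebuilt from that document each time. How the server would track variable usage within the stage is not addressed here.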



 Comments   
Comment by John Underwood [ 14/Jul/23 ]

This would also be useful for finding recommendations for a given document based on vector similarity, as in this example that I'm trying to run:

{
    $lookup:
      {
        from: "movies",
        let: {
          vector: "$avg_embedding",
        },
        pipeline: [
          {
            $search: {
              knnBeta: {
                vector: "$$vector",
                path: "plot_embedding",
              },
            },
          },
        ],
        as: "recs",
      },
  } 

Comment by Dara Gies [ 19/Jan/23 ]

This limitation means it's not possible to do a correlated subquery, which is often the purpose of $lookup. In this example, I'm trying to find all of the resorts that have the 'rock climbing' amenity within 10,000 meters of a given point, for example Burlington, VT and Warren, VT. This is a pattern for real estate queries where listing counts are displayed for pre-defined geo points.

[{
 $search: {
  index: 'geo_resorts',
  text: {
   query: 'locus',
   path: 'type'
  }
 }
}, {
 $lookup: {
  from: 'resorts',
  'let': {
   location: '$location'
  },
  pipeline: [
   {
    $search: {
     index: 'geo_resorts',
     compound: {
      must: {
       text: {
        path: 'amenities',
        query: 'Rock Climbing'
       }
      },
      filter: {
       geoWithin: {
        circle: {
         center: {
          type: 'Point',
          coordinates: '$$location.coordinates'
          //"coordinates": [44.11733091181499, -72.85666784230044]
         },
         radius: 10000
        },
        path: 'location'
       }
      }
     }
    }
   }
  ],
  as: 'resorts'
 }
}] 

Comment by Evan Nixon [ 28/Nov/22 ]

Hey ralph.johnson@mongodb.com, thanks for elaborating - it's a good point that variable expansion on a per-document level is probably what users expect here. It will be helpful to have this note in the future when considering scheduling.

Comment by Ralph Johnson [ 28/Nov/22 ]

Hi Evan, I am responsible for the https://jira.mongodb.org/browse/HELP-38974 ticket. Doing it per document is exactly the behaviour I would expect in this instance. If we don't do this, I really don't understand how we can claim that you can run $search queries in a $lookup, as I can't think of any reason to run that type of query.

Comment by Evan Nixon [ 18/Nov/22 ]

It's a good question - I imagine we'd resolve these variables upstream of slow query logging, so from the perspective of Query Analytics this would be similar to running a bunch of normal queries.

FWIW, I think doing this per-document would mean running a separate $search query for each document passing through the pipeline. That would be pretty slow, and a somewhat dangerous way to turn one aggregation pipeline query into thousands of $search queries.
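
To make the fan-out concrete, here is a rough sketch in plain JavaScript of what per-document resolution implies. Everything in it is an assumption for illustration: planSearchQueries is a hypothetical name, the "let" handling only understands plain '$field' references, and the distinct count is just a way to see how many different queries result, not a claim about any planned optimization.

```javascript
// Hypothetical sketch: build one resolved $search spec per outer document.
function planSearchQueries(outerDocs, letSpec, searchSpec) {
  const queries = outerDocs.map((doc) => {
    // Evaluate the $lookup "let" block against the outer document.
    // Only plain "$field" references are handled in this sketch.
    const vars = Object.fromEntries(
      Object.entries(letSpec).map(([name, ref]) => [name, doc[ref.slice(1)]])
    );
    // Deep-copy the spec, substituting "$$var" strings along the way;
    // single-dollar strings are left alone.
    return JSON.parse(JSON.stringify(searchSpec), (key, value) =>
      typeof value === "string" && value.startsWith("$$")
        ? vars[value.slice(2)]
        : value
    );
  });
  // Count how many of the resolved specs are actually different.
  const distinct = new Set(queries.map((q) => JSON.stringify(q)));
  return { total: queries.length, distinct: distinct.size, queries };
}

const outerDocs = [{ name: "Jacob" }, { name: "Dara" }, { name: "Jacob" }];
const plan = planSearchQueries(
  outerDocs,
  { name: "$name" },
  { index: "default", text: { query: "$$name", path: "name" } }
);
console.log(plan.total, plan.distinct); // 3 2
```

With three outer documents this yields three resolved queries, two of them distinct. A pipeline that feeds thousands of documents into the $lookup would produce thousands of resolved queries in the same way.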

Comment by Oren Ovadia [ 14/Nov/22 ]

I don't think this conflicts with Query Analytics. But perhaps users will be able to utilize variables like this to, for instance, put `$$name` in several places in the $search query without repeating it.

Comment by Elle Shwer [ 04/Nov/22 ]

This is interesting. How would it conflict with Query Analytics, if at all?

This exact ask has not come up from users; I don't think they would notice this happening.

Comment by Oren Ovadia [ 03/Nov/22 ]

evan.nixon@mongodb.com, elle.shwer@mongodb.com, see the attached HELP ticket, where users were trying to use variables in MQL inside and outside of the $search stage. They were surprised that the variables did not resolve the same way they do in other stages.

Elle, has this ever come up that you can remember?
