  1. Core Server
  2. SERVER-92357

Create js testing strategy for search scoring differences in sharded vs non-sharded configurations

    • Type: Task
    • Resolution: Fixed
    • Priority: Major - P3
    • 8.1.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • Query Integration
    • Fully Compatible

      While adding rank and score fusion hybrid search tests to our js testing suite with mongot, it has become apparent that the score metadata associated with search results differs depending on the cluster configuration (i.e., single node vs. sharded, and the number of shards in the cluster).

       

      This matters for our hybrid search tests because each test asserts on the order of the documents the query returns, and that order can now differ for the same test depending on whether it runs in a single-node or sharded configuration.

       

      This problem is not limited to hybrid search queries. It applies to most search queries, since our future tests will likely also assert that the documents returned by a query appear in a specific order determined by the search score.

       

      The goal of this ticket is to investigate a generic strategy to handle these cases and implement a set of utilities that can be reused for tests that encounter this problem.

       

      Currently there are two general approaches under consideration. They are not mutually exclusive, so both can be implemented.

       

      Approach 1:

      Create a utility that can determine if the test is running in a sharded or unsharded configuration. This will allow us to run the same query/test with different expectations based on the test cluster configuration.

       

      This approach will likely work well with the test infrastructure as it currently stands (either a single node or a sharded cluster), but will likely break down if the cluster configurations become more generic. For example, if a sharded cluster can have 2, 3, or N shards, the search scores will likely differ for each of these, growing the number of different orderings the test has to expect.
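The utility described in Approach 1 could be sketched roughly as follows. This is a minimal sketch, not the implemented helper: `isShardedCluster`, `expectedOrderFor`, the fake connection objects, and the `_id` orderings are all hypothetical names and values introduced for illustration. It assumes `db` exposes `adminCommand()` the way the shell's database object does, and relies on a mongos reporting `msg: "isdbgrid"` in its `hello` reply.

```javascript
// Hypothetical helper: returns true when `db` is connected through a mongos.
// Assumes `db.adminCommand()` behaves like the shell's database object.
function isShardedCluster(db) {
    const reply = db.adminCommand({hello: 1});
    // A mongos identifies itself with msg: "isdbgrid" in its hello reply.
    return reply.msg === "isdbgrid";
}

// Hypothetical helper: pick the expected ordering for the current topology.
function expectedOrderFor(db, expectations) {
    return isShardedCluster(db) ? expectations.sharded : expectations.unsharded;
}

// Stand-in connections so the sketch runs outside a real cluster.
const fakeMongos = {adminCommand: () => ({ok: 1, msg: "isdbgrid"})};
const fakeMongod = {adminCommand: () => ({ok: 1, isWritablePrimary: true})};

// Illustrative _id orderings only, not real mongot output.
const expectations = {unsharded: [1, 2, 3, 4], sharded: [1, 3, 2, 4]};

const orderOnSharded = expectedOrderFor(fakeMongos, expectations);
const orderOnUnsharded = expectedOrderFor(fakeMongod, expectations);
```

Note that `expectations` needs one entry per distinct topology, so supporting 2, 3, or N shards would mean extending both the detection logic and this table.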

       

      Approach 2: 

      Make the test unaware of the cluster configuration it's running in, but make the assertion logic more "fuzzy" / relaxed.

       

      While the search scores / ranks might be different in the different cluster configurations, they are not and should never be vastly different. So the relative ordering of the results should always be somewhat similar, just not exactly the same.

       

      If we can create one or more utilities that tolerate a set of possible document orderings and reject all others, the same assertion utility can be used regardless of the cluster configuration.

       

      Some possible approaches here are:

      • Pass in one expected ordering, but tolerate that each document can be within x positions of its expected position
      • Pass in scores associated with the documents, so that the assertion utility has a direct understanding of which documents should precede others.
      • Specify rules such as "these docs must appear in the first x results" to enforce some ordering, but not a strict one
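The three options above could each be sketched as a small assertion helper. These are illustrative only: the function names, the `tolerance`, `epsilon`, and `k` parameters, and the sample ids and scores are assumptions made for this sketch, not the utilities that were ultimately implemented. Plain `Error`s are thrown so the sketch does not depend on any particular assertion library.

```javascript
// Option 1 (hypothetical): each id may sit at most `tolerance` positions
// away from its index in the single expected ordering.
function assertWithinPositions(actualIds, expectedIds, tolerance) {
    if (actualIds.length !== expectedIds.length) {
        throw new Error("result count differs from expected count");
    }
    expectedIds.forEach((id, expectedIdx) => {
        const actualIdx = actualIds.indexOf(id);
        if (actualIdx === -1) {
            throw new Error(`document ${id} is missing from the results`);
        }
        if (Math.abs(actualIdx - expectedIdx) > tolerance) {
            throw new Error(`document ${id} is at ${actualIdx}, expected near ${expectedIdx}`);
        }
    });
}

// Option 2 (hypothetical): reference scores decide which documents must
// precede others. Pairs whose scores differ by at most `epsilon` may appear
// in either order.
function assertOrderMatchesScores(actualIds, scoreById, epsilon) {
    for (const id of actualIds) {
        if (scoreById[id] === undefined) {
            throw new Error(`no reference score for document ${id}`);
        }
    }
    for (let i = 0; i < actualIds.length; i++) {
        for (let j = i + 1; j < actualIds.length; j++) {
            const earlier = scoreById[actualIds[i]];
            const later = scoreById[actualIds[j]];
            // Fail only when a later document clearly outscores an earlier one.
            if (later - earlier > epsilon) {
                throw new Error(`document ${actualIds[j]} should precede ${actualIds[i]}`);
            }
        }
    }
}

// Option 3 (hypothetical): every required id must be in the first k results.
function assertInTopK(actualIds, requiredIds, k) {
    const topK = new Set(actualIds.slice(0, k));
    for (const id of requiredIds) {
        if (!topK.has(id)) {
            throw new Error(`document ${id} is not in the first ${k} results`);
        }
    }
}

// Usage with made-up data: documents 2 and 3 swapped relative to expectation.
const actual = [1, 3, 2, 4];
assertWithinPositions(actual, [1, 2, 3, 4], 1);
assertOrderMatchesScores(actual, {1: 0.9, 2: 0.52, 3: 0.5, 4: 0.1}, 0.05);
assertInTopK(actual, [1, 2], 3);
```

All three calls accept the swapped ordering of documents 2 and 3, while a fully reversed result list would be rejected by each helper. The helpers can also be combined in one test, since they constrain the ordering in different ways.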

            Assignee:
            joseph.shalabi@mongodb.com Joe Shalabi
            Reporter:
            joseph.shalabi@mongodb.com Joe Shalabi