Type: Task
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 8.1.0-rc0
Affects Version/s: None
Component/s: None
Labels:
- M0

Assigned Teams:

Query Integration
Backwards Compatibility:
Fully Compatible
Confidence Status:
None
Work Order:
3

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

While adding rank fusion and score fusion hybrid search tests to our JS test suite with mongot, it became apparent that the score metadata associated with search results differs depending on the cluster configuration (i.e. single node vs. sharded, and the number of shards in the cluster).

This matters for our hybrid search tests because they assert on the order of the documents the query returns to confirm success, and that order can now differ for the same test between a single-node and a sharded configuration.

This problem is not limited to hybrid search queries. It will affect most search queries, since future tests will likely also assert that the documents returned by a query appear in a specific order determined by the search score.

The goal of this ticket is to investigate a generic strategy to handle these cases and implement a set of utilities that can be reused for tests that encounter this problem.

Two general approaches are currently under consideration. They are not mutually exclusive, and both can be implemented.

Approach 1:

Create a utility that can determine if the test is running in a sharded or unsharded configuration. This will allow us to run the same query/test with different expectations based on the test cluster configuration.

This approach will likely work well with the test infrastructure as it currently stands (either a single-node or a sharded option), but will likely break down if the cluster configurations become more varied. For example, if a sharded cluster can have 2, 3, or N shards, the search scores will likely differ for each of these, growing the number of orderings the test has to expect.
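A minimal sketch of what Approach 1 could look like, assuming the test can obtain a `hello` reply from the node it is connected to. It relies on the fact that a mongos router sets `msg: "isdbgrid"` in its `hello` reply. The helper names `isShardedTopology` and `expectedOrderFor` and the shape of the `expectations` object are hypothetical, not existing test utilities.

```javascript
// Hypothetical helper: decides whether a `hello` reply came from a mongos router.
// A mongos sets `msg: "isdbgrid"` in its reply; a mongod does not.
function isShardedTopology(helloReply) {
    return helloReply.msg === "isdbgrid";
}

// Hypothetical helper: picks the expected _id ordering for the current topology.
function expectedOrderFor(helloReply, expectations) {
    return isShardedTopology(helloReply) ? expectations.sharded : expectations.unsharded;
}

// Example with hand-written replies. In a real test the reply would come from
// something like db.runCommand({hello: 1}).
const expectations = {unsharded: [1, 2, 3], sharded: [1, 3, 2]};
const onMongos = expectedOrderFor({msg: "isdbgrid", ok: 1}, expectations);
const onMongod = expectedOrderFor({isWritablePrimary: true, ok: 1}, expectations);
```

The limitation described above shows up directly in the `expectations` object: every additional shard count that changes the ordering would need its own entry.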

Approach 2:

Make the test unaware of the cluster configuration it is running in, but make the assertion logic more "fuzzy" / relaxed.

While the search scores / ranks might differ across cluster configurations, they are not, and should never be, vastly different. The relative ordering of the results should therefore always be similar, just not exactly the same.

If we can create one or more utilities that tolerate a set of possible document orderings and reject others, the same assertion utility can be used regardless of the cluster configuration.

Some possible approaches here are:

- Pass in one expected ordering, but tolerate each document being within x positions of its expected position.
- Pass in the scores associated with the documents, so that the assertion utility has a direct understanding of which documents should precede others.
- Specify rules such as "these docs must appear in the first x results" to enforce some ordering, but not a strict one.
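The three ideas above can be sketched as small predicates over the list of returned `_id` values. This is only an illustration under stated assumptions: all function names are hypothetical, `expectedScores` is assumed to be a plain object keyed by `_id`, and `epsilon` is an assumed threshold below which two scores are treated as a tie.

```javascript
// (a) Each document may sit at most `tolerance` positions from its expected index.
function isWithinPositionTolerance(actualIds, expectedIds, tolerance) {
    if (actualIds.length !== expectedIds.length) return false;
    return expectedIds.every((id, expectedIdx) => {
        const actualIdx = actualIds.indexOf(id);
        return actualIdx !== -1 && Math.abs(actualIdx - expectedIdx) <= tolerance;
    });
}

// (b) A document may not appear before another one whose expected score is higher
// by more than `epsilon`. Near-ties may appear in either order.
function respectsScoreOrder(actualIds, expectedScores, epsilon) {
    for (let i = 0; i < actualIds.length; i++) {
        for (let j = i + 1; j < actualIds.length; j++) {
            const earlier = expectedScores[actualIds[i]];
            const later = expectedScores[actualIds[j]];
            if (earlier === undefined || later === undefined) return false;
            if (later - earlier > epsilon) return false;
        }
    }
    return true;
}

// (c) Every id in `requiredIds` must appear among the first `x` results, in any order.
function appearsInFirst(actualIds, requiredIds, x) {
    const head = new Set(actualIds.slice(0, x));
    return requiredIds.every((id) => head.has(id));
}

// Example: docs 1 and 2 have nearly equal scores, so their order may flip.
const expectedIds = [1, 2, 3, 4];
const expectedScores = {1: 0.90, 2: 0.88, 3: 0.50, 4: 0.10};
const flipped = [2, 1, 3, 4];   // plausible on a different topology
const scrambled = [4, 1, 2, 3]; // should be rejected

const flippedOk = isWithinPositionTolerance(flipped, expectedIds, 1) &&
    respectsScoreOrder(flipped, expectedScores, 0.05) &&
    appearsInFirst(flipped, [1, 2], 2);
const scrambledOk = isWithinPositionTolerance(scrambled, expectedIds, 1);
```

A test would wrap whichever predicate fits in its usual assertion call. The checks are independent, so a test can combine them or use only one.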

related to

SERVER-91200 Add end-to-end ranked fusion test using existing syntax

Closed

SERVER-91201 Add end-to-end score fusion test using existing syntax

Closed

Assignee: Joe Shalabi
Reporter: Joe Shalabi
Participants: Githook User, Joe Shalabi
Votes: 0
Watchers: 6

Created: Jul 11 2024 08:53:06 PM UTC
Updated: Aug 23 2024 08:26:18 PM UTC
Resolved: Jul 17 2024 11:27:51 PM UTC
Confidence Status Last Update: 12/Jul/24 9:41 PM
