[SERVER-53297] Return sorted query results in random order amongst sets of documents with equal sort keys Created: 09/Dec/20 Updated: 06/Dec/22 Resolved: 05/Jan/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Querying |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Mindaugas Malinauskas | Assignee: | Backlog - Query Execution |
| Resolution: | Won't Do | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: | Query Execution |
| Description |
|
Motivation. Application developers sometimes accidentally rely on a result order that goes beyond what a sort construct, such as the $sort aggregation stage or the sort() method, guarantees. This can happen because in some queries, circumstances, and server versions the results appear in an order that is consistent between queries. However, the server does not guarantee that order, and relying on it can cause system correctness problems, such as HELP-20458, when the server configuration changes (for example, when a single replica set is converted to a sharded cluster) or the server evolves.
Proposed solution. To increase the probability of detecting this design mistake earlier in the application lifecycle, randomize the order of the results produced by sort constructs. That is, if documents have equal values in the fields being sorted on, their relative order should be random (stochastic) and should not persist between queries.
Example. Given the collection coll:
ideally, the probability of getting Result A should be the same as the probability of getting Result B, each equal to 0.25. Result A:
Result B:
Design considerations. Perfect randomization (one that produces every possible permutation of the results with equal probability) is not required; the order only needs to be “random enough” to achieve the goal. Where a trade-off is necessary, minimizing the resource overhead of the feature should take priority over perfect randomization. The feature could have a toggle for turning it off at the server or query level. |
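The behavior proposed in the description can be sketched outside the server. The Python below is only an illustration under stated assumptions, not server code: `sort_with_random_ties` and `sample_docs` are hypothetical names, and the sample documents are made up rather than taken from the ticket's example. Each document gets a random tiebreaker next to its sort key, so documents with equal sort keys come out in a different relative order from run to run, while the order across distinct keys is unchanged.

```python
import random


def sort_with_random_ties(docs, key, rng=None):
    """Sort docs by docs[key]; documents with equal keys end up in random order.

    Hypothetical helper for illustration only, not a MongoDB API.
    """
    rng = rng or random.Random()
    # Draw one random tiebreaker per document up front, so the sort compares
    # (sort key, tiebreaker) pairs and never the documents themselves.
    decorated = [(doc[key], rng.random(), doc) for doc in docs]
    decorated.sort(key=lambda item: (item[0], item[1]))
    return [doc for _, _, doc in decorated]


# Made-up sample data (an assumption, not the collection from the ticket):
# two pairs of documents that share a sort key.
sample_docs = [
    {"_id": 1, "a": 1},
    {"_id": 2, "a": 1},
    {"_id": 3, "a": 2},
    {"_id": 4, "a": 2},
]

if __name__ == "__main__":
    for _ in range(3):
        print([doc["_id"] for doc in sort_with_random_ties(sample_docs, "a")])
```

With this sample data there are four possible result orders, and each should come up with roughly equal probability. The extra cost is one random number per document, which is in the spirit of the “random enough” trade-off described above.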
| Comments |
| Comment by David Storch [ 05/Jan/21 ] |
|
We decided not to do this because although it would be useful to help customers find bugs in their applications, it would be wasted work for the server to perform. Therefore, it is the kind of behavior that we would only want to introduce in debug builds. However, debug builds are only used for our internal testing and are not shipped to customers. This in turn means customers would not be able to take advantage of this behavior to find bugs in their application. |