Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-53297

Return sorted query results in random order amongst sets of documents with equal sort keys

    • Type: Icon: Improvement Improvement
    • Resolution: Won't Do
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: None
    • Component/s: Querying
    • Labels:
      None
    • Query Execution

      Motivation. Application developers sometimes accidentally rely on the order of results returned by sort constructs such as $sort aggregation stage or sort() method in addition to what is guaranteed by the sort construct. That may happen because in some queries, circumstances, and versions of the server there could be an apparent order of the results that is consistent between queries. However, that order is not guaranteed by the server and can cause system correctness problems, such as HELP-20458, when the server configuration changes (such as when a single replica set is transformed to a sharded cluster) or the server evolves.

      Proposed solution. To increase a probability of detecting this design mistake earlier in the lifecycle of the application systems, randomize the order of the results produced by sort constructs. That is, if the values of fields to sort by are equal then the relative order of such documents should be random (stochastic) and not persistent between queries.

      Example

      Given the collection coll:
       

      db.coll.insert({a:1, b:1}, {a:2, b:2}, {a:1, b:3}, {a:2, b:4})

       
      When issuing a command

      db.coll.aggregate([{$sort: {a: 1}}])

      ideally, a probability of getting Result A or Result B should be the same and equal to 0.25.

      Result A:

      { "_id" : ObjectId("5fd0a1937a7cb559d4cff62b"), "a" : 1, "b" : 1 }
      { "_id" : ObjectId("5fd0a1937a7cb559d4cff62d"), "a" : 1, "b" : 3 }
      { "_id" : ObjectId("5fd0a1937a7cb559d4cff62c"), "a" : 2, "b" : 2 }
      { "_id" : ObjectId("5fd0a1937a7cb559d4cff62e"), "a" : 2, "b" : 4 }
      

      Result B:

      { "_id" : ObjectId("5fd0a1937a7cb559d4cff62d"), "a" : 1, "b" : 3 }
      { "_id" : ObjectId("5fd0a1937a7cb559d4cff62b"), "a" : 1, "b" : 1 }
      { "_id" : ObjectId("5fd0a1937a7cb559d4cff62c"), "a" : 2, "b" : 2 }
      { "_id" : ObjectId("5fd0a1937a7cb559d4cff62e"), "a" : 2, "b" : 4 }

      Design considerations

      However, a perfect randomization (that produces any possible permutation of results with equal probability) is not required, just “random enough” to achieve the goal. Also, if necessary, trade-offs should be made to favour minimization of resource usage overhead of the feature instead of perfect randomization.

      This feature could have a toggle for turning it off at the server/query level.

            Assignee:
            backlog-query-execution [DO NOT USE] Backlog - Query Execution
            Reporter:
            mindaugas.malinauskas@mongodb.com Mindaugas Malinauskas
            Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

              Created:
              Updated:
              Resolved: