$topN operator should leverage index-only execution for sorting and limiting before fetch


    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Minor - P4
    • Affects Version/s: None
    • Component/s: None
    • Query Optimization

      The $topN operator can already take advantage of index coverage when its output includes only indexed fields. In the following example, with the index class_1_entityId_1_legacyId_1 in place, the query is covered and no documents are fetched after the $match stage.

      [
         {
            "$match":{
               "class":"hotel",
               "entityId":{
                  "$in":[
                     <...100 entity ids...>
                  ]
               }
            }
         },
         {
            "$group":{
               "_id":"$entityId",
               "images":{
                  "$topN":{
                     "output": ["$entityId", "$legacyId"],
                     "sortBy":{
                        "legacyId":1
                     },
                     "n":20
                  }
               }
            }
         }
      ]

      However, when $topN uses output: "$$ROOT", every document matched by the $match stage is fetched. Ideally, just as $topN can use index coverage when its output projects only indexed fields, it should also be able to sort and limit directly from the index when the sortBy field is indexed, fetching only the top N documents per group from disk after sorting.

      [
        {
          $match: {
            class: "hotel",
            entityId: { $in: [<...100 entity ids...>] }
          }
        },
        {
          $group: {
            _id: "$entityId",
            images: {
              $topN: {
                output: "$$ROOT",
                sortBy: { legacyId: 1 },
                n: 20
              }
            }
          }
        }
      ]

      This could significantly reduce I/O operations and improve query performance when dealing with large collections where only the top N documents per group are needed.
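To make the proposed plan concrete, here is a minimal sketch in plain JavaScript, not MongoDB server code. It assumes an in-memory array standing in for the class_1_entityId_1_legacyId_1 index (entries ordered by class, entityId, legacyId, each carrying a recordId), and the names topNPerGroupFromIndex, topNPerGroupFetchFirst and fetchDoc are hypothetical. Because the index already orders entries by legacyId within each entityId, the first n matching entries per group are the top n, so only those need a fetch. A real plan would seek to each entityId and stop after n keys rather than scan linearly as this sketch does.

```javascript
// Three-way comparison for strings or numbers.
const cmp = (x, y) => (x < y ? -1 : x > y ? 1 : 0);

// Order of the (hypothetical) index class_1_entityId_1_legacyId_1.
function compareKeys(a, b) {
  return cmp(a.class, b.class) || cmp(a.entityId, b.entityId) ||
    cmp(a.legacyId, b.legacyId) || cmp(a.recordId, b.recordId);
}

// Proposed behavior: take the top n per group from index order, fetch only those.
function topNPerGroupFromIndex(indexEntries, cls, entityIds, n, fetchDoc) {
  const wanted = new Set(entityIds);
  const groups = new Map();
  for (const entry of indexEntries) {
    if (entry.class !== cls || !wanted.has(entry.entityId)) continue;
    const group = groups.get(entry.entityId) ?? [];
    if (group.length >= n) continue; // group is full: skip without fetching
    group.push(fetchDoc(entry.recordId)); // fetch only a top-n winner
    groups.set(entry.entityId, group);
  }
  return groups;
}

// Current behavior with output "$$ROOT": fetch every match, then sort and limit.
function topNPerGroupFetchFirst(indexEntries, cls, entityIds, n, fetchDoc) {
  const wanted = new Set(entityIds);
  const groups = new Map();
  for (const entry of indexEntries) {
    if (entry.class !== cls || !wanted.has(entry.entityId)) continue;
    const doc = fetchDoc(entry.recordId); // every matching document is fetched
    const group = groups.get(doc.entityId) ?? [];
    group.push(doc);
    groups.set(doc.entityId, group);
  }
  for (const [key, group] of groups) {
    groups.set(key, group.sort((a, b) => a.legacyId - b.legacyId).slice(0, n));
  }
  return groups;
}

// Sample data: 3 hotel entityIds with 50 documents each, plus one non-hotel doc.
const docs = [];
for (let e = 1; e <= 3; e++) {
  for (let i = 0; i < 50; i++) {
    docs.push({ _id: docs.length, class: "hotel", entityId: e, legacyId: (i * 37) % 50 });
  }
}
docs.push({ _id: docs.length, class: "flight", entityId: 1, legacyId: 0 });

const index = docs
  .map(d => ({ class: d.class, entityId: d.entityId, legacyId: d.legacyId, recordId: d._id }))
  .sort(compareKeys);
const byId = new Map(docs.map(d => [d._id, d]));

let fetches = 0;
const fetchDoc = id => { fetches++; return byId.get(id); };

const lazy = topNPerGroupFromIndex(index, "hotel", [1, 2, 3], 20, fetchDoc);
const lazyFetches = fetches;
fetches = 0;
const eager = topNPerGroupFetchFirst(index, "hotel", [1, 2, 3], 20, fetchDoc);
const eagerFetches = fetches;

console.log(lazyFetches, eagerFetches); // 60 150
```

Both functions return the same 20 documents per group; the index-ordered version fetches 20 documents per entityId instead of all 50.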

            Assignee:
            Unassigned
            Reporter:
            Saurabh Jain
            Votes:
            0
            Watchers:
            6

              Created:
              Updated: