Modify $rankFusion 'scoreDetails' rank reporting for input pipelines that don't contain that document


    • Type: Task
    • Resolution: Fixed
    • Priority: Major - P3
    • 8.2.0-rc1, 8.3.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • None
    • Query Integration
    • Fully Compatible
    • v8.2
    • None
    • 3
    • TBD

      The 'details' sub-field array of $rankFusion 'scoreDetails' contains one entry per input pipeline for each document in the results. Currently (as released in 8.1), if a document does not appear in a given input pipeline, the rank for that input pipeline is reported as 0.

       

      Example:

      {  
          "_id" : 13,  
          "score" : 0.014925373134328358,  
          "details" : {  
              "value" : 0.014925373134328358,  
              "description" : "...",  
              "details" : [  
                  {  
                      "inputPipelineName" : "search",  
                      "rank" : 0,  
                      "weight" : 2  
                  },  
                  {  
                      "inputPipelineName" : "vector",  
                      "rank" : 7,  
                      "weight" : 1,  
                      "value" : 0.783767819404602,  
                      "details" : [ ]  
                  }  
              ]  
          }  
      } 

      In this case, the document did not appear in the input pipeline named "search", yet the "rank" value is reported as 0.

       

      For context, the lower the rank, the closer to the top of that input pipeline's results the document appeared, with the first document reported as "rank" 1. This behavior is therefore especially confusing and misleading: not only does it suggest that a document absent from an input pipeline was present, it also suggests that the document ranked above the first ("rank" 1) result of that input pipeline.

       

      We decided to modify the 'scoreDetails.details' entry for a document that did not appear in an input pipeline to:

      • omit the "rank" field
      • omit the "weight" field
      • add a {"notPresentInPipeline": true} field

       

      So the above scoreDetails example should look like:

      {  
          "_id" : 13,  
          "score" : 0.014925373134328358,  
          "details" : {  
              "value" : 0.014925373134328358,  
              "description" : "...",  
              "details" : [  
                  {  
                      "inputPipelineName" : "search",  
                      "rank" : "NA"  
                  },  
                  {  
                      "inputPipelineName" : "vector",  
                      "rank" : 7,  
                      "weight" : 1,  
                      "value" : 0.783767819404602,  
                      "details" : [ ]  
                  }  
              ]  
          }  
      } 

      after this change.
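
      For anyone consuming 'scoreDetails' on the client side, the following is a minimal sketch of how the entries might be read so that an absent document is never mistaken for a top-ranked one. It is an illustration only: the function name pipeline_ranks is hypothetical, and the code deliberately accepts every shape mentioned in this ticket (rank 0 from 8.1, a missing rank, a "notPresentInPipeline" marker, or a non-numeric rank such as "NA"), since it assumes nothing about which shape a given server version emits.

```python
def pipeline_ranks(score_details):
    """Map each input pipeline name to its rank, or None if the document was absent.

    `score_details` is the object that holds the per-pipeline "details" array.
    This is a hypothetical client-side helper, not part of any driver API.
    """
    ranks = {}
    for entry in score_details["details"]:
        rank = entry.get("rank")
        absent = (
            entry.get("notPresentInPipeline") is True  # explicit marker field
            or isinstance(rank, bool)                  # guard: bool is a subclass of int
            or not isinstance(rank, int)               # missing rank or a string like "NA"
            or rank < 1                                # 8.1 behavior: rank reported as 0
        )
        ranks[entry["inputPipelineName"]] = None if absent else rank
    return ranks


# The 8.1-style result document from this ticket, trimmed to the relevant fields.
doc_8_1 = {
    "_id": 13,
    "score": 0.014925373134328358,
    "details": {
        "value": 0.014925373134328358,
        "description": "...",
        "details": [
            {"inputPipelineName": "search", "rank": 0, "weight": 2},
            {"inputPipelineName": "vector", "rank": 7, "weight": 1,
             "value": 0.783767819404602, "details": []},
        ],
    },
}

ranks_8_1 = pipeline_ranks(doc_8_1["details"])
print(ranks_8_1)
```

      With a helper along these lines, the "search" entry maps to None regardless of which reporting shape is in use, while "vector" keeps its rank of 7.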

       

      Fortunately, the 0 rank is not weighted into the final score calculation for each document, so the score results themselves are computed correctly. In the example above, only the "vector" pipeline (weight 1, rank 7) contributes: 1 * (1 / (60 + 7)) = 0.0149..., which matches the reported score.
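
      The arithmetic can be checked with a short sketch. This is not the server implementation: the constant 60 is taken from the "(60 + 7)" term in this ticket, and the names RANK_CONSTANT and rrf_score are hypothetical. It only shows that summing weight * 1 / (60 + rank) over the pipelines that actually returned the document reproduces the reported score.

```python
# Constant taken from the "(60 + 7)" term in this ticket; the name is an assumption.
RANK_CONSTANT = 60


def rrf_score(entries):
    """Sum weight * 1 / (RANK_CONSTANT + rank) over pipelines that ranked the document."""
    total = 0.0
    for entry in entries:
        rank = entry.get("rank")
        # Skip pipelines that did not return the document: a missing rank,
        # a non-integer rank such as "NA", or the 8.1-style rank of 0.
        if isinstance(rank, bool) or not isinstance(rank, int) or rank < 1:
            continue
        total += entry["weight"] * (1.0 / (RANK_CONSTANT + rank))
    return total


# Per-pipeline entries from the 8.1 example in this ticket.
entries = [
    {"inputPipelineName": "search", "rank": 0, "weight": 2},
    {"inputPipelineName": "vector", "rank": 7, "weight": 1},
]

score = rrf_score(entries)
print(score)  # approximately 0.0149, i.e. 1 / 67
```

      Had the 0 rank been weighted in, the "search" entry would have added 2 * (1 / 60) to the total, which is clearly not reflected in the reported score.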

       

      Note, ideally we would like to get this into 8.2, but we'll see if we have time.

      We should also confirm that the corresponding documentation gets updated.

              Assignee:
              Adithi Raghavan
              Reporter:
              Joe Shalabi