In NearStage, we generate several sub-stages to fetch and sort the data progressively. Every sub-stage is a fetch stage with an index scan stage as its child. On top of NearStage, we use another top-level fetch stage to filter the data if necessary. As a result, "totalDocsExamined" is counted twice for each returned document: once in the sub-stage fetch and once in the top-level fetch stage.
Ideally, "totalDocsExamined" should indicate the number of documents fetched from disk, and "alreadyHasObj" should indicate the number of documents already in memory. Currently, "totalDocsExamined" includes "alreadyHasObj", which makes "totalDocsExamined" greater than expected for geo near queries.
Another approach is to expose the total number of objects already in memory at the top level, making it explicit that "totalDocsExamined" is not a measure of disk fetches. Alternatively, expose the number of documents fetched from disk directly.
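The counting described above can be illustrated with a small toy model. This is only a sketch in Python, not the server's code: the `FetchStats` class, the `summarize` function, and the `docsFetchedFromDisk` field are hypothetical names introduced here for illustration. The model assumes that each sub-stage fetch reads its documents from disk, and that every returned document reaches the top-level fetch stage already in memory.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FetchStats:
    """Hypothetical per-stage counters for a fetch stage."""
    docs_examined: int = 0    # every document the stage looked at
    already_has_obj: int = 0  # subset of those that were already in memory


def summarize(sub_stage_fetches: List[int], returned: int) -> Dict[str, int]:
    """Aggregate counters for a toy geo near plan.

    sub_stage_fetches: documents fetched from disk by each sub-stage fetch.
    returned: documents passed up to the top-level fetch stage.
    """
    # Assumption: sub-stage fetches always read from disk.
    stages = [FetchStats(docs_examined=n) for n in sub_stage_fetches]
    # Assumption: the top-level fetch sees only documents already in memory,
    # yet still counts each of them as examined.
    stages.append(FetchStats(docs_examined=returned, already_has_obj=returned))

    total = sum(s.docs_examined for s in stages)
    already = sum(s.already_has_obj for s in stages)
    return {
        "totalDocsExamined": total,
        "alreadyHasObj": already,
        # Hypothetical metric: documents actually fetched from disk.
        "docsFetchedFromDisk": total - already,
    }


if __name__ == "__main__":
    # Three sub-stages fetch 3, 4, and 2 documents; all 9 are returned.
    print(summarize([3, 4, 2], returned=9))
```

In this model, nine documents are read from disk but "totalDocsExamined" is 18, because each returned document is counted once per fetch stage it passes through. Reporting "alreadyHasObj" alongside the total, or the difference directly, would make the disk fetch count recoverable.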
Related to: DOCS-11277 Consider slight clarification of `explain.executionStats.totalDocsExamined` (Closed)