|
nikita.lapkov@mongodb.com For most sorters, you're right that we fill, seal, and then drain. The BoundedSorter, which we use for the partially streaming sort on time series collections, interleaves those steps instead. It's based on a heap: we add elements until we reach a point in the input sequence where we know it's safe to extract the minimum element from the heap and emit it, then fill again as necessary, and so on.
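To make that interleaved order concrete, here is a minimal sketch of the idea. It is not the actual BoundedSorter code. It assumes each input item carries a lower bound on every key that can still arrive after it, so any buffered key at or below that bound is safe to emit. The names `Item`, `boundForLaterInput`, and `boundedSort` are hypothetical and exist only for this illustration.

```cpp
#include <functional>
#include <queue>
#include <vector>

// Hypothetical input item: a sort key plus a lower bound on every key that
// can still arrive after this item (an assumption made for illustration).
struct Item {
    int key;
    int boundForLaterInput;
};

// Sorts a partially ordered stream, interleaving "add" and "emit" steps
// instead of filling completely before draining.
std::vector<int> boundedSort(const std::vector<Item>& input) {
    // Min-heap of buffered keys.
    std::priority_queue<int, std::vector<int>, std::greater<int>> heap;
    std::vector<int> out;

    for (const Item& item : input) {
        heap.push(item.key);  // fill

        // Emit every buffered key that no later input can precede.
        while (!heap.empty() && heap.top() <= item.boundForLaterInput) {
            out.push_back(heap.top());
            heap.pop();
        }
    }

    // Input exhausted: drain whatever is still buffered.
    while (!heap.empty()) {
        out.push_back(heap.top());
        heap.pop();
    }
    return out;
}
```

The relevant point for this discussion is that elements sit in the heap for a variable amount of time while other input keeps flowing through the stages around the sorter.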
I should have been clearer in my original post: although memory sharing appears to be happening, I haven't confirmed a specific location where it happens, so treat that as the "most plausible conjecture" I could come up with to explain what I was seeing*. I can tell you the bug seemed to occur only with uncompressed buckets, so BucketUnpackerV1::getNext seems like a good place to dig in.
*Specifically, a document that was inserted into the sorter would report a higher memUsageForSorter() value when it came out of the sorter than when it went in. My understanding is that this shouldn't happen unless new fields of the doc are accessed in between, and I don't see how that could be the case while the doc is stashed in the sorter, unless its memory is somehow shared with, for example, another document being processed by another pipeline stage.
|