Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-79629

Consider avoiding copying valueBlock at BlockToRowStage::prepareDeblock

    • Type: Icon: Task Task
    • Resolution: Done
    • Priority: Icon: Major - P3 Major - P3
    • 7.2.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • None
    • Query Integration
    • Fully Compatible
    • 35
    • None
    • 3
    • None
    • None
    • None
    • None
    • None
    • None

      Irina's comment:
      I'm concerned about what it means for the queries without the intermediate block processing (or before we can enable it), for example a $group w/o a prior filter: we'll be copying the whole original blocks all the time, no?

      Ian's comment:
      IMO, we are leaning in the direction of copying slightly too much. Here's what I would do: -We get rid of the _blocks member -On getNext(), we call getValues() on the new block that the child stage placed in the slot. We store this in _deblockedValueRuns. Note that these values are views! But these views are valid until we call getNext() on the child or yield. We don't have to worry about calling getNext() on the child, since by then we'll be done with everything in _deblockedValueRuns, so the other situation we have to think about is yielding... -On doSaveState(), we call copyValue() on everything in _deblockedValueRuns[_curIdx...end]. We should probably also reset _deblockedValueRuns[0..._curIdx] to Nothing so that no one reads it by accident. -On doRestoreState(), we do nothing.

      This way, there is no need to clone the block, and the copy is only done for the values we haven't returned yet when there is a yield. Since yielding is rare, we will only copy a small fraction of the data.

      Yoonsoo's reply:
      "for now", I'd like to go with the current design to simplify the ownership model.

      If I follow the Ian's proposal, I need to maintain whether each deblocked value is owned or not. That complicates the ownership model a bit more. At first, it's merely a view and after yielding, it owns values? Either being a view or a value owner will be easier to understand, I think. For now, we agreed on simplifying ownership model for TsBlock, always copying values when deblocking, right? There we simplify the ownership model, here we complicates the ownership model? That does not make much sense to me for now. Instead, let's measure perf and improve the design/code based on perf results. Are you concerned that the improvement will be too hard to achieve? If so, could you explain a little bit how that would be the case?

      FYI: valueBlock->clone() will copy underlying buffer only, not deblocked values for bucket unpacking scenario. And I imagined that the lock yielding is much more frequent than we may think because each bucket may have up to 1000 measurements and we also yield lock per each 1000 iteration. So, copying deblocked values may be much more frequent than we currently think. Which one between copying the underlying buffer and copying deblocked values partially is more frequent and more expensive? This was my thought process for the current design.

            Assignee:
            ian.boros@mongodb.com Ian Boros
            Reporter:
            yoonsoo.kim@mongodb.com Yoon Soo Kim (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

              Created:
              Updated:
              Resolved: