Add control-byte-at-a-time BSONColumn decompression API for classic BucketUnpacker


    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Query Integration

      Overview

      The classic BucketUnpackerV2 iterates each field column one BSONElement at a time via BSONColumn::Iterator. The SBE block-unpack path uses BSONColumnBlockBased::decompress instead, which is much faster because it decodes a whole BSONColumn at once with tight branchless loops. We can't just swap classic over: decompress materializes a whole column up front, which for a wide-schema bucket means fully materializing all fields before emitting row #1.

      Proposal: add a lower-level API on BSONColumn that decodes one control byte at a time (one literal or one simple8b block). The classic unpacker keeps a small per-column decoded buffer, refills it one control byte at a time when drained, and emits rows in row-major order. As in SBE, it falls back to the per-element iterator for object/array (interleaved) columns.
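The refill-and-emit loop described in the proposal can be sketched roughly as follows. This is a toy model under stated assumptions, not the real BSONColumn API: `ToyColumn`, `ColumnCursor`, and `unpackRows` are hypothetical names, each inner vector stands in for the values produced by decoding one control byte, and skipped (missing) values and interleaved columns are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for one compressed column: each inner vector models the
// values produced by decoding ONE control byte (one literal or one simple8b block).
using ToyColumn = std::vector<std::vector<int64_t>>;

// Hypothetical per-column cursor holding a small decoded buffer.
class ColumnCursor {
public:
    explicit ColumnCursor(const ToyColumn& column) : _column(&column) {}

    // Returns the next value; when the buffer is drained, refills it by decoding
    // exactly one more block. Returns nullopt once the column is exhausted.
    std::optional<int64_t> next() {
        while (_pos == _buffer.size()) {
            if (_nextBlock == _column->size()) {
                return std::nullopt;
            }
            _buffer = (*_column)[_nextBlock++];  // models "decode one control byte"
            _pos = 0;
        }
        return _buffer[_pos++];
    }

    // Number of blocks decoded so far, to make over-decoding observable.
    std::size_t blocksDecoded() const { return _nextBlock; }

private:
    const ToyColumn* _column;
    std::vector<int64_t> _buffer;
    std::size_t _pos = 0;
    std::size_t _nextBlock = 0;
};

// Emits up to `limit` rows in row-major order, pulling one value per column per row.
// Stops early as soon as any column runs out of values.
std::vector<std::vector<int64_t>> unpackRows(std::vector<ColumnCursor>& cursors,
                                             std::size_t limit) {
    std::vector<std::vector<int64_t>> rows;
    while (rows.size() < limit) {
        std::vector<int64_t> row;
        for (auto& cursor : cursors) {
            auto value = cursor.next();
            if (!value) {
                return rows;
            }
            row.push_back(*value);
        }
        rows.push_back(std::move(row));
    }
    return rows;
}
```

In this model, asking for a single row decodes only one block per column, which is the property the proposal relies on for early termination.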

      Expected wins

      • Throughput: classic unpack gets the per-byte speed of block decompression without paying full-bucket materialization.
      • Bounded memory: steady-state ~K x fields x element_size (tens to low hundreds of KB) rather than O(rows x fields).
      • LIMIT-friendly: a downstream $limit or $match pays at most one control byte of over-decoding per column.
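To make the bounded-memory estimate concrete, here is a small arithmetic sketch. All numbers are illustrative assumptions rather than measurements, and K is read here as the number of decoded elements buffered per column, which the ticket does not define explicitly.

```cpp
#include <cstddef>

// Steady-state footprint of the per-column buffers: K x fields x element_size.
constexpr std::size_t steadyStateBytes(std::size_t k,
                                       std::size_t fields,
                                       std::size_t elementSize) {
    return k * fields * elementSize;
}

// Footprint of materializing every column up front: rows x fields x element_size.
constexpr std::size_t fullMaterializationBytes(std::size_t rows,
                                               std::size_t fields,
                                               std::size_t elementSize) {
    return rows * fields * elementSize;
}

// Assumed values: 64 buffered elements per column, 100 fields, 16-byte elements,
// and a 1000-row bucket.
static_assert(steadyStateBytes(64, 100, 16) == 102400, "100 KiB of buffers");
static_assert(fullMaterializationBytes(1000, 100, 16) == 1600000, "about 1.6 MB");
```

With these assumed inputs the buffered approach holds about 100 KiB, against roughly 1.6 MB for full materialization, and only the latter grows with the number of rows in the bucket.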

            Assignee:
            Unassigned
            Reporter:
            Chris Wolff