  1. Core Server
  2. SERVER-56058

Change BucketUnpacker to produce measurement documents with fields in sorted order

    • Type: Improvement
    • Resolution: Won't Fix
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: Query Execution
    • Labels: None
    • Query Execution 2021-05-17

      For regular (non-time-series) collections in MongoDB, documents are generally returned to the user with the same field order in which they were inserted. For instance, if you insert the document {a: 1, b: 1} and then run a simple collection scan query, you can expect to get back {a: 1, b: 1} and not {b: 1, a: 1}. (Although the properties of a JS object are unordered, BSON is an ordered format.) This guarantee, however, does not hold for time-series collections. The reason is that when time-series data is ingested, it is internally reorganized into a column-oriented format. The order of the columns in storage is arbitrary in order to permit efficient construction of these column-oriented buckets.

      To make the system more predictable for users of time-series collections, we should consider changing the BucketUnpacker to always return measurement documents with sorted fields. Although users may not get back the field order they inserted into the time-series collection, they would at least get a predictable field order. This can also be done at negligible cost, because the column names only need to be sorted once for each bucket. The sorting cost is amortized over the number of measurements in the bucket.
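
The amortization argument can be illustrated with a minimal sketch. This is not the server's BucketUnpacker (which is C++ and operates on BSON buckets); it is a simplified Python model in which a bucket is assumed to be a mapping from field name to a dense list of values, one per measurement. The function name `unpack_bucket_sorted` and the sample bucket are hypothetical.

```python
from typing import Any, Dict, Iterator, List


def unpack_bucket_sorted(columns: Dict[str, List[Any]]) -> Iterator[Dict[str, Any]]:
    """Yield one measurement per row, with fields in sorted name order.

    Simplifying assumption: every column is dense and has the same length.
    Real buckets can have missing values, which this sketch ignores.
    """
    # Sort the column names once per bucket, not once per measurement.
    sorted_names = sorted(columns)
    num_measurements = len(next(iter(columns.values()), []))
    for i in range(num_measurements):
        # Python dicts preserve insertion order, so each measurement
        # is materialized with its fields in sorted order.
        yield {name: columns[name][i] for name in sorted_names}


# Hypothetical bucket whose columns are stored in arbitrary order.
bucket = {"b": [1, 2], "time": [100, 200], "a": [10, 20]}
measurements = list(unpack_bucket_sorted(bucket))
```

The sort runs once regardless of how many measurements the bucket holds, so its per-measurement cost shrinks as the bucket grows.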

      Finally, materializing measurement documents in sorted order would be more consistent with an imagined future in which MQL evolves to become insensitive to field order. At the moment, MQL does sometimes consider field order significant, but the field ordering semantics are weak and poorly understood. We may try to change the language such that queries operate logically over unordered documents (even if the BSON storage and wire format is technically ordered). One way to implement this is to ensure that documents are always stored in sorted order and that intermediate documents materialized during query execution are also constructed in sorted order. This invariant would ensure that document comparisons effectively use unordered field semantics without having to actually sort the fields of objects at query runtime.
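
As a rough illustration of that invariant, the sketch below models a document as an ordered list of (name, value) pairs, a stand-in for BSON's ordered fields. The `Doc` alias and the `normalize` helper are hypothetical and handle top-level fields only; nothing here reflects how MQL or the server actually compares documents.

```python
from typing import Any, List, Tuple

# Ordered list of (field name, value) pairs, standing in for a BSON document.
Doc = List[Tuple[str, Any]]


def normalize(doc: Doc) -> Doc:
    """Return a copy of the document with top-level fields sorted by name."""
    return sorted(doc, key=lambda field: field[0])


doc_1: Doc = [("a", 1), ("b", 1)]
doc_2: Doc = [("b", 1), ("a", 1)]

# An order-sensitive comparison treats these as different documents.
ordered_equal = doc_1 == doc_2

# If both documents are kept in sorted order, the same order-sensitive
# comparison behaves like an unordered one.
normalized_equal = normalize(doc_1) == normalize(doc_2)
```

If sorting happens when documents are stored or materialized, the comparison itself never needs to sort at query runtime.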

      Any broader changes to MQL semantics are more of a future concern, however, so the primary motivation of this ticket is to provide users with a predictable field order, which may improve the user experience when writing queries or viewing query results.

            Assignee:
            irina.yatsenko@mongodb.com Irina Yatsenko (Inactive)
            Reporter:
            david.storch@mongodb.com David Storch
            Votes:
            0
            Watchers:
            4
