Make distinct("metafield") on time series collections not require bucket unpacking


    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Query Integration

      If you run a query like this:

      db.diskio.distinct("metafield");
      

      it unpacks every bucket in the collection, which can take a very long time on a large collection.

      In contrast, if you run this equivalent aggregation:

      db.diskio.aggregate([{ $group: { _id: "$metafield" } }]);
      

      it can be very fast, because the optimizer is smart enough to see that only the meta field is needed and does not unpack any buckets at all.

      Both queries produce the same set of values, yet their latency and CPU usage differ enormously. We should be able to optimize distinct() on the meta field the same way.
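      A minimal JavaScript sketch of why the bucket-level approach is so much cheaper, assuming a hypothetical in-memory model of buckets: the `meta`/`data` layout and every function name below are illustrative assumptions, not MongoDB's actual bucket format or server implementation. Because the meta value is identical for every measurement in a bucket, the distinct meta values can be collected with one read per bucket instead of one per unpacked measurement.

```javascript
// Hypothetical in-memory model of time series buckets (NOT MongoDB's real
// storage format): each bucket holds one meta value plus many measurements.
function makeBuckets(numBuckets, measurementsPerBucket) {
  const buckets = [];
  for (let b = 0; b < numBuckets; b++) {
    const measurements = [];
    for (let i = 0; i < measurementsPerBucket; i++) {
      measurements.push({ t: i, value: b * i });
    }
    // Several buckets share the same meta value (e.g. the same disk).
    buckets.push({ meta: "disk" + (b % 3), data: measurements });
  }
  return buckets;
}

// Naive approach: unpack every bucket into full documents, then collect the
// distinct meta values. Work grows with the total number of measurements.
function distinctMetaByUnpacking(buckets) {
  const seen = new Set(); // dedups by value only for primitives like strings
  let itemsExamined = 0;
  for (const bucket of buckets) {
    for (const m of bucket.data) {
      const doc = { metafield: bucket.meta, ...m }; // the "unpacked" document
      itemsExamined++;
      seen.add(doc.metafield);
    }
  }
  return { values: [...seen].sort(), itemsExamined };
}

// Bucket-level approach: the meta value is constant within a bucket, so read
// it once per bucket. Work grows only with the number of buckets.
function distinctMetaFromBuckets(buckets) {
  const seen = new Set();
  let itemsExamined = 0;
  for (const bucket of buckets) {
    itemsExamined++;
    seen.add(bucket.meta);
  }
  return { values: [...seen].sort(), itemsExamined };
}

const buckets = makeBuckets(6, 1000);
console.log(distinctMetaByUnpacking(buckets)); // itemsExamined: 6000
console.log(distinctMetaFromBuckets(buckets)); // itemsExamined: 6
```

      With 6 buckets of 1,000 measurements each, both functions return ["disk0", "disk1", "disk2"], but the unpacking version examines 6,000 documents while the bucket-level version examines only 6.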

      As motivation, this kind of query is very useful for building a UI to interact with time series data. It lets a UI implementer answer the question: "What are the valid metafield values for this collection?" Ideally, such queries would be very fast.

            Assignee:
            Unassigned
            Reporter:
            Chris Wolff
            Votes:
            0
            Watchers:
            3
