-
Type: Task
-
Resolution: Unresolved
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: None
-
Query Integration
If you run a query like this on a time-series collection:
db.diskio.distinct("metafield");
it will unpack every bucket in the collection, which can take a very long time.
In contrast, if you do this:
db.diskio.aggregate([{ $group: { _id: "$metafield" } }]);
it can be very fast, because our optimizer is smart enough not to unpack any buckets at all.
The difference in latency and CPU usage between these two queries is enormous, even though they compute the same result. We should be able to apply the same optimization to distinct.
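Until distinct is optimized, a caller could route the request through the fast $group form by hand. The following is a minimal sketch, not an existing API: `buildDistinctPipeline` and `extractDistinctValues` are hypothetical helper names, and `diskio` / `metafield` are the example names from this ticket. It assumes the meta field holds scalar or document values. distinct treats each element of an array value as a separate value while $group does not, and the two may also differ in how they report documents where the field is missing.

```javascript
// Hypothetical helpers (not part of any MongoDB API) that express
// distinct(<field>) as the equivalent $group pipeline from this ticket.

// Build the single-stage pipeline that groups on the given field path.
function buildDistinctPipeline(fieldPath) {
  return [{ $group: { _id: "$" + fieldPath } }];
}

// Each $group result document carries one distinct value in _id;
// pull those out into a plain array.
function extractDistinctValues(groupDocs) {
  return groupDocs.map((doc) => doc._id);
}

// Usage in the shell (assumes the diskio collection from this ticket):
//   const docs = db.diskio.aggregate(buildDistinctPipeline("metafield")).toArray();
//   const values = extractDistinctValues(docs);
```

The pipeline builder is kept separate from the result mapping so the same pipeline can be passed to explain() to check that no buckets are unpacked.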
As motivation, this kind of query is super useful for creating a UI to interact with time series data. It lets a UI implementer answer the question: "what are the valid metafield values for this collection?" These are queries that ideally would be very fast.