[SERVER-68353] Support count option in $collStats for time series collections Created: 27/Jul/22  Updated: 25/Oct/23

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Edwin Zhou Assignee: Backlog - Storage Execution Team
Resolution: Unresolved Votes: 3
Labels: time-series
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Assigned Teams:
Storage Execution
Participants:
Case:

 Description   

The $collStats aggregation stage does not support the count option on time series collections, because they are treated as views. As a result, db.timeseries_collection.stats().count does not exist, and applications that rely on db.collection.stats().count will see undefined behavior for time series collections. A form of count is available through db.timeseries_collection.stats().timeseries.numMeasurementsCommitted, but this field resets to 0 whenever the server restarts, so it is an unreliable way to count the documents in a time series collection.
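To illustrate the client-side burden described above, here is a minimal sketch of how an application might cope with the missing field. The helper name resolveDocumentCount and the sample stats objects are hypothetical and only mimic the shape of stats() output; they are not part of the server or the shell.

```javascript
// Hypothetical helper: use stats().count when it exists, otherwise fall
// back to an explicit (and potentially slow) count supplied by the caller.
function resolveDocumentCount(stats, countFallback) {
  if (stats && typeof stats.count === "number") {
    return { count: stats.count, source: "stats" };
  }
  return { count: countFallback(), source: "fallback" };
}

// Plain objects standing in for stats() results (illustrative shapes only).
const regularStats = { ns: "test.events", count: 1200 };
const timeseriesStats = {
  ns: "test.weather",
  // Per the description, this resets to 0 on restart, so it is not used.
  timeseries: { numMeasurementsCommitted: 0 },
};

const regular = resolveDocumentCount(regularStats, () => 0);
const timeseries = resolveDocumentCount(timeseriesStats, () => 4800);
console.log(regular, timeseries);
```

In mongosh, the fallback could be something like () => db.weather.countDocuments({}), assuming a collection named weather; that call scans the data, which is the cost this ticket aims to avoid.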

Time series collections are treated as views and therefore do not support some $collStats options. However, because time series collections are intended to behave like normal collections, we should support some $collStats options that views do not support.
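For reference, the option in question is requested with a pipeline of the following form. The wrapper collStatsCount is a hypothetical helper that assumes a mongosh-style collection whose aggregate(...).toArray() is synchronous. This ticket does not state whether the server rejects the stage or simply omits the field for time series collections, so the sketch treats both cases as "no count available".

```javascript
// Pipeline that asks $collStats for the count option only.
const countPipeline = [{ $collStats: { count: {} } }];

// Hypothetical helper: `collection` is assumed to expose a synchronous
// aggregate(...).toArray(), as collections do in mongosh.
function collStatsCount(collection) {
  try {
    const [doc] = collection.aggregate(countPipeline).toArray();
    return doc && typeof doc.count === "number" ? doc.count : undefined;
  } catch (err) {
    // Treat a rejected stage the same as a missing count field.
    return undefined;
  }
}
```

On a regular collection this would be called as collStatsCount(db.events), assuming a collection named events; the request in this ticket is for the same call to return a number on a time series collection.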



 Comments   
Comment by Erwin Segerer [ 15/Aug/22 ]

As mentioned in my support case, we would expect stats() to report the same count for regular and time-series collections, namely the number returned by countDocuments(). The internals of a time-series collection are hidden by the view. From our perspective, the view is the time-series collection: it lists each document, and all operations are performed on it.

I understand that there are technical limitations that currently prevent you from providing count in stats() for time-series "collections". As a result, the effort now falls on our application (the client) to provide the same behavior regardless of whether a collection is regular or time-series. This makes time-series collections useful only for small data sets.

Bosch IoT Insights is about letting our customers work with their data. For them, each individual measurement is a document. We present the count from stats() to give them an overview of their data load, and many dashboards and views use it. Without count in stats(), performance degrades as the number of measurements grows, since the response time of countDocuments() increases.

Please make stats() available for time-series collections.

Regards, Erwin

Generated at Thu Feb 08 06:10:34 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.