Description
The bucket catalog serves two main purposes:
- Allow efficient discovery of buckets that are not full for a given metadata value and time.
- Synchronize and batch concurrent updates to the same bucket.
Proposed implementation details:
- The global bucket catalog has an in-memory, thread-safe ordered map indexed by a tuple <nss, metadata, _id>. For each bucket, it contains:
  - A vector of measurements to be inserted.
  - The data size of the bucket: the total BSON size of the data object in the bucket's BSON serialization, including measurements to be inserted.
  - The total number of measurements in the bucket, including uncommitted measurements and measurements to be inserted.
  - The number of committed measurements in the bucket.
  - The number of current writers.
  - The set of all new top-level field names of the measurements to be inserted.
  - The set of top-level field names of the measurements that have already been inserted into the bucket.
  - The most recent commit info (such as the timestamp and cluster time) required for the update's return value.
- The catalog also has an "idle bucket" queue with references to all buckets that have no writers. This queue allows entries in the bucket catalog to be expired if their total size exceeds some (large) threshold. On step-down, this queue is flushed so that the bucket catalog is left empty.
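The structure described above could be sketched roughly as follows. This is an illustrative Python model under stated assumptions, not the proposed server implementation: the names `Bucket`, `BucketCatalog`, `acquire`, `stage`, `commit`, `expire_idle`, `clear` and `MAX_MEASUREMENTS` are hypothetical, the ordered map is approximated by a sorted key list plus a dict, metadata is assumed to be a string, `_id` is assumed to be an increasing integer, and `len(repr(...))` stands in for the BSON size. For brevity, the sketch also does not distinguish uncommitted in-flight measurements from measurements still waiting to be inserted.

```python
import bisect
import threading
from collections import OrderedDict
from dataclasses import dataclass, field


@dataclass
class Bucket:
    """Per-bucket state, mirroring the fields listed above."""
    pending: list = field(default_factory=list)           # measurements to be inserted
    data_size: int = 0                                     # stand-in for the BSON data size
    num_measurements: int = 0                              # total, including pending ones
    num_committed: int = 0                                 # committed measurements
    num_writers: int = 0                                   # current writers
    new_field_names: set = field(default_factory=set)      # new top-level fields in pending
    field_names: set = field(default_factory=set)          # top-level fields already inserted
    last_commit_info: dict = field(default_factory=dict)   # e.g. timestamp, cluster time


class BucketCatalog:
    MAX_MEASUREMENTS = 3  # hypothetical "full" limit, kept small for illustration

    def __init__(self):
        self._lock = threading.Lock()   # one lock guards all catalog state
        self._keys = []                 # sorted (nss, metadata, _id) keys
        self._buckets = {}              # key -> Bucket
        self._idle = OrderedDict()      # keys of buckets without writers, oldest first

    def acquire(self, nss, metadata, new_id):
        """Return (key, bucket) for a non-full bucket, creating one if needed."""
        with self._lock:
            # The newest bucket for (nss, metadata) is the largest key with that prefix.
            hi = bisect.bisect_left(self._keys, (nss, metadata, float("inf")))
            key = self._keys[hi - 1] if hi > 0 else None
            if (key is None or key[:2] != (nss, metadata)
                    or self._buckets[key].num_measurements >= self.MAX_MEASUREMENTS):
                key = (nss, metadata, new_id)
                bisect.insort(self._keys, key)
                self._buckets[key] = Bucket()
            bucket = self._buckets[key]
            bucket.num_writers += 1
            self._idle.pop(key, None)   # a bucket with writers is not idle
            return key, bucket

    def stage(self, key, measurement):
        """Queue one measurement (a dict) on an acquired bucket."""
        with self._lock:
            b = self._buckets[key]
            b.pending.append(measurement)
            b.num_measurements += 1
            b.data_size += len(repr(measurement))   # crude stand-in for BSON size
            b.new_field_names |= measurement.keys() - b.field_names

    def commit(self, key, commit_info):
        """Mark all pending measurements committed and release one writer."""
        with self._lock:
            b = self._buckets[key]
            b.num_committed += len(b.pending)
            b.pending.clear()
            b.field_names |= b.new_field_names
            b.new_field_names.clear()
            b.last_commit_info = commit_info
            b.num_writers -= 1
            if b.num_writers == 0:
                self._idle[key] = None   # now eligible for expiry

    def expire_idle(self, max_total_size):
        """Drop idle buckets, oldest first, while the total size exceeds the limit."""
        with self._lock:
            total = sum(b.data_size for b in self._buckets.values())
            while total > max_total_size and self._idle:
                key, _ = self._idle.popitem(last=False)
                total -= self._buckets.pop(key).data_size
                self._keys.remove(key)

    def clear(self):
        """Flush everything, as on step-down."""
        with self._lock:
            self._keys.clear()
            self._buckets.clear()
            self._idle.clear()
```

In this sketch, two writers that call `acquire("db.ts", "sensorA", ...)` while the bucket is not full receive the same key, so their staged measurements are batched and committed together. The bucket only enters the idle queue once its last writer has called `commit`.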
Issue Links
- is related to
  - SERVER-53072 Expire entries in the bucket catalog if the total size threshold is exceeded (Closed)
- related to
  - SERVER-52522 transform inserts in a time-series collection into upserts on buckets (Closed)