Investigate why running concurrent FCV upgrade/downgrade together with retryableWrites on a timeseries collection causes more buckets to be created than expected for a fixed write workload.
Background
We have a timeseries write test that inserts a fixed number of measurements into a single timeseries collection, then asserts on collStats timeseries metrics such as bucketCount.
In two baseline configurations the test behaves as expected: when we enable concurrent FCV upgrade/downgrade but keep retryableWrites disabled, the test consistently sees bucketCount == 2 and passes. Likewise, when we enable retryableWrites but do not run FCV upgrade/downgrade in the background, we again see bucketCount == 2 and the test passes.
The issue appears only when both features are enabled at the same time. With concurrent FCV upgrade/downgrade and retryableWrites turned on, the same write workload produces bucketCount == 3 instead of 2. Other bucket metrics reflect this change (e.g., numBucketInserts and numBucketsOpenedDueToMetadata increase, while numBucketsClosedDueToCount becomes 0), but numCommits and numMeasurementsCommitted remain unchanged, indicating that only the bucket layout has changed, not the logical amount of data written.
In the failing configuration, featureFlagCreateViewlessTimeseriesCollections is disabled, so FCV upgrade/downgrade is not expected to run viewful↔viewless timeseries conversion. The unexpected extra bucket is therefore coming from another part of the FCV logic that interacts with timeseries writes.
In SERVER-119926 we made it so that, when running in an FCV upgrade/downgrade suite with retryable writes, the assertions on bucket count are relaxed.
To reproduce this issue, search for references to this ticket in the code.
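A sketch of what such an assertion relaxation might look like. This is an illustrative assumption only: the helper name, options, and structure are invented here and are not the actual SERVER-119926 test code.

```javascript
// Illustrative sketch (hypothetical helper, not the real test code from
// SERVER-119926): relax the exact bucketCount check only when both FCV
// transitions and retryable writes are active in the suite.
function assertTimeseriesBucketCount(actual, expected,
                                     {fcvTransitions = false, retryableWrites = false} = {}) {
    if (fcvTransitions && retryableWrites) {
        // Retried writes racing with FCV transitions may split the same
        // measurements across extra buckets, so only require a lower bound.
        if (actual < expected) {
            throw new Error(`bucketCount ${actual} < expected ${expected}`);
        }
    } else if (actual !== expected) {
        // Baseline configurations still assert the exact bucket count.
        throw new Error(`bucketCount ${actual} !== expected ${expected}`);
    }
}
```

With this shape, the failing configuration's `bucketCount == 3` passes the relaxed lower-bound check, while both baseline configurations keep the strict `== 2` assertion.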
Investigation result
There are three causes that explain why FCV suites can cause extra bucket creations or re-openings:
| Case | Requirements | Symptom | Root cause | Main ticket |
|---|---|---|---|---|
| 1.1 | FCV suite with retryable writes | A bucket which is already open is closed, and instead another bucket is re-opened or created | | |
| 1.2 | Sharding + viewless timeseries upgrade/downgrade | A bucket which is already open is closed, and instead another bucket is re-opened or created | Also investigated as part of | |
| 2.1 | Viewless timeseries upgrade/downgrade | There is a bucket that could be re-opened, but a new bucket is created | | |
In all three cases there is no correctness issue: the timeseries write path retries successfully, but the buckets are not generated in a canonical (optimally packed) way.
Note cases (1.1) and (1.2) go through the same retry path as SERVER-89349.
This is not a major concern during FCV upgrade/downgrade, which is a low frequency event in production clusters.
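The layout-only nature of the change can be illustrated with a toy model. Everything below is an assumption for illustration (made-up bucket capacity, batch sizes, and a deliberately simplified catalog, not server code): closing an open bucket mid-workload forces an extra bucket while the commit and measurement counts stay identical.

```javascript
// Toy model (simplified assumption, not actual server code): buckets fill up
// to a capacity; an FCV-style event closes all open buckets mid-workload.
const BUCKET_MAX = 10;  // made-up capacity for illustration

function runWorkload(batches, closeOpenBucketsAfterBatch) {
    const buckets = [];
    let numCommits = 0;
    let numMeasurementsCommitted = 0;
    batches.forEach((batchSize, i) => {
        let remaining = batchSize;
        while (remaining > 0) {
            // Reuse an open, non-full bucket if one exists; otherwise open one.
            let b = buckets.find((b) => b.open && b.count < BUCKET_MAX);
            if (!b) {
                b = {open: true, count: 0};
                buckets.push(b);
            }
            const n = Math.min(remaining, BUCKET_MAX - b.count);
            b.count += n;
            if (b.count === BUCKET_MAX) b.open = false;  // closed due to count
            remaining -= n;
        }
        numCommits += 1;
        numMeasurementsCommitted += batchSize;
        if (i === closeOpenBucketsAfterBatch) {
            // Simulated FCV transition: close every open bucket.
            buckets.forEach((b) => { b.open = false; });
        }
    });
    return {bucketCount: buckets.length, numCommits, numMeasurementsCommitted};
}

const baseline = runWorkload([6, 6, 8], -1);  // no interruption: 2 buckets
const failing = runWorkload([6, 6, 8], 0);    // close after 1st batch: 3 buckets
```

In this model `baseline.bucketCount` is 2 while `failing.bucketCount` is 3, yet `numCommits` and `numMeasurementsCommitted` are identical in both runs, matching the observation that only the bucket layout changes, not the logical amount of data written.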
Issue links:
- is related to:
  - SERVER-89349 Resharding a timeseries during insertion does not leave the bucket collection optimally compressed (Closed)
  - SERVER-119926 Make timeseries tests relying on buckets count resilient to FCV upgrade/downgrade plus retryable writes (Closed)
  - SERVER-123916 Exclude timeseries_lastpoint_common_sort_key.js from suites with non-canonical bucketing (Closed)
  - SERVER-122949 Writes concurrent to viewless timeseries upgrade may create new buckets instead of reopening (Closed)