Major - P3
Determine whether and how WiredTiger should ensure that background tiered storage work completes before shutting down.
Tiered storage relies on background worker threads (currently just one) to perform the tasks that follow a flush_tier() operation: waiting until we know we are done writing to an object, copying the object to the cloud, and then removing it from the local database directory after an optional delay period.
Some of these operations can be slow. For example, we might have several GB of data waiting to be copied to S3. It is undesirable to add long delays when closing WT.
If an application closes WT, I don't see a reason why we need to wait for background flush activity to complete. We already have to handle crashes during a flush, so a clean shutdown shouldn't need different handling.
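To make the proposed behavior concrete, here is a minimal sketch of a single background worker that abandons queued flush work when it is closed. This is a toy model, not WiredTiger code: `ToyFlushWorker`, `copy_fn`, `request_stop`, and the object names are all hypothetical, and the sketch assumes that close waits only for the one work unit already in flight.

```python
import queue
import threading


class ToyFlushWorker:
    """Toy model of one background flush worker (hypothetical, not WiredTiger code)."""

    def __init__(self, copy_fn):
        self._copy_fn = copy_fn        # hypothetical: copies one object to the cloud
        self._work = queue.Queue()     # FIFO queue of object names waiting to be copied
        self._stop = threading.Event()
        self.copied = []               # objects whose copy completed
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def push(self, obj_name):
        self._work.put(obj_name)

    def _run(self):
        # The stop flag is checked before each work unit is picked up.
        while not self._stop.is_set():
            try:
                obj_name = self._work.get(timeout=0.01)
            except queue.Empty:
                continue
            self._copy_fn(obj_name)
            self.copied.append(obj_name)

    def request_stop(self):
        self._stop.set()

    def close(self):
        # Assumption: wait for the in-flight unit only, never drain the queue.
        self.request_stop()
        self._thread.join()
        abandoned = []
        while True:
            try:
                abandoned.append(self._work.get_nowait())
            except queue.Empty:
                return abandoned


def demo():
    started = threading.Event()
    release = threading.Event()

    def slow_copy(obj_name):
        started.set()      # signal that a copy is in flight
        release.wait()     # stand-in for a slow upload

    worker = ToyFlushWorker(slow_copy)
    for name in ("obj-1", "obj-2", "obj-3"):
        worker.push(name)
    started.wait()         # first copy is now in progress
    worker.request_stop()  # application asks to shut down
    release.set()          # let the in-flight copy finish
    abandoned = worker.close()
    return worker.copied, abandoned
```

In this model the cost of close is bounded by a single in-flight copy rather than by the whole backlog. Whether the real implementation should also interrupt the in-flight copy is part of what this ticket needs to decide.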
- Does this affect any team outside of WT?
Not currently, but this issue came up as part of a performance regression in initial sync, which closes and reopens WiredTiger several times.
- How likely is it that this use case or problem will occur?
When tiered storage is deployed, this will come up regularly.
Acceptance Criteria (Definition of Done)
We need to think through the implications of not completing a flush at shutdown and confirm the assertion above that this is safe. Assuming it is, we need to define and implement any functionality necessary to abandon outstanding tiered storage flush activity and to clean up (if necessary) after restart.
We should have functional tests that close a WiredTiger connection during a flush_tier() call and then reopen the connection. After reopening, all data should be correct and a subsequent flush_tier() call should complete without problems.
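The shape of such a test can be sketched against a toy model. The code below does not use the WiredTiger API. `ToyTieredStore`, `toy_flush`, the `interrupt_after` parameter, and the `.partial` suffix are all hypothetical stand-ins, and the sketch assumes that an interrupted flush leaves at most a partial copy in the cloud directory, which is discarded on reopen.

```python
import os
import shutil
import tempfile


class ToyTieredStore:
    """Toy local-plus-cloud object store (hypothetical, not the WiredTiger API)."""

    def __init__(self, home):
        self.local = os.path.join(home, "local")
        self.cloud = os.path.join(home, "cloud")
        os.makedirs(self.local, exist_ok=True)
        os.makedirs(self.cloud, exist_ok=True)
        # Cleanup after restart: discard copies that never completed.
        for name in os.listdir(self.cloud):
            if name.endswith(".partial"):
                os.remove(os.path.join(self.cloud, name))

    def write(self, name, payload):
        with open(os.path.join(self.local, name), "wb") as f:
            f.write(payload)

    def read(self, name):
        # An object lives locally until its copy to the cloud has completed.
        for directory in (self.local, self.cloud):
            path = os.path.join(directory, name)
            if os.path.exists(path):
                with open(path, "rb") as f:
                    return f.read()
        raise KeyError(name)

    def toy_flush(self, interrupt_after=None):
        """Copy local objects to the cloud; optionally stop partway to mimic a close."""
        for i, name in enumerate(sorted(os.listdir(self.local))):
            src = os.path.join(self.local, name)
            tmp = os.path.join(self.cloud, name + ".partial")
            shutil.copyfile(src, tmp)
            if interrupt_after is not None and i >= interrupt_after:
                return False  # abandoned mid-flush; the partial copy is left behind
            os.replace(tmp, os.path.join(self.cloud, name))
            os.remove(src)    # local copy removed only after the cloud copy is complete
        return True


def run_scenario():
    data = {"obj-0": b"aaa", "obj-1": b"bbb", "obj-2": b"ccc"}
    with tempfile.TemporaryDirectory() as home:
        store = ToyTieredStore(home)
        for name, payload in data.items():
            store.write(name, payload)
        first_flush_done = store.toy_flush(interrupt_after=1)  # close during flush
        store = ToyTieredStore(home)                           # reopen
        intact_after_reopen = all(store.read(n) == p for n, p in data.items())
        second_flush_done = store.toy_flush()                  # subsequent flush
        intact_after_flush = all(store.read(n) == p for n, p in data.items())
        local_left = os.listdir(store.local)
    return (first_flush_done, intact_after_reopen, second_flush_done,
            intact_after_flush, local_left)
```

A real functional test would follow the same three steps (interrupt a flush by closing the connection, reopen and verify all data, run a second flush to completion) against an actual tiered storage configuration.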
- Documentation update
Make sure documentation of the
[Optional] Suggested Solution
As mentioned above, I believe the correct behavior is to ignore (i.e., abandon) any background flush activity when an application closes WT.