[SERVER-81114] compact_while_creating_indexes fails due to cache eviction pressure Created: 15/Sep/23  Updated: 29/Jan/24

Status: Open
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Matt Broadstone Assignee: Backlog - Storage Execution Team
Resolution: Unresolved Votes: 0
Labels: former-storex-namer, storex-ranked
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
related to SERVER-81007 FSM workloads no longer fail when $co... Closed
Assigned Teams:
Storage Execution
Operating System: ALL
Backport Requested:
v7.3
Sprint: Execution Team 2023-11-27
Participants:

 Description   

After SERVER-81007 was fixed, this test began failing consistently. It is unclear which change led to the failures, but the conversion to modules is unlikely to be the culprit. Here is a successful test run from June 15th, before the bug was introduced in SERVER-81007. There may have been a change in storex or WiredTiger code after the bug was introduced that leads to this consistent failure.



 Comments   
Comment by Etienne Petrel [ 27/Nov/23 ]

I take back what I said in my previous comment. When compaction is blocked by checkpoint, compaction is retried internally. However, EBUSY can still be returned for other reasons. If EBUSY is actually because of cache eviction pressure, WiredTiger would print it out.

Comment by Etienne Petrel [ 19/Sep/23 ]

The error message is "Compaction interrupted on table:collection-1906--5767142732412023233 due to cache eviction pressure", which comes from the storage layer:

Status WiredTigerIndexUtil::compact(OperationContext* opCtx, const std::string& uri) {
...
        if (ret == EBUSY) {
            return Status(ErrorCodes::Interrupted,
                          str::stream() << "Compaction interrupted on " << uri.c_str()
                                        << " due to cache eviction pressure");
        }

Or

Status WiredTigerRecordStore::doCompact(OperationContext* opCtx) {
...
        if (ret == EBUSY) {
            return Status(ErrorCodes::Interrupted,
                          str::stream() << "Compaction interrupted on " << getURI().c_str()
                                        << " due to cache eviction pressure");
        }

This surprises me: EBUSY does not necessarily mean the cache is full. For example, compaction performs checkpoints that can return EBUSY for various reasons.
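
To illustrate the point, here is a minimal sketch of an error mapping that reports EBUSY without assuming a cause. It is not the server's code: CompactResult and describeCompactReturn are hypothetical names, and CompactResult is only a stand-in for the real Status type.

```cpp
#include <cerrno>
#include <string>

// Hypothetical stand-in for the server's Status type (assumption, not mongo::Status).
struct CompactResult {
    bool ok;
    std::string reason;
};

// Hypothetical helper: turn the return code of a compact call into a message.
// EBUSY is reported as "busy" only, because the code alone does not say
// whether cache eviction pressure or something else caused it.
CompactResult describeCompactReturn(int ret, const std::string& uri) {
    if (ret == 0) {
        return {true, ""};
    }
    if (ret == EBUSY) {
        return {false,
                "Compaction interrupted on " + uri +
                    ": resource busy (EBUSY), cause not determined"};
    }
    return {false,
            "Compaction failed on " + uri + " with error code " + std::to_string(ret)};
}
```

With a mapping like this, the message would no longer point at the cache unless something else in the logs does.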
