WT-7787 changed checkpoint cleanup to give up if the cache gets into aggressive mode. The idea was to postpone cleanup to the next checkpoint once eviction has begun to struggle.
In a customer case, we saw an issue similar to the one we tried to resolve in WT-7787, but the cache was not set to aggressive because the eviction system thought there was not much to evict and that it was doing a good job of evicting what it could find. This happened while the application was actually stalled because dirty content was continuously at 20% or more.
I think we should refine the eviction condition that the changes from WT-7787 are based on, as checking only whether aggressive is set might not be sufficient.
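As a rough illustration of what a refined condition could look like, the sketch below skips checkpoint cleanup either when eviction is aggressive or when dirty content has reached the level at which the application stalled. Everything in it is an assumption made for illustration: the struct, its field names, the function name, and the 20% constant (taken from the customer case above) are hypothetical and are not WiredTiger's actual code or API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of cache state; not a WiredTiger structure. */
struct cache_state {
    bool eviction_aggressive; /* the signal the WT-7787 change relies on */
    uint64_t bytes_dirty;     /* dirty bytes currently in cache */
    uint64_t bytes_max;       /* configured cache size in bytes */
};

/* Assumed threshold: the dirty percentage seen in the customer case. */
#define DIRTY_STALL_PCT 20

/*
 * Return true if checkpoint cleanup should be postponed to the next
 * checkpoint. Unlike a check on the aggressive flag alone, this also
 * gives up when dirty content is at or above the assumed threshold.
 */
static bool
should_skip_checkpoint_cleanup(const struct cache_state *cs)
{
    if (cs->eviction_aggressive)
        return (true);

    /* Without a known cache size the dirty ratio is undefined. */
    if (cs->bytes_max == 0)
        return (false);

    /*
     * Integer form of dirty / max >= pct / 100. Assumes byte counts are
     * small enough that multiplying by 100 does not overflow 64 bits.
     */
    return (cs->bytes_dirty * 100 >= cs->bytes_max * DIRTY_STALL_PCT);
}
```

Whether a plain dirty-percentage check is the right extra signal is exactly what this ticket needs to establish; the sketch only shows that the condition can be widened without touching the existing aggressive check.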
This ticket should revisit the customer issue, understand it more deeply, and study whether there is more to it than I have mentioned, even though checkpoint cleanup and dirty content reaching 20% seem to be the root cause to me. The outcome should be an understanding of what eviction was up to, why it was unable to bring down dirty content, and whether we can refine the changes from WT-7787 to mitigate the issue.