[SERVER-35191] Stuck with cache full during rollback Created: 23/May/18  Updated: 27/Oct/23  Resolved: 02/Oct/23

Status: Closed
Project: Core Server
Component/s: Replication, WiredTiger
Affects Version/s: 3.6.4
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Bruce Lucas (Inactive) Assignee: Backlog - Replication Team
Resolution: Gone away Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File FTDC.png    
Issue Links:
Depends
depends on WT-4144 Fix rollback_to_stable with lookaside... Closed
Duplicate
duplicates SERVER-34941 Add testing to cover cases where time... Closed
Related
related to SERVER-36879 write regression test for stuck cache... Closed
is related to SERVER-34938 Secondary slowdown or hang due to con... Closed
is related to SERVER-34941 Add testing to cover cases where time... Closed
is related to SERVER-36495 Cache pressure issues during recovery... Closed
Assigned Teams:
Replication
Operating System: ALL
Sprint: Storage NYC 2018-09-10
Participants:

 Description   

During rollback we don't advance the oldest timestamp. This can pin a lot of data in the cache, and we can end up stuck with the cache full.
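
To illustrate the failure mode described above, here is a minimal toy model. It is not WiredTiger code. It assumes a cache that may only discard an update once the update's timestamp is older than the oldest timestamp. The names `ToyCache`, `replay`, and `advance_oldest` are hypothetical and exist only for this sketch. Real eviction is considerably more involved.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCache:
    """Toy model: an update stays cached until it is older than oldest_timestamp."""
    capacity: int
    oldest_timestamp: int = 0
    versions: list = field(default_factory=list)  # timestamps of cached updates

    def set_oldest_timestamp(self, ts: int) -> None:
        # In this model the oldest timestamp only ever moves forward.
        self.oldest_timestamp = max(self.oldest_timestamp, ts)

    def evict(self) -> None:
        # Only updates strictly older than oldest_timestamp can be discarded.
        self.versions = [ts for ts in self.versions if ts >= self.oldest_timestamp]

    def write(self, ts: int) -> bool:
        """Cache an update at ts; return False if the cache is stuck full."""
        if len(self.versions) >= self.capacity:
            self.evict()
        if len(self.versions) >= self.capacity:
            return False  # every cached update is pinned, nothing can be evicted
        self.versions.append(ts)
        return True


def replay(cache: ToyCache, timestamps, advance_oldest: bool) -> int:
    """Apply updates in order and return how many succeeded before getting stuck."""
    applied = 0
    for ts in timestamps:
        if advance_oldest:
            cache.set_oldest_timestamp(ts)
        if not cache.write(ts):
            break
        applied += 1
    return applied
```

With a capacity of 4 and ten updates, `replay(ToyCache(capacity=4), range(1, 11), advance_oldest=False)` stops after 4 updates because nothing is ever evictable, while the same call with `advance_oldest=True` applies all 10.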



 Comments   
Comment by Lingzhi Deng [ 02/Oct/23 ]

This problem goes away with the WT history store.

Comment by Ian Whalen (Inactive) [ 21/Aug/18 ]

milkie believes that the key condition for triggering this issue is the stable timestamp lagging behind the majority commit point. Combining the mechanism in wt_unclean_shutdown.js for controlling checkpoints with RollbackTest, to force operations to be rolled back in a 2-node + arbiter replica set, should do the trick.

Comment by Ian Whalen (Inactive) [ 21/Aug/18 ]

Reopening this so that Benety can address it by adding a new test, in case rollback and startup recovery behave differently after prepare support is completed. Presumably no code changes are needed.

Comment by Spencer Brody (Inactive) [ 23/May/18 ]

This is likely effectively a duplicate of SERVER-34941 since startup recovery and rollback recovery share much of the same code and logic.
