[SERVER-42343] WiredTigerLAS.wt grows when lagged node is in maintenance mode Created: 23/Jul/19  Updated: 29/Oct/23  Resolved: 14/Aug/19

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 3.4.20
Fix Version/s: 3.4.23

Type: Bug Priority: Major - P3
Reporter: David Bartley Assignee: Benety Goh
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Screen Shot 2019-07-24 at 10.05.39 AM.png    
Issue Links:
Related
is related to SERVER-30638 Change setReadFromMajorityCommittedSn... Closed
is related to SERVER-19209 Need to drop all storage snapshots on... Closed
is related to SERVER-18022 Support "read committed" isolation le... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Execution Team 2019-07-29, Execution Team 2019-08-12, Execution Team 2019-08-26
Participants:

 Description   

As part of rolling index builds and similar operations, we'll often take nodes offline for O(hours). When the node comes back online, we'll put it into maintenance mode (using replSetMaintenance) until the node catches back up. We've observed that such nodes often end up with a huge WiredTigerLAS.wt file. We've done some investigation and it seems to be the case that this only happens when we put the node into maintenance mode; if we simply let a lagged node stay in secondary mode, we don't see WiredTigerLAS.wt grow.

We suspect this is related to cache pressure of majority read concern since we only started seeing these issues when we enabled that.



 Comments   
Comment by Danny Hatcher (Inactive) [ 22/Aug/19 ]

I've confirmed that my repro does not grow the WiredTigerLAS.wt file while running with this commit.

Comment by David Bartley [ 14/Aug/19 ]

Thanks for fixing this so quickly!

Comment by Githook User [ 14/Aug/19 ]

Author:

{'name': 'Benety Goh', 'email': 'benety@mongodb.com', 'username': 'benety'}

Message: SERVER-42343 drop all snapshots on transition from a readable state to a non-readable one

This avoids accumulating unnecessary historical information in the storage engine while
we are in a non-readable state.
Branch: v3.4
https://github.com/mongodb/mongo/commit/7fbb8c5c17f457b8c28ecfa03b4083ce9fe0e8d9

Comment by Eric Milkie [ 24/Jul/19 ]

My guess is that the bug is due to this code in the snapshot thread:

            if (!replCoord->getMemberState().readable()) {
                // If our MemberState isn't readable, we may not be in a consistent state so don't
                // take snapshots. When we transition into a readable state from a non-readable
                // state, a snapshot is forced to ensure we don't miss the latest write. This must
                // be checked each time we acquire the global IS lock since that prevents the node
                // from transitioning to a !readable() state from a readable() one in the cases
                // where we shouldn't be creating a snapshot.
                continue;
            }

So it's probably fine to skip taking new snapshots while the node is in RECOVERING state (the maintenance mode command sets this state), but we also need to drop all active snapshots when this transition occurs, to unpin the pages those snapshots hold.

Comment by David Bartley [ 23/Jul/19 ]

Yup, we've been SIGKILLing the node (SIGTERM works too but sometimes takes many minutes).

Comment by Danny Hatcher (Inactive) [ 23/Jul/19 ]

Hello bartle,

I've confirmed what you suspected in my own environment. On 3.4.20 with Read Concern "Majority" enabled, a node placed into RECOVERING via

db.adminCommand({replSetMaintenance:true})

will show growth in the WiredTigerLAS.wt file size (given that writes are happening on the Primary). I have also confirmed that this does not happen on 3.6.0. I imagine this was one of the many items fixed by the changes we made to Read Concern "Majority" in 3.6.

If you restart the node, the file should reset. I understand that this isn't really optimal and I will see if it will be possible to do a backport.

Generated at Thu Feb 08 05:00:14 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.