[SERVER-38125] WiredTigerLAS.wt grows unbounded Created: 14/Nov/18  Updated: 30/Nov/18  Resolved: 30/Nov/18

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 3.6.6
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Ashu Pachauri Assignee: Danny Hatcher (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu 16.04


 Description   

We frequently see an issue where disk usage and disk I/O utilization suddenly spike on the primary of one of the replica sets in our sharded cluster.

This usually happens under heavy write load. Once the load goes away, I/O utilization comes back down, but disk storage utilization stays high. At times the on-disk size is about 10 times the actual size of the database (i.e., the size on the secondaries). On further investigation, we found that the WiredTigerLAS.wt file had grown unbounded and never shrank, even after we removed all write load.

 

The problem has become more frequent since we upgraded from 3.4 to 3.6, although this may simply be because our write load has increased since then.



 Comments   
Comment by Danny Hatcher (Inactive) [ 30/Nov/18 ]

Hello Ashu,

As I have not heard back from you and there is nothing indicating a bug in the MongoDB server, I will now close this ticket.

Thank you,

Danny

Comment by Danny Hatcher (Inactive) [ 15/Nov/18 ]

Hello Ashu,

After reviewing the diagnostics, it appears that your Primary node experienced very high cache pressure around 11-14T12:02:37 UTC, which caused a cascading effect. A few minutes earlier, CPU usage spiked to ~98%, and utilization of the disk "nvme0n1" spiked at the same time. As I do not see anything else out of the ordinary, it appears that the recent load on your server simply requires more hardware resources. If you upgrade the machines that you run MongoDB on, do you still see performance issues or unbounded growth in the WiredTigerLAS.wt file?

Thank you,

Danny

Comment by Ashu Pachauri [ 14/Nov/18 ]

Hi bruce.lucas, I have uploaded the diagnostic data for all 3 machines in the replica set that was most recently affected; it's a tar.gz file with a separate diagnostic data directory for each machine. Of the three machines, mongo-22 was the primary at the time. The issue started around 12:55 pm UTC (Nov 14, 2018), recovered for a few minutes, and then became more severe around 1:04 pm.

Comment by Bruce Lucas (Inactive) [ 14/Nov/18 ]

Hi Ashu,

Could you please archive and upload the contents of the diagnostic.data directory from all members of the affected replica set to this secure upload portal?

Also, can you please tell us the time (including timezone) when this issue last occurred on that replica set?

Thanks,
Bruce

Generated at Thu Feb 08 04:48:01 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.