[SERVER-43223] checkpoint took too long time Created: 08/Sep/19  Updated: 10/Sep/19  Resolved: 10/Sep/19

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 3.2.6, 3.6.3
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: hao shan Assignee: Danny Hatcher (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

1. Single node (no replica set)
2. 16 cores, 128 GB memory
3. SUSE 11
4. Data under /mnt/mongodb, 533 GB


Attachments: Text File serverstatus.txt    
Participants:

 Description   

Our mongo ran without problems before Sep 6th (at that time it was version 3.2.6).

We planned a backup for the night of Sep 6th using a disk sync command. To make sure everything was fully synced, we stopped mongo, ran the disk sync several times, and then started mongo again.

Since this restart, mongo often hangs. Each hang lasts roughly 5~10 minutes, then everything is OK for 2~3 minutes, and then it hangs again.

Using mongostat, we found that every time mongo starts to hang, there is a "flush".

After reading some documentation, we concluded that the checkpoint is taking too long. When mongo runs a checkpoint, all requests hang until it finishes.

During the checkpoint, there is no disk I/O (IOPS) on the system.
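One way to confirm that the hangs line up with checkpoints is to read the checkpoint timers that WiredTiger reports in serverStatus. The following is a minimal sketch, not a definitive diagnostic: the helper name checkpoint_stats and the sample numbers are hypothetical, and it assumes the wiredTiger.transaction section of serverStatus carries the "transaction checkpoint ..." counters under the field names used here.

```python
from typing import Any, Dict


def checkpoint_stats(server_status: Dict[str, Any]) -> Dict[str, Any]:
    """Summarize checkpoint timing from a serverStatus document.

    Assumes the WiredTiger field names below are present; times are
    reported by the server in milliseconds and converted to seconds here.
    """
    txn = server_status["wiredTiger"]["transaction"]
    count = txn["transaction checkpoints"]
    total_ms = txn["transaction checkpoint total time (msecs)"]
    return {
        "count": count,
        "running": bool(txn["transaction checkpoint currently running"]),
        "most_recent_s": txn["transaction checkpoint most recent time (msecs)"] / 1000.0,
        "max_s": txn["transaction checkpoint max time (msecs)"] / 1000.0,
        # Average over all checkpoints since the server started.
        "avg_s": (total_ms / count / 1000.0) if count else 0.0,
    }


# Hypothetical sample values, chosen only to illustrate the calculation.
sample_status = {
    "wiredTiger": {
        "transaction": {
            "transaction checkpoints": 10,
            "transaction checkpoint currently running": 0,
            "transaction checkpoint most recent time (msecs)": 175000,
            "transaction checkpoint max time (msecs)": 600000,
            "transaction checkpoint total time (msecs)": 1800000,
        }
    }
}

stats = checkpoint_stats(sample_status)
print(stats)
```

Against a live server, the input document could be obtained with pymongo, for example `MongoClient().admin.command("serverStatus")`. Sampling it every few seconds and watching "running" flip to true at the same moment mongostat shows a flush would support the checkpoint explanation.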

We found a similar case below:

https://dba.stackexchange.com/questions/182542/mongodb-responds-slowly-during-the-wiredtiger-checkpoint-writing-process

which mentions this issue:

https://jira.mongodb.org/browse/WT-3362

So we upgraded mongo to 3.6.3, and things are not as bad now. A checkpoint takes 180 seconds on average, and most importantly, data that is already cached can be read normally during the checkpoint (under 3.2.6, everything was stuck during the checkpoint). But it still has a large impact.

Can anyone help explain why restarting mongo led to this situation?

Regards



 Comments   
Comment by Danny Hatcher (Inactive) [ 10/Sep/19 ]

When restarting a server, the cache needs to fill up again with your working set. This can cause slowness until the WiredTiger cache reaches its stable state of 80% full with the normal working set. The checkpoints consist of dirty (updated) data being written to disk so if you have performed a lot of writes it will take longer than if you were only performing reads. As you've experienced, we improved performance greatly by the 3.6 release. I strongly recommend upgrading from 3.6.3 to one of 3.6.14/4.0.12/4.2.0 as those releases will have the most improvements.
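To see how far the cache is from the steady state described above, and how much dirty data the next checkpoint has to write, the cache counters in serverStatus can be turned into two ratios. This is a rough sketch under assumptions: the helper name cache_ratios and the sample sizes are hypothetical, the 0.8 threshold is simply the 80% figure from the comment above, and the wiredTiger.cache field names are assumed to be present as written.

```python
from typing import Any, Dict

# The 80% steady-state fill level mentioned in the comment above.
WARM_FILL_THRESHOLD = 0.8


def cache_ratios(server_status: Dict[str, Any]) -> Dict[str, Any]:
    """Compute cache fill and dirty ratios from a serverStatus document."""
    cache = server_status["wiredTiger"]["cache"]
    max_bytes = cache["maximum bytes configured"]
    used_bytes = cache["bytes currently in the cache"]
    dirty_bytes = cache["tracked dirty bytes in the cache"]
    fill = used_bytes / max_bytes
    return {
        "fill": fill,                       # fraction of the configured cache in use
        "dirty": dirty_bytes / max_bytes,   # fraction of the cache holding unwritten updates
        "warm": fill >= WARM_FILL_THRESHOLD,
    }


GIB = 1024 ** 3

# Hypothetical sample: a 64 GiB cache shortly after a restart.
sample_status = {
    "wiredTiger": {
        "cache": {
            "maximum bytes configured": 64 * GIB,
            "bytes currently in the cache": 16 * GIB,
            "tracked dirty bytes in the cache": 4 * GIB,
        }
    }
}

ratios = cache_ratios(sample_status)
print(ratios)
```

A low "fill" shortly after a restart is consistent with a cache that is still warming up, and a high "dirty" value suggests the next checkpoint has more data to write.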

The SERVER project is for bugs and feature suggestions for the MongoDB server. As this ticket does not appear to be a bug, I will now close it. If you need further assistance troubleshooting, I encourage you to ask our community by posting on the mongodb-user group or on Stack Overflow with the mongodb tag.
