[SERVER-24712] Mongod intermittently consuming all available CPU. Created: 22/Jun/16 Updated: 14/Jul/16 Resolved: 30/Jun/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Admin |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Zachary Rollyson | Assignee: | Kelsey Schubert |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Operating System: | ALL |
| Participants: |
| Description |
|
This is a new issue with a deployment created with Cloud Manager. The server was provisioned on AWS through CM's provisioning tool. It was added to a replica set as secondary so that the old primary could subsequently be shut down. I've noticed that once every ~7 seconds, mongod will consume all CPU for about ~10 seconds. I will attach logs of mongostat, iostat & db.serverStatus() during and after a spike. |
| Comments |
| Comment by Kelsey Schubert [ 13/Jul/16 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I'd also like to mention that the behavior of findAndModify differs between versions of MongoDB and storage engines in MongoDB. In MongoDB 3.2.0, findAndModifies are retried if a WriteConflictException is encountered (
Thank you, | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Zachary Rollyson [ 30/Jun/16 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Thanks Thomas. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Kelsey Schubert [ 30/Jun/16 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Thanks for uploading the ss.log. It confirms my previous observations - every 18 seconds a findAndModify workload is executed.
MMAPv1 and WiredTiger have very different performance characteristics especially regarding cpu. I would recommend either allocating additional resources or modifying your application layer to distribute load if the cpu bottleneck is a problem for you. As I mentioned before, the best place for continued discussion would be the mongodb-users group . Kind regards, | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Zachary Rollyson [ 30/Jun/16 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
P.S. Just remembered I had meant to mention that the two unaffected DBs are standalones and the spiking one is a single member replica set. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Zachary Rollyson [ 29/Jun/16 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
HI Thomas, The server with the spiking CPU is on 3.2.7 with wiredTiger. The two servers that are unaffected are running 3.0.12 with mmapv1. I've attached to files you asked for as iostat3.log & ss.log.
Thanks again, | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Kelsey Schubert [ 29/Jun/16 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Thank you for the additional information. Are the two unaffected servers both running with same configuration (MongoDB version, storage engine)? Since your unaffected node is running MongoDB 3.0.12 and does not have full time diagnostic capture, I have modified the previous shell script to manually collect the data relevant to this issue.
Please attach ss.log and iostat.log as well as the output of the mongo commands listed in the final step to this ticket. Thank you, | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Zachary Rollyson [ 28/Jun/16 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
P.S. I also just uploaded ~5min of mongostat, should that be useful. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Zachary Rollyson [ 28/Jun/16 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hi Thomas, I've just uploaded iostat2.log from an unaffected server. This one is using MongoDB v3.0.12, however, and has no diagnostic.data. Is there something else that could be useful? Thanks, | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Kelsey Schubert [ 23/Jun/16 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
If the same application is running on other servers, would you please execute the following shell script on a server that isn't affected?
This will collect iostat data each second, and will help us to correlate the CPU numbers to events recorded in the diagnostic.data, and help us compare against the behavior we are observing on the affected server. After running this script for 15 minutes, please upload the following from the unaffected server:
Thank you, | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Zachary Rollyson [ 23/Jun/16 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I have this running on three servers and the only one with an issue is this one set up by CM. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Kelsey Schubert [ 23/Jun/16 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Sorry for the delay, I've had a chance to examine the diagnostic.data you have provided, and I do not see anything to indicate a bug in the MongoDB server. I would suggest investigating your application layer to determine whether the behavior you are observing is originating there. It appears that every 18 seconds a findAndModify workload is executed for 13 seconds:
For MongoDB-related support discussion please post on the mongodb-users group or Stack Overflow with the mongodb tag. A question like this involving more discussion would be best posted on the mongodb-users group. See also our Technical Support page for additional support resources. Kind regards, | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Zachary Rollyson [ 23/Jun/16 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Status for this has been "waiting for user input" for nearly 24 hours, anything I'm missing here? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Zachary Rollyson [ 22/Jun/16 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hello Ramón, Just uploaded diagnostic.data.tgz. If gzip won't work for you let me know and I'll upload some other format. Thanks, | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Ramon Fernandez Marina [ 22/Jun/16 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Can you please upload the contents of the diagnostic.data directory in your dbpath? Thanks, | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Zachary Rollyson [ 22/Jun/16 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
To also note, this deployment is running with automation, monitoring and backup agents. |