[SERVER-39658] Sudden CPU spike in secondary instances with no apparent cause Created: 19/Feb/19 Updated: 06/Dec/22 Resolved: 20/Feb/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Performance |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Question | Priority: | Major - P3 |
| Reporter: | Santiago Ciciliani | Assignee: | Backlog - Triage Team |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Assigned Teams: |
Server Triage
|
| Participants: |
| Description |
|
Out of nowhere, the two secondary instances of a three-node cluster, which normally show very little CPU usage, spiked to 100% CPU and remained high. Version: 2.4.14 Things we did so far:
Things we discovered: Now that the load is higher and the average response time is more than 100 ms above usual, we see a query that should be using an index but is not:
Tue Feb 19 12:47:27.946 [conn11958] query locator.pages query: { query: { _depth: { $gte: 1, $lte: 1 }, _site: "sample.com" }, orderby: { _id: 1 } } ntoreturn:0 ntoskip:0 nscanned:2106803 keyUpdates:0 numYields: 11 locks(micros) r:5514254 nreturned:30 reslen:130866 2951ms
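The counters in that log line already quantify how far the query is from an index-backed plan. A minimal sketch, assuming only the field names visible in the quoted line (`nscanned`, `nreturned`, `numYields`, `reslen`, and the trailing `...ms` duration); the helper names `parse_counters` and `scan_ratio` are illustrative and not part of any MongoDB tool:

```python
import re

# The slow-query log line quoted in the description, verbatim.
LOG_LINE = (
    'Tue Feb 19 12:47:27.946 [conn11958] query locator.pages query: '
    '{ query: { _depth: { $gte: 1, $lte: 1 }, _site: "sample.com" }, '
    'orderby: { _id: 1 } } ntoreturn:0 ntoskip:0 nscanned:2106803 '
    'keyUpdates:0 numYields: 11 locks(micros) r:5514254 nreturned:30 '
    'reslen:130866 2951ms'
)

# Counters to pull out; only names that appear in the line above.
FIELDS = ("nscanned", "nreturned", "numYields", "reslen")


def parse_counters(line):
    """Extract selected integer counters and the duration in milliseconds."""
    stats = {}
    for name in FIELDS:
        match = re.search(rf"\b{name}:\s*(\d+)", line)
        if match:
            stats[name] = int(match.group(1))
    duration = re.search(r"(\d+)ms\s*$", line)
    if duration:
        stats["millis"] = int(duration.group(1))
    return stats


def scan_ratio(stats):
    """Items scanned per document returned (guards against division by zero)."""
    return stats["nscanned"] / max(stats["nreturned"], 1)


stats = parse_counters(LOG_LINE)
print(stats)
print(f"scanned per returned document: {scan_ratio(stats):.0f}")
```

For this line the ratio comes out at roughly 70,000 items scanned per returned document, which is consistent with the reporter's observation that the query is not using a selective index.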
Screenshots
Secondary Node 1 - The other follows the same pattern.
Primary Node (mostly idle except for nightly batch loads)
|
| Comments |
| Comment by Santiago Ciciliani [ 20/Feb/19 ] |
|
Hi Eric, thanks for your response. I am aware that MongoDB 2.4 is EoL, and we are in the process of analyzing upgrade options. In the meantime, do you know of a workaround we could apply to correct the plan choice and bring the average load back to normal? I'm puzzled that the server restored from the snapshot chooses the right plan, considering that the snapshot was taken after the issue started happening. Thanks |
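One workaround sometimes used when plan choice is unstable is an index hint, which asks the server to use a named index instead of letting the planner pick. The following is a minimal sketch under stated assumptions, not a confirmed fix for this ticket: it only builds the query document in the `$query` / `$orderby` / `$hint` modifier form, the helper `build_hinted_query` is illustrative, and the index key pattern is hypothetical because the ticket never says which index the query is expected to use.

```python
def build_hinted_query(filter_doc, sort_doc, index_keys):
    """Wrap a filter in the $query/$orderby/$hint query-modifier form."""
    return {
        "$query": filter_doc,
        "$orderby": sort_doc,
        "$hint": index_keys,
    }


# Filter and sort copied from the slow query in the description.
filter_doc = {"_depth": {"$gte": 1, "$lte": 1}, "_site": "sample.com"}
sort_doc = {"_id": 1}

# HYPOTHETICAL index key pattern: an assumption for illustration only.
# Replace it with the key pattern of an index that actually exists.
assumed_index = {"_site": 1, "_depth": 1}

hinted = build_hinted_query(filter_doc, sort_doc, assumed_index)
print(hinted)
```

A hint pins the plan for queries that carry it, so it has to be applied in the application code that issues the query, and it does not explain why the planner changed its choice in the first place.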
| Comment by Eric Sedor [ 20/Feb/19 ] |
|
Hi sctrilogy, We believe you've found the likely reason for the CPU use: high CPU is a common symptom of a sudden poor plan choice. Because of how the MongoDB query planner works, it is possible for index choice to change. In addition, several bugs involving poor plan choice have been corrected since MongoDB 2.4. A recent major improvement was Unfortunately, MongoDB 2.4 reached end of life in March of 2016, and the SERVER project is for bugs or feature suggestions for supported versions of the MongoDB server. For MongoDB-related support discussion, please post on the mongodb-user group or on Stack Overflow with the mongodb tag. A question like this, which involves more discussion, would be best posted on the mongodb-user group. Thank you, |
| Comment by Santiago Ciciliani [ 19/Feb/19 ] |
|
Further update. We cloned the database from AWS AMI images and ran the same query through the profiler. It turns out that the newly restored cluster seems to be using the index, while current prod is doing a sequential scan. Any hints on how to fix this?
|