[SERVER-52655] Mongo thread hangs intermittently. Created: 06/Nov/20 Updated: 02/Dec/20 Resolved: 02/Dec/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 3.6.16 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Nitesh Vaidyanath | Assignee: | Edwin Zhou |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: |
|
| Operating System: | ALL |
| Participants: |
| Description |
|
Hello, mongod threads hang intermittently on the "recvmsg" system call, and as a result we are seeing very high load on the replica set. I don't see any COLLSCAN in the logs. While the threads are hung, the read and write queues grow, which is expected. I'm not sure what is happening with this replica set. We are currently running mongod 3.6.16 on an AWS i3.16xlarge instance. The PRIMARY fails over every time all the threads hang.
|
| Comments |
| Comment by Edwin Zhou [ 02/Dec/20 ] |
|
Hi Nitesh,
Glad to hear it's been resolved. I'll close this ticket now as requested.
Best, Edwin |
| Comment by Nitesh Vaidyanath [ 26/Nov/20 ] |
|
Hi Edwin, feel free to close the Jira case. The issue was with one of our clients; after we restarted the client service, the issue was resolved. Thanks for your help and quick response. |
| Comment by Edwin Zhou [ 24/Nov/20 ] |
|
Hi nvaidyanath@paloaltonetworks.com,
We still need additional information to diagnose the problem. If this is still an issue for you, would you please collect the perf output, logs, and diagnostic.data, with timestamps, and attach them to this ticket?
Thanks, |
| Comment by Edwin Zhou [ 10/Nov/20 ] |
|
Hi nvaidyanath@paloaltonetworks.com,
Thanks for your report and for providing the gdb output, FTDC data, and screenshots detailing the events. After some investigation, we were unable to pinpoint why you're seeing performance issues, and we could not make any concrete correlations because the screenshots provided are missing a timezone.
Could you provide a detailed timeline of events showing when the queuing occurs, when the hanging occurs, when the node fails over, and when the stack traces were collected?
While the mongod is running and the issues are occurring, would you also be able to collect perf data during the incidents you described? Please make sure an exact timestamp is included. If it's not, you can run perf with the --start option.
We will also want the diagnostic.data, the logs, and the perf.txt.
Best, Edwin |