[SERVER-56315] "[ftdc] serverStatus was very slow" due to which mongo daemon stopping abruptly & becomes stale Created: 23/Apr/21 Updated: 22/Jun/22 Resolved: 16/May/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 4.0.4 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Bharath Kumar CM | Assignee: | Dmitry Agranat |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | performance | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Operating System: | ALL |
| Steps To Reproduce: | since ftdc data is not human readable how do we interpret this issue and what's the solution to this. I've many servers where I see this issue and causing many issues with replication between nodes. |
| Participants: |
| Description |
|
Since FTDC data is not human-readable, how do we interpret this issue, and what is the solution? I have many servers where I see this issue, and it is causing many replication problems between nodes.
Error log: 2021-04-21T04:56:39.729+0000 I COMMAND [ftdc] serverStatus was very slow: { after basic: 2165, after asserts: 3087, after backgroundFlushing: 8785, after connections: 11612, after dur: 12922, after extra_info: 19620, after globalLock: 27972, after locks: 41890, after logicalSessionRecordCache: 51379, after network: 55947, after opLatencies: 62013, after opcounters: 67473, after opcountersRepl: 70586, after repl: 74168, after security: 78481, after storageEngine: 78629, after tcmalloc: 78629, after transactions: 78629, after transportSecurity: 78629, after wiredTiger: 78813, at end: 79004 } |
| Comments |
| Comment by Dmitry Agranat [ 16/May/21 ] |
|
Hi, we haven’t heard back from you for some time, so I’m going to close this ticket. If this is still an issue for you, please provide additional information and we will reopen the ticket. Regards, |
| Comment by Dmitry Agranat [ 29/Apr/21 ] |
|
bharath_achar@outlook.com, you can save aside mongod logs for now and we'll get to redacting them later on if needed. As mentioned earlier, we can try investigating the issue based on the diagnostic.data (the contents are described here), w/o the mongod logs. Please let us know when these are uploaded from all members of the replica set. Please mention the exact timestamp and timezone of the event you'd like us to focus on. Dima |
| Comment by Bharath Kumar CM [ 26/Apr/21 ] |
|
Can you please share the command to export the bsondump output using jq? Once I have the exported data, I will redact the company-specific information and share the log with you, so that we are both in a safe zone to work further on this issue. |
| Comment by Dmitry Agranat [ 26/Apr/21 ] |
|
Hi bharath_achar@outlook.com, we can try investigating based on the diagnostic.data (the contents are described here), w/o the mongod logs. Please let us know when these are uploaded from all members of the replica set. Please mention the exact timestamp and timezone of the event when the mongod process becomes stale. |
| Comment by Bharath Kumar CM [ 26/Apr/21 ] |
|
@Dmitry Agranat I'm afraid I cannot share the logs, as per company policy. I understand that you take measures to secure the data and delete it once done, but from my end I cannot share it. However, if you could share steps for isolating this issue, that would be of great help.
Regards, Bharath Achar |
| Comment by Dmitry Agranat [ 26/Apr/21 ] |
|
For each member of the replica set, please archive (tar or zip) the mongod.log files covering the incident and the $dbpath/diagnostic.data directory (the contents are described here) and upload them to this support uploader location. Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time. Please mention the exact timestamp and timezone of the event you'd like us to focus on. Dima |
| Comment by Bharath Kumar CM [ 24/Apr/21 ] |
|
Hi @Edwin Zhou, are you aware of this issue? How can we troubleshoot it further? Does enabling verbosity level 5 logging give more information? Is it truly a memory issue? What is the next step to avoid this issue? |