[SERVER-52688] db.serverStatus() hang on SECONDARY server
Created: 09/Nov/20 Updated: 23/Sep/21 Resolved: 04/Dec/20

| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 4.2.10 |
| Fix Version/s: | None |
| Type: | Bug |
| Priority: | Major - P3 |
| Reporter: | Peter Volkov |
| Assignee: | Edwin Zhou |
| Resolution: | Incomplete |
| Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Operating System: | ALL |
| Steps To Reproduce: | I don't know. We just found the server in this state 3 days ago. |

Description

Whenever I run `mongo --host mongo3:27017 --eval 'db.serverStatus()'`, the command hangs for a very long time (I never managed to get a reply). The output of the following command is attached (gdb_2020-11-09_15-58-04.txt.xz): gdb
This is Gentoo Linux with MongoDB 4.2.10 installed. Another observation: I tried to stop mongod and it hung with the following in the logs:
2020-11-09T16:32:51.302+0300 I CONTROL [signalProcessingThread] got signal 15 (Terminated), will terminate after current cmd ends
I ran the gdb command mentioned above again; its output is in gdb_2020-11-09_16-34-39.txt.xz. After I killed the server with -9 (SIGKILL) and started it again, db.serverStatus() worked again.
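To check for this hang without blocking a terminal indefinitely, the same serverStatus call can be wrapped in a time limit. A minimal sketch, assuming the legacy `mongo` shell and GNU coreutils `timeout` (which exits with status 124 when the limit expires) are on PATH; the helper name `probe_server_status` and the 30-second default are hypothetical:

```shell
# Hypothetical helper: run db.serverStatus() against HOST, but give up after
# SECS seconds (default 30, an arbitrary choice) instead of waiting forever.
# Assumes coreutils `timeout`, which exits with 124 when the limit is hit.
probe_server_status() {
  host="$1"
  secs="${2:-30}"
  # Print only the `ok` field so a healthy server yields a single "1".
  if timeout "$secs" mongo --host "$host" --quiet --eval 'db.serverStatus().ok'; then
    return 0
  else
    rc=$?  # exit status of the timeout/mongo command above
  fi
  if [ "$rc" -eq 124 ]; then
    echo "serverStatus on $host gave no reply within ${secs}s" >&2
  fi
  return "$rc"
}
```

For example, `probe_server_status mongo3:27017 30 || echo "probe failed"` returns 124 when the node is hung as described, instead of blocking the caller.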

Comments

Comment by Peter Volkov [ 23/Sep/21 ]

Ah, I see, there is no way to reopen this issue. I've opened
Comment by Peter Volkov [ 23/Sep/21 ]

It happened again. According to monitoring, this happened around 2021-09-22 12:31:35. New backtrace: gdb_202109-23_11-07-08.txt.xz
Comment by Edwin Zhou [ 04/Dec/20 ]

Thank you for the update. Since we lack enough information to diagnose this issue, I'll close it as Incomplete.
Edwin
Comment by Peter Volkov [ 04/Dec/20 ]

No, this happened only once. Of course, it may have been caused by cosmic radiation, but it happened right after the upgrade, so I decided to open the report. Thank you for your investigation. If it happens again, I'll provide more information.
Comment by Edwin Zhou [ 04/Dec/20 ]

We'd like to know whether this issue has recurred. Could you please let us know if this is a recurring problem or if it has occurred on a different machine?
Thanks,
Comment by Edwin Zhou [ 19/Nov/20 ]

After taking a closer look at the gdb output you provided, we're unable to determine what caused all of the threads to get stuck waiting to acquire the global lock; at the moment, we don't have enough information. Has this problem ever occurred on a different machine? Is it a recurring problem?
Best,
Edwin
Comment by Edwin Zhou [ 16/Nov/20 ]

After looking into the gdb output, we found stack traces that confirm the behavior in both gdb captures. However, we still don't know what caused the issue. The first capture (gdb_2020-11-09_15-58-04.txt) corresponds to the deadlock when running db.serverStatus(): 337 threads are stuck in serverStatus.
The second capture (gdb_2020-11-09_16-34-39.txt) was taken while mongod was being terminated with -15 (SIGTERM). It is stuck in _stopDataReplication_inlock. Afterward, the server was killed with -9.
Questions:
Comment by Edwin Zhou [ 16/Nov/20 ]

Unfortunately, the attached diagnostic data does not cover the period when the issue occurred (the issue occurred on 11/09, but the diagnostics start on 11/10), and there is no way to recover the missing data. However, we are glad to hear that the issue went away when you restarted mongod. If it occurs again, please attach the archived diagnostic.data, the gdb output, and the logs so we can investigate the db.serverStatus() hang further.
Best,
Edwin
Comment by Peter Volkov [ 15/Nov/20 ]

Log file attached. The diagnostic.data archive appears to be too large to attach, so I've uploaded it here: https://yadi.sk/d/zY2ltxD8td9yrQ
Comment by Edwin Zhou [ 13/Nov/20 ]

Would you please archive (tar or zip) the $dbpath/diagnostic.data directory (the contents are described here) and attach it to this ticket? Could you also attach the log files?
Kind regards,