[SERVER-11119] mongod connections hang because of assertion failure Created: 10/Oct/13 Updated: 10/Jun/14 Resolved: 10/Jun/14 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 2.4.5 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Andreas Heck | Assignee: | Victor Hooi |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | heartbeat | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Ubuntu 12.04 |
||
| Operating System: | ALL | ||||||||||||
| Steps To Reproduce: | Happens randomly, but seems more likely when you run repairDatabase or fsync lock on one of the OTHER nodes of a replica set. |
||||||||||||
| Description |
|
I ran into a situation where all new mongo shell connections hang and never reach the prompt. The log says that an assertion failed, but the mongod process does not terminate, so the problem persists until I manually restart mongod. Log messages:
|
| Comments |
| Comment by Thomas Rueckstiess [ 10/Jun/14 ] | |||||||||||
|
Hi Andreas, Apologies that it has taken us so long to get back to you on this ticket. After further inspection of the log files, I believe the issue you're seeing is a combination of two known issues: I found that the server at 10.36.* was fsyncLocked twice in a row without an unlock in between, which can cause unexpected behavior and lead to a node that remains locked (see
The following replica set reconfiguration would have blocked on the thread holding the fsync lock.
We can also see the stack traces and warning shortly after. You were then unable to connect to the node, because the authentication requests were also blocked behind the write ( The repair on 10.210.* may have been unrelated. As we are already tracking both of these issues, I'm going to close this one as duplicate. Regards, | |||||||||||
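The double fsyncLock described above can be guarded against on the client side. The following is a minimal sketch, not anything taken from MongoDB itself: the class `FsyncLockGuard` and its `lock_fn` / `unlock_fn` callables are hypothetical names, and the callables stand in for whatever your driver or script uses to issue the lock and unlock. The guard only tracks calls made through itself, so it cannot see a lock taken by another client.

```python
class FsyncLockGuard:
    """Hypothetical client-side guard: refuses a second lock without an unlock in between."""

    def __init__(self, lock_fn, unlock_fn):
        # lock_fn / unlock_fn are assumed callables that perform the real
        # lock and unlock against the node (supplied by the caller).
        self._lock_fn = lock_fn
        self._unlock_fn = unlock_fn
        self._locked = False

    @property
    def locked(self):
        return self._locked

    def lock(self):
        # Reject a repeated lock instead of sending it to the server.
        if self._locked:
            raise RuntimeError("node is already fsync-locked; unlock it first")
        self._lock_fn()
        self._locked = True

    def unlock(self):
        if not self._locked:
            raise RuntimeError("node is not fsync-locked")
        self._unlock_fn()
        self._locked = False

    def __enter__(self):
        self.lock()
        return self

    def __exit__(self, exc_type, exc, tb):
        # Always release the lock, even if the body raised.
        self.unlock()
        return False


if __name__ == "__main__":
    calls = []
    guard = FsyncLockGuard(lambda: calls.append("lock"),
                           lambda: calls.append("unlock"))
    with guard:
        # Take the backup or snapshot here while the node is locked.
        pass
    print(calls)
```

Using the guard as a context manager pairs every lock with an unlock, which avoids the "locked twice in a row" sequence in scripts that go through it.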
| Comment by Andreas Heck [ 10/Oct/13 ] | |||||||||||
|
Logs from all three MongoDB nodes. Yes, I removed and re-added the other secondary a few minutes before the problem occurred. | |||||||||||
| Comment by Scott Hernandez (Inactive) [ 10/Oct/13 ] | |||||||||||
|
Can you please attach the full logs from all members at this time? The snippet you have included is just a warning, not an error; it pertains to internal replication heartbeats and by itself doesn't show any problem. Were you also doing a replica set reconfiguration during this time? |