[SERVER-66867] mongod server stops accepting new connections while active ones still work Created: 31/May/22 Updated: 26/Sep/22 Resolved: 26/Sep/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 4.4.14, 5.0.8 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Svet Penkov | Assignee: | Chris Kelly |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Operating System: | ALL |
| Participants: |
| Description |
|
We have a single mongod container running from the official Docker image (we've tried both mongo:4.4.14 and mongo:5.0.8) that at some point stops accepting new connections; every new connection request simply times out, while existing connections keep working. We start the container via docker-compose with:
This issue occurs every 12-24 hours, but so far we've noticed that if we comment out the last two lines above, it becomes much rarer. Configuring mongod to write its log to a file has the same effect, reducing the failure rate considerably. I've attached the stack trace collected after the server stopped accepting new connections.
|
| Comments |
| Comment by Chris Kelly [ 26/Sep/22 ] |
|
Hi Svet, I'm going to close this ticket for now, but we can reopen this if you come across the issue again. Christopher |
| Comment by Svet Penkov [ 03/Jul/22 ] |
|
Hi Chris! So far we haven't been able to reproduce the issue with logging to a file enabled. If we do, I'll make sure to provide the requested information in this issue.
|
| Comment by Chris Kelly [ 30/Jun/22 ] |
|
We still need additional information to diagnose the problem. If this is still an issue for you, would you please provide the requested information? Christopher |
| Comment by Chris Kelly [ 06/Jun/22 ] |
|
Hi Svet, thanks for your report. Although logging to a file reduces the failure rate, it will be valuable to see what happens leading up to the event. For each node in the replica set, covering a time period that includes the incident, would you please archive (tar or zip) and upload to the ticket:
Regards,
|