[SERVER-3705] mongos not closing connections, filling up disk with logs Created: 29/Aug/11 Updated: 11/Jul/16 Resolved: 02/Sep/11 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 1.8.2 |
| Fix Version/s: | 2.0.0-rc1 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Theo Hultberg | Assignee: | Unassigned |
| Resolution: | Done | Votes: | 1 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Operating System: | ALL |
| Description |
|
One of our mongos started spewing out this message:
It wrote 6.5 gigabytes of identical lines (same timestamp and all), which filled up the root partition of the server. All queries running through that mongos started failing at the same time, which caused the application to fail. However, mongos is still running and holding 300 TCP connections open (according to lsof). From what I can see, the problem started about an hour earlier, when mongos could not connect to the cluster. That went on until it could connect to all the nodes except one. Then, just minutes before it started spewing out the messages about too many open files, it managed to connect to the last one too. After that it wrote the same message until the disk ran out. |
| Comments |
| Comment by auto [ 29/Aug/11 ] |
|
Author: Mathias Stearn (RedBeard0531, mathias@10gen.com) Message: Wait a bit before trying to accept() when out of FDs |
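The idea in that commit message can be sketched as follows. This is not the actual mongos change (which is C++ and is not shown in this ticket), only a minimal Python illustration of the technique: when accept() fails because the process is out of file descriptors, pause before retrying instead of spinning and logging the same error in a tight loop. The function name, the `pause_seconds` value, and the injectable `sleep` and `log` parameters are all hypothetical.

```python
import errno
import time

# errno values that mean the process (EMFILE) or the whole system (ENFILE)
# has no file descriptors left.
OUT_OF_FDS = {errno.EMFILE, errno.ENFILE}


def accept_with_backoff(listener, handle, max_accepts=None,
                        pause_seconds=0.5, sleep=time.sleep, log=print):
    """Accept connections, pausing whenever accept() runs out of FDs.

    `listener` is any object with a socket-style accept() method.
    `max_accepts=None` means loop forever, as a server normally would.
    Returns the number of connections accepted.
    """
    accepted = 0
    while max_accepts is None or accepted < max_accepts:
        try:
            conn, addr = listener.accept()
        except OSError as exc:
            if exc.errno in OUT_OF_FDS:
                # Without this pause the loop would retry immediately and
                # emit one log line per iteration, which can fill a disk.
                log("out of file descriptors, waiting %.2fs" % pause_seconds)
                sleep(pause_seconds)
                continue
            raise  # unrelated errors are not handled here
        handle(conn, addr)
        accepted += 1
    return accepted
```

With a pause in place, the log grows by at most one line per `pause_seconds` while descriptors are exhausted, rather than as fast as the loop can spin.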
| Comment by Eliot Horowitz (Inactive) [ 29/Aug/11 ] |
|
mongos has connection pools to shards and the config servers, so even when idle with no incoming connections, there will still be outgoing connections. If you're seeing entries like that it means queries are happening. |
| Comment by David Tollmyr [ 29/Aug/11 ] |
|
Since this is a new cluster that we're breaking in, we shut down the application completely. Mongos is completely idle, but we still see 300-400 open connections from mongos to 27017 and the config servers (3 shards and 3 config servers). It's been more than 15 minutes with no activity, and the number of open connections has not changed. We also see many entries like this: |
| Comment by Eliot Horowitz (Inactive) [ 29/Aug/11 ] |
|
50 is the number of incoming connections; the total number of TCP connections needed is higher than that, since we have to connect to every shard. |
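To make that arithmetic concrete, here is a rough sketch in Python. The model is an assumption for illustration only, not mongos's documented pooling behaviour: it supposes each incoming connection may end up with one pooled outgoing connection per shard, plus one connection per config server. The function names are hypothetical. It does not try to reproduce the connection counts reported in this thread, and it ignores other descriptors such as log files and the listening socket.

```python
import resource


def estimate_sockets(incoming, shards, config_servers):
    """Rough estimate of sockets a router process might hold.

    ASSUMPTION (not from the ticket): one pooled outgoing connection per
    incoming connection per shard, plus one connection per config server.
    """
    outgoing_to_shards = incoming * shards
    return incoming + outgoing_to_shards + config_servers


def fits_in_limit(needed, soft_limit):
    """Return True if `needed` descriptors fit under the soft limit."""
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return needed <= soft_limit


if __name__ == "__main__":
    # RLIMIT_NOFILE is the per-process limit on open file descriptors.
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    needed = estimate_sockets(incoming=50, shards=3, config_servers=3)
    print("estimated sockets:", needed,
          "soft limit:", soft,
          "fits:", fits_in_limit(needed, soft))
```

Under this model, 50 incoming connections against 3 shards and 3 config servers come to 50 + 150 + 3 = 203 sockets, so a driver pool limit of 50 says little on its own about how many descriptors the process needs.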
| Comment by Theo Hultberg [ 29/Aug/11 ] |
|
We run one application against the mongos. That application uses the Ruby driver and has a connection pool limit of 50. It seems unlikely that raising the limit to several thousand would do anything but postpone the inevitable. Then there's the issue of it writing gigabyte after gigabyte of logs, which makes it look even more like a bug in mongos. Since reporting this, the error has happened again, and this time two mongos ran into the bug at more or less the same time. |
| Comment by Eliot Horowitz (Inactive) [ 29/Aug/11 ] |
|
Is there any reason you think it's not closing connections, rather than just not having enough file descriptors? |