Core Server / SERVER-1633

too many open files, lsof "can't identify protocol"

    • Type: Bug
    • Resolution: Incomplete
    • Priority: Critical - P2
    • Affects Version/s: 1.6.0
    • Component/s: Sharding, Stability
    • Environment:
      We use MongoDB with sharding and the Java connector.
      80 app servers (Tomcat); each app server runs 1 mongos daemon, connecting to 2 mongod servers.
    • Linux

      One mongos daemon is installed on each of our 80 app servers. Very frequently, the mongos daemon stops working because of "too many open files", and the same error floods the mongos log files. We raised the open-files limit to 30000, which fixed the problem for a short time, but it soon returned, and I then found what looks like the real cause: the mongos daemons seem to create some kind of idle zombie connections. They show up when I run:

      lsof | grep 'mongos'

      This is the output:
      ......thousands of lines .....
      mongos 22843 root 8523u sock 0,5 2863975141 can't identify protocol
      mongos 22843 root 8524u sock 0,5 2863975143 can't identify protocol
      mongos 22843 root 8525u sock 0,5 2863975162 can't identify protocol
      mongos 22843 root 8526u sock 0,5 2863975219 can't identify protocol
      ......thousands of lines ......

      These zombie connections use up all the available file descriptors.
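      A minimal sketch for turning the lsof output above into a number that can be tracked over time. It assumes the column layout shown in the sample rows (command, PID, user, FD, type, ...); other lsof versions may print extra columns. The function name count_unidentified_sockets is hypothetical and not part of any MongoDB tooling.

```python
import re
from collections import Counter

# Matches lsof rows shaped like the sample in this report, e.g.:
#   mongos 22843 root 8523u sock 0,5 2863975141 can't identify protocol
# Assumption: columns are COMMAND PID USER FD TYPE ... as in that sample.
ROW = re.compile(
    r"^(?P<cmd>\S+)\s+(?P<pid>\d+)\s+\S+\s+\S+\s+sock\s+.*"
    r"can't identify protocol\s*$"
)


def count_unidentified_sockets(lsof_lines, command="mongos"):
    """Count "can't identify protocol" sockets per PID for one command name."""
    counts = Counter()
    for line in lsof_lines:
        match = ROW.match(line.strip())
        if match and match.group("cmd") == command:
            counts[int(match.group("pid"))] += 1
    return dict(counts)
```

      The function accepts any iterable of lines, so the saved output of the lsof command can be passed in as an open file object. Sampling it periodically would show whether the count grows steadily or jumps after specific events.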

      To me this looks like a bug in mongos, in the MongoDB Java connector, or in both.
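      Since raising the open-files limit only delayed the failure, it may help to watch how close a mongos process is to its limit. The sketch below assumes Linux, where /proc/<pid>/limits contains a "Max open files" row and /proc/<pid>/fd holds one entry per open descriptor. All function names and the 0.8 threshold are hypothetical choices for illustration.

```python
import os


def parse_max_open_files(limits_text):
    """Return (soft, hard) open-file limits from the text of /proc/<pid>/limits.

    A value of None stands for "unlimited".
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Row layout: Max open files <soft> <hard> files
            fields = line.split()
            soft, hard = fields[3], fields[4]
            return tuple(None if v == "unlimited" else int(v) for v in (soft, hard))
    raise ValueError("no 'Max open files' row found")


def count_open_fds(pid):
    """Count open descriptors of a process (Linux only; needs permission)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def near_limit(open_fds, soft_limit, threshold=0.8):
    """True when usage is at or above the given fraction of the soft limit."""
    if soft_limit is None:
        return False
    return open_fds >= threshold * soft_limit
```

      For example, reading /proc/22843/limits, passing its text to parse_max_open_files, and comparing against count_open_fds(22843) would give an early warning before the daemon starts logging "too many open files".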

        1. mongod.conf
          2 kB
        2. mongod.log.2010-08-25.gz
          2.05 MB
        3. mongod-2010-09-01.log.gz
          2.24 MB
        4. mongos_debug.log.gz
          17 kB
        5. netstat_grep3307.txt
          106 kB
        6. netstat_grep3309.txt
          32 kB

            Assignee:
            alerner Alberto Lerner
            Reporter:
            jonasgk Jonas
            Votes:
            0
            Watchers:
            3
