  Core Server / SERVER-4649

mongod primary locks up and never kills cursors, stalling connections forever

    • Type: Bug
    • Resolution: Incomplete
    • Priority: Critical - P2
    • None
    • Affects Version/s: 2.0.1
    • Component/s: None
    • Labels:
    • Environment:
      Debian Squeeze with the official 10gen deb packages.
      Two servers running in a replica set.

      Each server is also run with the numactl --interleave=all wrapper.

      Mixed clients, but mainly the PHP driver.
    • ALL

      At seemingly random times, the mongod primary stops finishing some queries.
      Some cursors had been running for 6+ hours. It has only happened on the primary.

      The log always has these entries when this occurs:

      [conn123812] warning: virtual size (129449MB) - mapped size (123342MB) is large. could indicate a memory leak

      [initandlisten] connection accepted from 172.16.49.98:40540 #123818
      [conn123816] end connection 172.16.49.98:40539
      [conn123817] Assertion failure cc->_pinValue < 100 db/clientcursor.h 309
      [conn123817] killcursors exception: assertion db/clientcursor.h:309 1ms

      As the log shows, the queries that do complete take a very long time (about 302 seconds in this case):
      [conn123170] query x.scores nscanned:1015 scanAndOrder:1 nreturned:306 reslen:12260 302422ms

      The server's resources don't seem to be exhausted, and restarting the server gets everything back to normal.
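
      To triage a log like the one above, the slow queries and the pin-value assertions can be pulled out with a small script. The following is only a sketch and is not part of the original report: the function names (parse_line, slow_queries, pin_assertions) and the 1000 ms threshold are made up for illustration, and it assumes each entry has the shape shown above, that is a bracketed context such as [conn123170], a message, and an optional trailing duration in ms.

```python
import re

# Assumed line shape (taken from the excerpts in this report):
#   [conn123170] query x.scores nscanned:1015 ... reslen:12260 302422ms
# Anything before the first "[" (for example a timestamp) is ignored.
LINE_RE = re.compile(
    r"\[(?P<ctx>[^\]]+)\]\s+(?P<msg>.*?)(?:\s+(?P<ms>\d+)ms)?\s*$"
)


def parse_line(line):
    """Return a dict with ctx, msg and ms (int or None), or None if no match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    ms = m.group("ms")
    return {
        "ctx": m.group("ctx"),
        "msg": m.group("msg"),
        "ms": int(ms) if ms is not None else None,
    }


def slow_queries(lines, threshold_ms=1000):
    """Yield query entries whose duration is at least threshold_ms.

    threshold_ms is an arbitrary illustrative default, not a mongod setting.
    """
    for line in lines:
        entry = parse_line(line)
        if entry is None or entry["ms"] is None:
            continue
        if entry["msg"].startswith("query ") and entry["ms"] >= threshold_ms:
            yield entry


def pin_assertions(lines):
    """Yield entries that contain the pin-value assertion text from the report."""
    for line in lines:
        entry = parse_line(line)
        if entry is not None and "cc->_pinValue < 100" in entry["msg"]:
            yield entry
```

      Grouping the results by the ctx field (the connection name) would show whether the assertion and the long-running queries come from the same connections, which the excerpts alone do not establish.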

        1. currentop.log (998 kB)
        2. oplog.log (2.44 MB)
        3. pinvalue.txt (30 kB)
        4. mms.gif (65 kB)
        5. mms2.gif (131 kB)
        6. oplog - start to crash.log (1.89 MB)

            Assignee:
            kristina Kristina Chodorow (Inactive)
            Reporter:
            balboah Johnny Boy
            Votes:
            0
            Watchers:
            7
