  Core Server / SERVER-4710

Writer queues never go down and server eventually stalls


Details

    • Type: Bug
    • Resolution: Incomplete
    • Priority: Major - P3
    • Affects Version/s: 2.0.2
    • Component/s: Stability
    • Environment: Windows x64
    • Operating System: Windows

    Description

      We've posted on mongodb-user what we believe is our core issue right now: the number of queued writers never seems to go down once it has reached a certain threshold.

      The thread is here: http://groups.google.com/group/mongodb-user/browse_thread/thread/195264a598d87393#

      We were able to reproduce the problem yesterday. After a --repair, the server seemed fine for a while, but while we were copying the log file on the server (the -v switch proved very verbose indeed: we had over 2 GB of logs for a few hours of activity), the qw figure shot up drastically, as did the number of inbound connections (the clients were trying to compensate for the blocking queries), right before the server stopped serving queries altogether.
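      For what it's worth, the qw figure we're watching is mongostat's view of globalLock.currentQueue.writers from serverStatus. Below is a minimal sketch (not our actual monitoring script; the connection string and one-second interval are placeholders) of polling it with pymongo so the growth can be lined up against the log:

      import time
      from pymongo import MongoClient

      client = MongoClient("mongodb://localhost:27017")

      while True:
          # serverStatus reports the current read/write queue depths under
          # globalLock.currentQueue (the "qr" / "qw" columns in mongostat).
          queue = client.admin.command("serverStatus")["globalLock"]["currentQueue"]
          print("queued readers: %(readers)d  queued writers: %(writers)d" % queue)
          time.sleep(1)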

      I've uploaded part of the logs for this episode to Jira.

      Later on, when we issued a repair on the database, the log showed:

      Tue Jan 17 00:09:45 [initandlisten] warning: ClientCursor::yield can't unlock b/c of recursive lock ns: Main.cachedItems top: { opid: 220, active: true, waitingForLock: false, secs_running: 0, op: "getmore", ns: "Main.cachedItems", query: {}, client: "0.0.0.0:0", desc: "initandlisten", numYields: 0 }
      Tue Jan 17 00:09:46 [initandlisten] warning: ClientCursor::yield can't unlock b/c of recursive lock ns: Main.cachedItems top: { opid: 221, active: true, waitingForLock: false, secs_running: 0, op: "getmore", ns: "Main.cachedItems", query: {}, client: "0.0.0.0:0", desc: "initandlisten", numYields: 0 }

      (and so on for the 223 stuck queued writers).

      So it looks like some of our queries (we're implementing a poor man's distributed lock using atomic sets and checks) may be problematic when mongod is under duress (I/O starvation, or other factors). We're moving the lock mechanism to a dedicated instance, and more generally we're reworking our data access profile to minimize writes relative to reads, but we need to find the root cause of this incident.
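      For context, here is a minimal sketch of the "atomic set and check" pattern we mean (not our actual code; the pymongo driver, the Main.locks collection and the field names are placeholders for illustration). Each lock is a pre-created document with owner set to null, and a client tries to claim it with a single conditional update:

      import datetime
      from pymongo import MongoClient, ReturnDocument

      locks = MongoClient("mongodb://localhost:27017")["Main"]["locks"]

      def try_acquire(name, owner, ttl_seconds=30):
          """Claim the lock atomically; returns True if we got it."""
          now = datetime.datetime.utcnow()
          claimed = locks.find_one_and_update(
              # Match only if the lock is free or its previous owner timed out.
              {"_id": name, "$or": [{"owner": None}, {"expires_at": {"$lt": now}}]},
              {"$set": {"owner": owner,
                        "expires_at": now + datetime.timedelta(seconds=ttl_seconds)}},
              return_document=ReturnDocument.AFTER,
          )
          return claimed is not None

      def release(name, owner):
          """Free the lock, but only if we still own it."""
          locks.update_one({"_id": name, "owner": owner}, {"$set": {"owner": None}})

      The point of the pattern is that the test and the set happen in one server-side operation; clients that fail to claim the lock simply retry, which is why it degenerates into heavy polling whenever writes back up.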

      Attachments

        Activity

          People

            Assignee: milkie@mongodb.com Eric Milkie
            Reporter: kbjmongo YannPierre CouzySchwartz
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated:
              Resolved: