Core Server / SERVER-3165

Server locks up on moveChunk

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • None
    • Affects Version/s: 1.8.2
    • Component/s: Performance, Sharding
    • Labels: None
    • Environment:
      2 shards, each a 3-member replica set. 7 mongos instances, each with 8 clients connecting to it. All running on Ubuntu 10.04.
    • Linux

      We noticed severe, seemingly random lockups on our servers. After investigating the issue for a few days, we found that they happened every time the balancer started moving chunks around.

      After a moveChunk request is accepted, the server still accepts connections and queries, but they never seem to complete. After 15 seconds, our servers ran out of their 20000 connections (the servers have ulimit -n 50000) because queries pile up; however, the mongo shell becomes unresponsive on the primary shard long before that. Sometimes a moveChunk goes through just fine, but given our heavy query load, I believe the lockup happens when we query the chunk that is currently being moved.

      After we kill the primary, a secondary takes over and the environment is stable again until the balancer kicks in again.

      I've included a log from the primary shard server (EU), running with -vvvvv, that illustrates the issue.

      Disabling the balancer stopped the server lockups, so we are running without the balancer until this is resolved.
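
      For anyone hitting the same problem, here is a minimal sketch of one way to disable the balancer on a cluster of this generation. The report does not say which method was actually used. The sketch relies on the historical convention of setting `stopped: true` on the `balancer` document in the `settings` collection of the `config` database, written through a mongos. The helper names `balancer_toggle_spec` and `set_balancer_stopped` are hypothetical, and `settings` is assumed to be a PyMongo-style collection object exposing `update_one`.

```python
from typing import Any, Dict, Tuple

# _id of the balancer settings document in config.settings
BALANCER_ID = "balancer"


def balancer_toggle_spec(stopped: bool) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Build the (filter, update) pair that flips the balancer's 'stopped' flag."""
    return {"_id": BALANCER_ID}, {"$set": {"stopped": stopped}}


def set_balancer_stopped(settings: Any, stopped: bool = True) -> Any:
    """Apply the flag to a collection-like object (assumed: config.settings via mongos).

    upsert=True creates the balancer document if it does not exist yet.
    """
    flt, update = balancer_toggle_spec(stopped)
    return settings.update_one(flt, update, upsert=True)
```

      With PyMongo this would be called as `set_balancer_stopped(MongoClient(mongos_uri)["config"]["settings"])`, where `mongos_uri` is a placeholder for the address of one of the mongos instances. Passing `stopped=False` re-enables the balancer. A migration that is already in progress may still run to completion after the flag is set.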

      We are not sure whether this happened the same way when we were using 1.8.1, though.

            Assignee: Unassigned
            Reporter: Jalmari Raippalinna (jalava)
            Votes: 0
            Watchers: 2
