Core Server / SERVER-5433

Stale config and unable to move chunks

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • None
    • Affects Version/s: 2.0.1
    • Component/s: Sharding
    • Labels: None
    • Linux

      History
      Our production sharded cluster was running 7 shards, 7 routers (having 7 was a mistake) and 3 config servers. The main router ran out of file descriptors, so it was shut down and its limit was increased. After that, the database could not acquire its lock, so it was repaired and restarted. Then one of the config servers was holding a lot of data in the moveChunk folder. It exhausted all the disk space, and its processes became unresponsive. I don't know why it was holding all that data in the moveChunk folder; that is a mystery to me.

      Once the whole sharded cluster was restarted, many writes complained of stale config and were retried. The retries exhausted all the file descriptors, and the router then became totally unresponsive and threw socket exceptions. We considered running the cluster on one config server instead of three. We tested that in our staging environment and then applied it to the production cluster. Now I see stale config warnings, then errors about being unable to transfer data, and finally an out-of-file-descriptors error.

      I am running 2.0.1. Do you think any of these problems would go away if I upgrade to 2.1.0, or would I be in a worse condition? Any idea why all these errors are happening?

            Assignee: Greg Studer (greg_10gen)
            Reporter: Preetham Derangula (preethamraj)
            Votes: 0
            Watchers: 2
