Initial sync from a WiredTiger instance locks the server


    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Affects Version/s: 3.0.12
    • Component/s: WiredTiger

      We are migrating all our servers to WiredTiger (WT) on MongoDB 3.0.12, and are running into performance and locking issues during initial sync from a WT server.

      Our setup:

      • A replica set with 4 members (1 hidden, 1 with no votes).
      • All running MongoDB 3.0.12.
      • Database size is over 1TB with MMAPv1, around 300GB with WT.
      • No sharding.

      We've found that an initial sync from an MMAPv1 server to a new WT server completes without issue. However, an initial sync from one WT server to another WT server causes significant performance problems.

      The issue seems to occur specifically while syncing large collections (around 100GB in size). After a while, the server being synced from becomes completely unresponsive for multiple hours. During this time, if it is a secondary, its replication lag builds up (it no longer seems to be replicating at all), most queries to it get no response, and it is often not even possible to log in with the mongo shell. It seems like the initial sync query is holding a global lock and not yielding for a very long time. There is also essentially no network traffic on the server during this period.
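
To put a number on the replication lag described above, the following is a minimal sketch that computes how far each secondary is behind the primary from a document shaped like the output of the `replSetGetStatus` command. It reads only the `members`, `name`, `stateStr` and `optimeDate` fields. The host names and timestamps in `sample_status` are hypothetical and are not taken from our deployment.

```python
from datetime import datetime, timedelta


def replication_lag(status):
    """Return {member name: lag behind the primary} for every secondary.

    `status` is expected to look like the result of replSetGetStatus:
    a dict with a "members" list whose entries carry "name",
    "stateStr" and "optimeDate" (a datetime).
    """
    members = status["members"]
    primary = next((m for m in members if m["stateStr"] == "PRIMARY"), None)
    if primary is None:
        raise ValueError("no primary found in replica set status")
    return {
        m["name"]: primary["optimeDate"] - m["optimeDate"]
        for m in members
        if m["stateStr"] == "SECONDARY"
    }


# Hypothetical sample data, for illustration only.
sample_status = {
    "members": [
        {"name": "db1.example.net:27017", "stateStr": "PRIMARY",
         "optimeDate": datetime(2016, 1, 1, 12, 0, 0)},
        {"name": "db2.example.net:27017", "stateStr": "SECONDARY",
         "optimeDate": datetime(2016, 1, 1, 10, 0, 0)},
        {"name": "db3.example.net:27017", "stateStr": "SECONDARY",
         "optimeDate": datetime(2016, 1, 1, 11, 59, 58)},
        {"name": "db4.example.net:27017", "stateStr": "STARTUP2",
         "optimeDate": datetime(1970, 1, 1)},
    ]
}

lag = replication_lag(sample_status)
for name, delta in sorted(lag.items()):
    print(name, delta)
```

Against a live replica set, the same function could be fed the result of `client.admin.command("replSetGetStatus")` from PyMongo, assuming the connection to the affected member still succeeds. A member that is still performing its initial sync (state `STARTUP2`) is skipped, since only members in state `SECONDARY` are reported.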

      Since most of our application does not actively use the secondaries (which we are performing the initial sync from), it does not affect the majority of our system. However, there are a few queries that we do run on our secondaries, which are affected by this.

      When doing the initial sync from an MMAPv1 server, we do not experience these issues at all.

      We have not tested with MongoDB 3.2.x yet.

            Assignee:
            Ramon Fernandez Marina
            Reporter:
            Ralf Kistner
            Votes:
            0
            Watchers:
            8
