  1. Core Server
  2. SERVER-29265

STARTUP2 and WriteConcern=majority don't work well together

    • Type: New Feature
    • Resolution: Works as Designed
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: Replication
    • Labels: None

      Hi,

      Problem:

      We just did a rolling upgrade from 3.0.14 to 3.4.3 and we noticed some strange behavior while doing the initial sync. All writes take 5 seconds to complete, making the app very slow and leading to timeouts and finally exhausted connection pools.

      Our set-up:

      The app is a Play-Scala application using the ReactiveMongo driver (0.12.2). We use w: majority and wtimeout: 5000 as our write concern.
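For reference, a minimal sketch of what this write concern looks like when attached to a write command document. The collection name `events` and the sample document are hypothetical, and the ReactiveMongo driver builds the equivalent structure itself; this only illustrates the `w` and `wtimeout` fields.

```python
def insert_command(collection, documents, w="majority", wtimeout_ms=5000):
    """Build an insert command document carrying a write concern.

    w="majority" asks the primary to wait until a majority of the
    replica set has acknowledged the write; wtimeout bounds that wait
    in milliseconds.
    """
    return {
        "insert": collection,
        "documents": list(documents),
        "writeConcern": {"w": w, "wtimeout": wtimeout_ms},
    }


# Hypothetical collection and document, for illustration only.
cmd = insert_command("events", [{"_id": 1, "kind": "signup"}])
print(cmd["writeConcern"])
```

If the majority cannot be reached, the primary only gives up waiting after `wtimeout` milliseconds, which would be consistent with writes taking exactly 5 seconds.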
      The DB is a replica set of three data bearing nodes, with 2 arbiters.
      Everything is hosted on AWS EC2, using EBS volumes.

      It's a pretty big database; the initial sync takes around 1.5 hours, including index creation.

      Steps

      • We add the new instances to the replica set (on top of the three other nodes), and they start to do an initial sync (from scratch, so not from an AWS snapshot).
      • Once the new node is in the cluster, all writes start to take exactly 5 seconds (the wtimeout value). We don't see much load on the app or on the primary though.
      • After digging into the problem, we reconfigured the new nodes with priority:0 and votes:0, and from that moment onwards the whole app worked properly again.
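The reconfiguration in the last step can be sketched as an edit to the replica set configuration document. This is an illustration only: the host names and the set name `rs0` are hypothetical, and the resulting document would still have to be applied on the primary (for example via `rs.reconfig()` in the mongo shell).

```python
import copy


def demote_members(config, hosts):
    """Return a copy of a replica set config in which the given hosts
    have priority 0 and votes 0, so they can neither become primary
    nor take part in elections."""
    new_config = copy.deepcopy(config)
    for member in new_config["members"]:
        if member["host"] in hosts:
            member["priority"] = 0
            member["votes"] = 0
    # A reconfiguration must carry a higher version than the current config.
    new_config["version"] += 1
    return new_config


# Hypothetical configuration: three existing nodes, two arbiters, two new nodes.
config = {
    "_id": "rs0",
    "version": 7,
    "members": [
        {"_id": 0, "host": "old1:27017"},
        {"_id": 1, "host": "old2:27017"},
        {"_id": 2, "host": "old3:27017"},
        {"_id": 3, "host": "arb1:27017", "arbiterOnly": True},
        {"_id": 4, "host": "arb2:27017", "arbiterOnly": True},
        {"_id": 5, "host": "new1:27017"},
        {"_id": 6, "host": "new2:27017"},
    ],
}

updated = demote_members(config, {"new1:27017", "new2:27017"})
for member in updated["members"]:
    print(member)
```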

      So we have a kind of workaround, but having to use it seems undesirable. It also looks a bit odd to me: since the rest of the cluster is quick, there should be no need to wait for the new instances to get a majority at all, right? 3 nodes out of 5 respond quickly, so that should be enough to return.
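A possible explanation for the behavior is the way the majority is counted. The sketch below assumes that the write majority is computed over all voting members, arbiters included, and then capped at the number of data-bearing voters; that is my understanding of the server's logic, not something confirmed in this report. It also assumes two new nodes (the report only says "new instances"), and the `state` field is a hypothetical stand-in for the member state reported by the replica set status.

```python
def write_majority(members):
    """Number of acknowledgements needed for w: majority (assumed formula)."""
    voters = [m for m in members if m.get("votes", 1) > 0]
    arbiters = [m for m in voters if m.get("arbiterOnly", False)]
    majority_of_voters = len(voters) // 2 + 1
    # Arbiters hold no data, so they can never acknowledge a write.
    return min(majority_of_voters, len(voters) - len(arbiters))


def can_acknowledge(members):
    """Voting, data-bearing members that are not still in initial sync."""
    return sum(
        1
        for m in members
        if m.get("votes", 1) > 0
        and not m.get("arbiterOnly", False)
        and m.get("state") != "STARTUP2"
    )


old_nodes = [{"host": "old%d" % i, "state": "SECONDARY"} for i in range(3)]
arbiters = [{"host": "arb%d" % i, "arbiterOnly": True} for i in range(2)]

# New nodes voting while in STARTUP2 (the situation described above).
voting_new = [{"host": "new%d" % i, "state": "STARTUP2"} for i in range(2)]
before = old_nodes + arbiters + voting_new

# New nodes reconfigured with votes: 0 (the workaround).
silent_new = [dict(m, votes=0, priority=0) for m in voting_new]
after = old_nodes + arbiters + silent_new

for label, members in (("before", before), ("after", after)):
    print(label, "needed:", write_majority(members), "available:", can_acknowledge(members))
```

Under these assumptions, seven voters require four acknowledgements while only three nodes can give one, so every majority write waits for the full `wtimeout`. With the new nodes non-voting, five voters require three acknowledgements and the three existing nodes suffice.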

      Also, during the initial sync, I don't see the need for writes to be sent to the new node, since it is far from caught up and will replay these writes afterwards anyway.

      Is it possible to ignore nodes that are in STARTUP2 when replicating writes? Or maybe another solution might be to quickly return a special acknowledgement when the node is in STARTUP2, so the primary can return the write back to the application?

            Assignee:
            schwerin@mongodb.com Andy Schwerin
            Reporter:
            jankeesvanandel@gmail.com Jan-Kees van Andel