  1. Java Driver
  2. JAVA-2543

Mongo driver exception "Replication is shutting down" on mongo save during replicaset failover and election process

    • Type: Task
    • Resolution: Gone away
    • Priority: Major - P3
    • None
    • Affects Version/s: 3.4.1
    • Component/s: Cluster Management
    • Labels:
      None
    • Environment:
      Windows Server 2012 Release 2

      Main question: Should I change the application code to handle the exceptions caused by a primary step-down and the election of a new primary by opening the connection again, or should I be able to handle this with the driver timeout settings and a retry of the save on exception?

      Background

      I have a four-node replica set that we are putting through initial development / resilience tests. We are pre-production. I'm testing failover of the primary by stopping the mongod service. The election takes place and a new primary is elected, with no errors in the mongod logs.

      However, I have a process that connects to the replica set over SSL. When a long-running batch job is performing batch mongo saves and the service fails over, I get the following error on the client. The exception is caught, but the mongo save throws an exception again in the time before the new primary is elected.

      I have tried raising and lowering electionTimeoutMillis, reducing the heartbeat interval, etc.

      We're currently on MongoDB 3.4.5 on Windows, and the replica set uses protocol version 1.

      The client connection uses the default values for timeout. We use "majority" to write.
      I have also tried setting different values for maxConnectionIdleTime, connectTimeout,
      serverSelectionTimeout but none of these change the exception message we are getting.

      For now I'm simply using the following.

      		MongoClient mongoClient =  new MongoClient(new MongoClientURI(mongoUri));
      		mongoClient.setWriteConcern(WriteConcern.MAJORITY);
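
The timeout settings mentioned above can also be expressed as connection-string options. The following is only a sketch of how such a URI might be assembled; the class name FailoverUri, the host names, the replica set name and the timeout values are placeholders for illustration, not the values from our configuration.

```java
public final class FailoverUri {
    private FailoverUri() {}

    // Builds a MongoDB connection string carrying the timeout options
    // discussed above. All argument values are supplied by the caller;
    // nothing here is taken from the real deployment.
    public static String build(String hosts, String replicaSet,
                               int serverSelectionTimeoutMs,
                               int connectTimeoutMs,
                               int maxIdleTimeMs) {
        return "mongodb://" + hosts + "/"
                + "?replicaSet=" + replicaSet
                + "&ssl=true"
                + "&w=majority"
                + "&serverSelectionTimeoutMS=" + serverSelectionTimeoutMs
                + "&connectTimeoutMS=" + connectTimeoutMs
                + "&maxIdleTimeMS=" + maxIdleTimeMs;
    }

    public static void main(String[] args) {
        // Hypothetical hosts and values, for illustration only.
        System.out.println(build("host1:27017,host2:27017", "rs0", 30000, 10000, 60000));
    }
}
```

The resulting string would be passed to new MongoClientURI(...) in the same way as mongoUri above.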
      

      The stack trace and config are attached in configuration.txt, but the main error I get is:

      org.springframework.data.mongodb.UncategorizedMongoDbException: Query failed with error code 11600 and error message 'interrupted at shutdown' on server xxxx; nested exception is com.mongodb.MongoQueryException: Query failed with error code 11600 and error message 'interrupted at shutdown'

      I would be grateful for ideas on whether a settings change could enable the client code to detect the server problem and then wait for the election to complete.

      I've tried pinging the server after the first mongo save but this hasn't helped.

      Do we need to re-architect our application code with a more sophisticated approach than retry on initial exception?

      Thanks in advance.
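
For reference, the "retry on exception" approach asked about above could be sketched roughly as follows. This is a driver-independent illustration, not our actual code: the class FailoverRetry, its parameters and the choice to catch RuntimeException are assumptions made for the example. In a real application the catch would presumably be narrowed to the driver or Spring exception types, and the retried write would need to be idempotent.

```java
import java.util.function.Supplier;

public final class FailoverRetry {
    private FailoverRetry() {}

    // Runs op, retrying on RuntimeException with exponential backoff.
    // Gives up and rethrows the last exception after maxAttempts tries.
    public static <T> T withRetry(Supplier<T> op, int maxAttempts,
                                  long initialDelayMs, long maxDelayMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return op.get();
            } catch (RuntimeException e) {
                if (attempt >= maxAttempts) {
                    throw e; // out of attempts: surface the last failure
                }
                try {
                    // Wait before the next try so an election has time to finish.
                    Thread.sleep(delay);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw e; // stop retrying if the thread is interrupted
                }
                delay = Math.min(delay * 2, maxDelayMs);
            }
        }
    }
}
```

A call might look like FailoverRetry.withRetry(() -> { mongoTemplate.save(doc); return doc; }, 10, 500L, 5000L), where mongoTemplate and doc stand in for whatever the batch job already uses. The total wait across attempts would need to exceed the time the election takes for the retry to succeed.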

            Assignee:
            Unassigned
            Reporter:
            Garrett Donnelly (gearoid68)
            Votes:
            2
            Watchers:
            4

              Created:
              Updated:
              Resolved: