
Type: Task
Resolution: Gone away
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 3.4.1
Component/s: Cluster Management
Labels:
None
Environment:
Windows Server 2012 Release 2

Confidence Status:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Link:
None
Goal Name(s):
None

Main question: Should I change the application code to handle the exceptions caused by a primary stepping down and a new primary being elected, for example by opening the connection again, or should I be able to handle this with the driver timeout settings and a retry of the save on exception?

Background

I have a four-node replica set that we are putting through initial development and resilience tests; we are pre-production. I'm testing failover of the primary by stopping the mongod service. The election takes place and a new primary is elected, and there are no errors in the mongod logs.

However, I have a process that connects to the replica set over SSL. When we run long-running batch jobs that perform batches of Mongo saves and the primary fails over, I get the error shown below on the client. The exception is caught, but the Mongo save throws an exception again if it runs before the new primary has been elected.

I have tried raising and lowering electionTimeoutMillis, reducing the heartbeat interval, and so on.

We're currently on MongoDB 3.4.5 on Windows, and the replica set uses protocol version 1.

The client connection uses the default timeout values, and we write with a "majority" write concern. I have also tried different values for maxConnectionIdleTime, connectTimeout and serverSelectionTimeout, but none of these changes the exception message we get.

For now I'm simply using the following.

		MongoClient mongoClient =  new MongoClient(new MongoClientURI(mongoUri));
		mongoClient.setWriteConcern(WriteConcern.MAJORITY);
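The same write concern and timeout settings can also be carried in the connection string passed to `MongoClientURI`. A minimal sketch, assuming a hypothetical replica set named `rs0`, hypothetical host names and illustrative timeout values; `replicaSet`, `ssl`, `w`, `serverSelectionTimeoutMS` and `connectTimeoutMS` are standard connection-string options. As far as we understand, these settings govern how the driver finds and connects to a primary and do not by themselves retry an operation the server has already interrupted, which may be why changing them did not alter the 11600 error.

```java
import java.util.Arrays;
import java.util.List;

public class MongoUriSketch {

    // Builds a replica-set connection string. Host names, the replica set name and
    // the timeout values are hypothetical placeholders, not taken from the issue.
    static String buildUri(List<String> hosts, String replicaSet,
                           int serverSelectionTimeoutMs, int connectTimeoutMs) {
        return "mongodb://" + String.join(",", hosts)
                + "/?replicaSet=" + replicaSet
                + "&ssl=true"       // the batch process connects over SSL
                + "&w=majority"     // equivalent to WriteConcern.MAJORITY
                + "&serverSelectionTimeoutMS=" + serverSelectionTimeoutMs
                + "&connectTimeoutMS=" + connectTimeoutMs;
    }

    public static void main(String[] args) {
        String mongoUri = buildUri(
                Arrays.asList("db1.example.com:27017", "db2.example.com:27017",
                        "db3.example.com:27017", "db4.example.com:27017"),
                "rs0", 30000, 10000);
        // With the Java driver on the classpath, the string would be used as before:
        // MongoClient mongoClient = new MongoClient(new MongoClientURI(mongoUri));
        System.out.println(mongoUri);
    }
}
```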

The stack trace and configuration are attached in configuration.txt, but the main error I get is:

org.springframework.data.mongodb.UncategorizedMongoDbException: Query failed with error code 11600 and error message 'interrupted at shutdown' on server xxxx; nested exception is com.mongodb.MongoQueryException: Query failed with error code 11600 and error message 'interrupted at shutdown'
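Error code 11600 (`InterruptedAtShutdown`) is one of several codes a client can see while a primary steps down or shuts down, so a retry decision usually starts by classifying the code. A minimal sketch, assuming the code has already been read from the driver exception (for example with `MongoException.getCode()`); the class name is hypothetical, and the set of codes is an assumption modelled on the step-down and shutdown codes MongoDB treats as transient, not an exhaustive or official list.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class StepDownErrors {

    // Server error codes treated here as transient during a primary step-down or
    // shutdown. This set is an assumption for illustration, not an official list.
    private static final Set<Integer> TRANSIENT_CODES =
            Collections.unmodifiableSet(new HashSet<>(Arrays.asList(
                    11600, // InterruptedAtShutdown (the error reported in this issue)
                    11602, // InterruptedDueToReplStateChange
                    10107, // NotMaster
                    13435, // NotMasterNoSlaveOk
                    13436, // NotMasterOrSecondary
                    189,   // PrimarySteppedDown
                    91     // ShutdownInProgress
            )));

    // Returns true if an error with this code is worth retrying once a new
    // primary has been elected.
    static boolean isTransientStepDownError(int code) {
        return TRANSIENT_CODES.contains(code);
    }

    public static void main(String[] args) {
        System.out.println(isTransientStepDownError(11600)); // true
        System.out.println(isTransientStepDownError(11000)); // false: duplicate key
    }
}
```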

I would be grateful for ideas on whether a settings change could enable the client code to detect the server problem and then wait for the election to complete.

I've tried pinging the server after the first Mongo save, but this hasn't helped.

Do we need to re-architect our application code with a more sophisticated approach than retrying on the initial exception?
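A single immediate retry can fail again because no primary exists until the election finishes. One alternative to re-architecting is to retry with an increasing delay until an overall deadline that comfortably exceeds electionTimeoutMillis (10 seconds by default) plus the time the election itself takes. A minimal sketch, assuming the save is wrapped in a `Callable` and that a predicate decides which exceptions count as transient; the class name, the 5-second cap on a single wait and the delay values are illustrative assumptions. Retrying a write this way can apply it twice if the first attempt actually succeeded before the failure was reported, so it is safest for idempotent saves, such as a save keyed by `_id`.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

public class RetryWithBackoff {

    // Runs the operation, retrying exceptions accepted by isTransient with
    // exponential backoff until maxWaitMillis has elapsed in total.
    static <T> T retry(Callable<T> operation,
                       Predicate<Exception> isTransient,
                       long initialDelayMillis,
                       long maxWaitMillis) throws Exception {
        long deadline = System.currentTimeMillis() + maxWaitMillis;
        long delay = initialDelayMillis;
        while (true) {
            try {
                return operation.call();
            } catch (Exception e) {
                long remaining = deadline - System.currentTimeMillis();
                // Give up on non-transient errors or once the deadline has passed.
                if (!isTransient.test(e) || remaining <= 0) {
                    throw e;
                }
                Thread.sleep(Math.min(delay, remaining));
                delay = Math.min(delay * 2, 5_000); // cap a single wait at 5 s
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulated save that fails twice (as during an election), then succeeds.
        int[] attempts = {0};
        String result = retry(() -> {
            attempts[0]++;
            if (attempts[0] < 3) {
                throw new IllegalStateException("interrupted at shutdown");
            }
            return "saved";
        }, e -> e instanceof IllegalStateException, 10, 2_000);
        System.out.println(result + " after " + attempts[0] + " attempts"); // saved after 3 attempts
    }
}
```

In the real batch job, the predicate could check for a driver `MongoException` and pass its code to a classifier like the one for error 11600; that wiring is left as an assumption here.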

Thanks in advance.


Attachment: configuration.txt (16 kB, Jun 21 2017 10:38:05 AM UTC)

Assignee: Unassigned
Reporter: Garrett Donnelly
Reviewers: None
Votes: 2
Watchers: 4

Created: Jun 21 2017 10:40:17 AM UTC
Updated: Oct 27 2023 07:48:24 PM UTC
Resolved: Jan 02 2018 08:35:13 PM UTC
