[CSHARP-671] Driver does not automatically reconnect to secondary Created: 30/Jan/13  Updated: 20/Mar/14  Resolved: 11/Feb/13

Status: Closed
Project: C# Driver
Component/s: None
Affects Version/s: 1.7
Fix Version/s: 1.8

Type: Bug Priority: Major - P3
Reporter: Michael Crino Assignee: Craig Wilson
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Replica set
Two mongod instances (one with two votes)


Attachments: Zip Archive MongoTester.zip    
Issue Links:
Related

 Description   

Under the following conditions, the client is unable to reconnect automatically to a secondary without restarting IIS:

  • The primary is down
  • The client is connected to the secondary
  • The secondary goes down
  • The secondary comes back up

In an attempt to recover from this, I tried .Reconnect(), but the client is still unable to reconnect to the server.

I dug into the code a bit, and it appears that as soon as the secondary starts up again, the client notices it and (apparently erroneously) removes it from the list of instances. Because the primary is down and is now the only remaining instance the client knows about, the client will not reconnect.

If I alter IsValidInstance in ReplicaSetMongoServerProxy to skip the 'instance.InstanceType != MongoServerInstanceType.ReplicaSetMember' check it appears to behave as expected.
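To make the failure mode concrete, here is a minimal Python sketch of a membership filter in the spirit of IsValidInstance. It is not the driver's C# code: the `InstanceType` enum, the `Instance` class, `is_valid_instance` and `prune_instances` are hypothetical names, and the set-name comparison is an assumption based on Craig's later description that the driver only keeps members of the same replica set. A restarted secondary that gets classified as StandAlone fails the type check and is dropped, leaving only the unreachable primary.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class InstanceType(Enum):
    # Hypothetical mirror of the MongoServerInstanceType values named in this issue.
    UNKNOWN = "Unknown"
    STANDALONE = "StandAlone"
    REPLICA_SET_MEMBER = "ReplicaSetMember"
    SHARDED_ROUTER = "ShardedRouter"


@dataclass
class Instance:
    address: str
    instance_type: InstanceType
    set_name: Optional[str] = None


def is_valid_instance(instance: Instance, expected_set_name: str) -> bool:
    # The check that was bypassed: anything not classified as a replica set
    # member is rejected, even a real member that is still starting up.
    if instance.instance_type != InstanceType.REPLICA_SET_MEMBER:
        return False
    # Assumed: members must also agree on the replica set name.
    return instance.set_name == expected_set_name


def prune_instances(instances: List[Instance], expected_set_name: str) -> List[Instance]:
    # Rejected instances are dropped from the list entirely, so the client
    # no longer polls them when they later become healthy.
    return [i for i in instances if is_valid_instance(i, expected_set_name)]


instances = [
    Instance("helix", InstanceType.REPLICA_SET_MEMBER, "rs0"),  # primary (down)
    Instance("hawk", InstanceType.STANDALONE),  # restarted secondary, no setName yet
]
print([i.address for i in prune_instances(instances, "rs0")])  # ['helix']
```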

I've tried lots of different connection strings, and they do not appear to alter the behavior. Here is an example, in case it is helpful: mongodb://user:password@localhost,tools/DatabaseName?safe=true&readPreference=primaryPreferred
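For reference, the connection string above seeds the client with two hosts. The sketch below uses only Python's urllib.parse and a hypothetical `parse_mongodb_uri` helper (not the driver's own parser) to split it into seed list, database and options. Both members appear in the seed list whatever the options are; since the seed list only starts discovery and the pruning happens afterwards, this may explain why changing the connection string made no difference.

```python
from typing import Dict, List, Optional, Tuple
from urllib.parse import parse_qs, urlsplit


def parse_mongodb_uri(uri: str) -> Tuple[List[str], Optional[str], Dict[str, str]]:
    """Split a mongodb:// URI into seed hosts, database name and options."""
    parts = urlsplit(uri)
    if parts.scheme != "mongodb":
        raise ValueError("expected a mongodb:// URI")
    # Credentials end at the last '@'; the rest is a comma-separated seed list.
    host_list = parts.netloc.rpartition("@")[2]
    hosts = [h for h in host_list.split(",") if h]
    database = parts.path.lstrip("/") or None
    # If an option is repeated, keep its last value.
    options = {key: values[-1] for key, values in parse_qs(parts.query).items()}
    return hosts, database, options


hosts, database, options = parse_mongodb_uri(
    "mongodb://user:password@localhost,tools/DatabaseName"
    "?safe=true&readPreference=primaryPreferred"
)
print(hosts)  # ['localhost', 'tools']
```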

I don't understand what should be going on well enough to submit a patch - sorry.

Thanks,
-Michael



 Comments   
Comment by auto [ 11/Feb/13 ]

Author:

{u'date': u'2013-01-31T21:17:29Z', u'name': u'Craig Wilson', u'email': u'craiggwilson@gmail.com'}

Message: CSHARP-671: server sometimes neglects to return setName causing the driver to think that the instance is not a replica set member and we remove it from memory preventing recovery.
Branch: master
https://github.com/mongodb/mongo-csharp-driver/commit/95b83dc73e976b63eb8e338665c34a672e8a5811

Comment by Michael Crino [ 02/Feb/13 ]

Awesome, thanks for your help!

Comment by Craig Wilson [ 02/Feb/13 ]

Michael, you rock. Thanks so much. I'll test this out and write some tests for the scenario, but it is great to hear that the problem is solved.

1.8 will be released at the same time server 2.4 is released. That should be within a month or two.

This problem isn't as bad as it sounds. After the secondary gets removed, it stays gone only until another secondary or the primary comes back; once that happens, it gets re-added. This is the first time we've heard of this, so thanks so much for the report.
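The re-adding described above can be sketched as merging the member list that a healthy node reports in its isMaster reply into the client's known hosts. This is a simplified Python illustration with a hypothetical `merge_discovered_hosts` function and made-up host:port addresses, not the driver's implementation; it assumes the `hosts` and `passives` fields that replica set members include in their isMaster replies.

```python
from typing import Any, Dict, List


def merge_discovered_hosts(known: List[str], is_master: Dict[str, Any]) -> List[str]:
    """Return `known` extended with any members reported in an isMaster reply."""
    # Replica set members list their peers in 'hosts' (and 'passives' when
    # present); a reply without these fields contributes nothing.
    reported = is_master.get("hosts", []) + is_master.get("passives", [])
    merged = list(known)
    for host in reported:
        if host not in merged:
            merged.append(host)
    return merged


# The client only knows the primary; when it comes back, it reports the secondary.
reply = {"setName": "rs0", "ismaster": True, "hosts": ["helix:27017", "hawk:27017"]}
print(merge_discovered_hosts(["helix:27017"], reply))  # ['helix:27017', 'hawk:27017']
```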

Comment by Michael Crino [ 02/Feb/13 ]

A simple test application

Comment by Michael Crino [ 02/Feb/13 ]

The updated code appears to have resolved this problem! Thank you! Any idea when 1.8 will be released?

In the interest of documentation, I attached a simple program that reproduces this problem. If you follow the instructions below, it occurs every time. I also noticed that if you start the test client before doing anything else, the behavior is different... For this test, hawk is my secondary and helix is my primary.

<primary>
echo test123 > mongod.key
mkdir database
mongod --keyFile mongod.key --auth --dbpath database --directoryperdb --journal --nohttpinterface -replSet rs0
mongo
rs.initiate()
cnf=rs.conf()
cnf.members[0].votes=2
rs.reconfig(cnf)
use admin
db.addUser("admin","test123")
db.auth("admin","test123")
use test
db.addUser("test","test123")

<secondary>
echo test123 > mongod.key
mkdir database
mongod --keyFile mongod.key --auth --dbpath database --directoryperdb --journal --nohttpinterface -replSet rs0

<primary>
rs.add("hawk")

rs.status()

Launch test app
Wait 30s or so for everything to come up

Ctrl-C primary
Wait 30s or so for everything to switch over
Ctrl-C secondary
Wait 30s or so
Restart secondary
Wait forever
Observe that the test client doesn't automatically reconnect
Restart the test client and observe it connecting to the secondary

Comment by Craig Wilson [ 31/Jan/13 ]

Michael,
I still haven't been able to reproduce this. However, I did find one loophole in our server implementation that would cause setName not to be sent back; instead, a field called isreplicaset is sent. I have incorporated that into the check so that the member is not incorrectly removed.
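A simplified Python sketch of this classification change: an instance counts as a replica set member if its isMaster reply has a setName or, with the fix, if it reports isreplicaset: true, which a mongod started with --replSet can send before it has loaded its replica set config. The `classify` function, its `honor_isreplicaset` switch and the sample reply are hypothetical illustrations; the actual change is in the commit linked above.

```python
from typing import Any, Dict


def classify(is_master: Dict[str, Any], honor_isreplicaset: bool = True) -> str:
    """Derive an instance type name from an isMaster reply (simplified sketch)."""
    if "setName" in is_master:
        return "ReplicaSetMember"
    # A member still starting up may omit setName but report isreplicaset: true.
    if honor_isreplicaset and is_master.get("isreplicaset"):
        return "ReplicaSetMember"
    # mongos identifies itself with msg: "isdbgrid".
    if is_master.get("msg") == "isdbgrid":
        return "ShardedRouter"
    return "StandAlone"


# Hypothetical reply from the restarted secondary before it has its config.
starting_member = {"ismaster": False, "secondary": False, "isreplicaset": True, "ok": 1.0}
print(classify(starting_member, honor_isreplicaset=False))  # StandAlone (pre-fix check)
print(classify(starting_member))  # ReplicaSetMember (with the fix)
```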

I have pushed this to my branch here: https://github.com/craiggwilson/mongo-csharp-driver/tree/csharp671. If you wouldn't mind testing it with your small sample program, that'd be awesome. This is based on the 1.8 release we are prepping, so some things related to auth have changed. Let me know if you have any trouble or if you don't have time to do this.

Thanks a lot.

Comment by Craig Wilson [ 31/Jan/13 ]

Ok. Well, at least what is happening matches exactly what I described above. If the instance type comes back as StandAlone, the server is not sending a replica set name. This might be a server bug, but it's definitely something we need to handle. I'll try to reproduce it again with your server version and command-line options.

Comment by Michael Crino [ 31/Jan/13 ]

1) I'm using MongoDB v2.2.2
2) It's a manual thing. To test the failure, I press Ctrl+C in the interactive mongod instance; to restart it, I re-run the command line. The command line I was using is ./mongod --keyFile mongod.key --auth --dbpath database --directoryperdb --journal --nohttpinterface -replSet rs0. The instance type was StandAlone, which I meant to mention originally since that seems odd.

The replica set was previously initialized.

The thing I find particularly interesting is that if the app/IIS is restarted, it works just fine. Additionally, I noticed the secondary takes a while to come online if no primary is available and it can't elect itself. The secondary is removed from the client's instance list immediately after the secondary starts up, well before the instance moves into a secondary state.

I hope that helps. Willing to put together samples or a movie or something if the above is not enough - might take a few days...

Thanks for your assistance

Comment by Craig Wilson [ 30/Jan/13 ]

Thanks for reporting Michael... I started looking into this and unfortunately am having trouble reproducing. I have a couple of questions that will hopefully lead me in the right direction.

  1. What version of the server are you using?
  2. When the secondary comes back up, was this automatic or a manual thing?

The second question here is very important. The way the driver works is that once it determines you are talking to a replica set, it ensures that all the members it is talking to are in the same replica set. You pointed to a line of code that seemed to be the culprit. I would be interested to know what InstanceType the instance claimed to be. Was it reporting StandAlone, Unknown, or ShardedRouter?

This is important because the only way an instance doesn't end up with InstanceType == ReplicaSetMember is if it doesn't report a setName when running the isMaster command. This would happen if mongod was started without the replSet option or had not yet been initialized into a replica set.

Anyway, let me know. I'll continue to work on this. Thanks...

Generated at Wed Feb 07 21:37:30 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.