[JAVA-929] write to replicaset fails after failover Created: 16/Aug/13  Updated: 28/Aug/13  Resolved: 28/Aug/13

Status: Closed
Project: Java Driver
Component/s: Connection Management
Affects Version/s: 2.11.1, 2.11.2
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: gomil Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

MongoDB V2.0.6 on x86_64 GNU/Linux



 Description   

Setup: ReplicaSet with two nodes (node30, node31)

I use the Java driver (v2.11.2) with the constructor below (via Spring Data's MongoFactoryBean):

   public Mongo( List<ServerAddress> seeds , MongoOptions options )

If the master (node30) goes down, all further write attempts fail:

Aug 16, 2013 11:52:30 AM com.mongodb.ConnectionStatus$UpdatableNode update
WARNING: Server seen down: /10.150.20.30:27000 -  java.io.IOException - message: Connection refused: connect

I analyzed the driver (class ReplicaSetStatus):

  • the old master (node30) is, as expected, no longer in "acceptableMembers"
  • the old secondary (node31) is now marked as master, but the old master (node30) is still marked as master as well
  • the method that finds the current master iterates over the variable "all" instead of "acceptableMembers" and returns the old (dead) master (node30)

I fixed it locally (using "acceptableMembers" instead of "all" in findMaster()) and it worked fine for the first failover.
BUT ... if node30 comes back up and node31 goes down, I get a different error message, followed by the same warning as above:

Aug 16, 2013 12:30:20 PM com.mongodb.DBPortPool gotError
WARNING: emptying DBPortPool to /10.150.20.30:27000 b/c of error
java.io.EOFException
	at org.bson.io.Bits.readFully(Bits.java:48)
	at org.bson.io.Bits.readFully(Bits.java:33)
	at org.bson.io.Bits.readFully(Bits.java:28)
	at com.mongodb.Response.<init>(Response.java:40)
	at com.mongodb.DBPort.go(DBPort.java:142)
	at com.mongodb.DBPort.go(DBPort.java:106)
	at com.mongodb.DBPort.findOne(DBPort.java:162)
	at com.mongodb.DBPort.runCommand(DBPort.java:170)
	at com.mongodb.DBTCPConnector._checkWriteError(DBTCPConnector.java:100)
	at com.mongodb.DBTCPConnector.say(DBTCPConnector.java:142)
	at com.mongodb.DBTCPConnector.say(DBTCPConnector.java:115)
	at com.mongodb.DBApiLayer$MyCollection.insert(DBApiLayer.java:248)
	at com.mongodb.DBApiLayer$MyCollection.insert(DBApiLayer.java:204)
	at com.mongodb.DBCollection.insert(DBCollection.java:148)
	at com.mongodb.DBCollection.insert(DBCollection.java:91)
	at com.mongodb.DBCollection.save(DBCollection.java:810)
	at org.springframework.data.mongodb.core.MongoTemplate$10.doInCollection(MongoTemplate.java:884)
	at org.springframework.data.mongodb.core.MongoTemplate.execute(MongoTemplate.java:388)
	at org.springframework.data.mongodb.core.MongoTemplate.saveDBObject(MongoTemplate.java:879)
	at org.springframework.data.mongodb.core.MongoTemplate.doSave(MongoTemplate.java:819)
	at org.springframework.data.mongodb.core.MongoTemplate.save(MongoTemplate.java:756)
	at org.springframework.data.mongodb.core.MongoTemplate.save(MongoTemplate.java:744)

Aug 16, 2013 12:32:32 PM com.mongodb.ConnectionStatus$UpdatableNode update
WARNING: Server seen down: /10.150.20.31:27000 - java.io.IOException - message: Connection refused: connect



 Comments   
Comment by gomil [ 28/Aug/13 ]

Hello Jeff,

thanks for your reply!
You are right: ReplicaSetNode._ok is false for an unreachable server!!
The driver works as expected.

In my tests I can see that the data is correctly written to the new master, while ConnectionStatus$UpdatableNode warns (correctly) about the unreachable old master.

Sorry for the noise!

Comment by Jeffrey Yemin [ 24/Aug/13 ]

I examined the code, and though findMaster does use "all", it calls ReplicaSetNode.master(), which will return false unless _ok is true as well as _isMaster. If the old primary is no longer reachable, then _ok should be false. Are you seeing ReplicaSetNode._ok == true for an unreachable server?
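
To illustrate the check described above, here is a minimal, self-contained sketch. It is not the driver's source: FailoverSketch, Node and its fields are hypothetical stand-ins that model only the two flags mentioned (_ok and _isMaster), assuming that master() is simply the conjunction of both.

```java
import java.util.Arrays;
import java.util.List;

public class FailoverSketch {

    // Hypothetical stand-in for ReplicaSetNode; only the two flags
    // discussed in this ticket are modelled.
    static final class Node {
        final String name;
        final boolean ok;        // last status check reached the server
        final boolean isMaster;  // server last reported itself as master

        Node(String name, boolean ok, boolean isMaster) {
            this.name = name;
            this.ok = ok;
            this.isMaster = isMaster;
        }

        // A node counts as master only if it is reachable AND flagged as master.
        boolean master() {
            return ok && isMaster;
        }
    }

    // Scans every known member (the "all" list) and returns the first one
    // that passes master(), or null if none qualifies.
    static Node findMaster(List<Node> all) {
        for (Node n : all) {
            if (n.master()) {
                return n;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Node> all = Arrays.asList(
                new Node("node30", false, true),  // old master: unreachable, stale master flag
                new Node("node31", true, true));  // new master: reachable
        System.out.println(findMaster(all).name); // prints node31
    }
}
```

Under these assumptions, iterating over "all" is harmless: an unreachable node with a stale master flag is skipped because its ok flag is false.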
