[JAVA-212] java driver hung when primary is killed Created: 05/Nov/10  Updated: 25/Jun/13  Resolved: 24/Oct/11

Status: Closed
Project: Java Driver
Component/s: Cluster Management
Affects Version/s: 2.3
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Joseph Wang Assignee: Antoine Girbal
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Java Source File DbInsertListWorker.java     Java Source File MongoDbInsert.java     Java Source File MortgageObject.java     Text File java-212-output.txt    

 Description   

1.7.2 servers.

Code:

try {
if (enable_debug)

{ System.err.println("Info DbInsertListWorker: Insertion begin"); }

try {
if (mongo != null) {
DB db = mongo.getDB(dbName);
if (db != null) {
db.requestStart();
DBCollection collection = db
.getCollection(collectionName);
if (collection != null) {
db.resetError();
WriteConcern concern = new WriteConcern(2, 1000);
collection.insert(list.toArray(new DBObject[list
.size()]), concern);
DBObject dbObject = db.getPreviousError();
if (dbObject != null && dbObject.get("err") != null)

{ System.out .println("Info DbInsertListWorker preverror:" + dbObject.toString()); }

dbObject = db
.getLastError(WriteConcern.REPLICAS_SAFE);
if (dbObject != null && dbObject.get("err") != null)

{ System.out .println("Info DbInsertListWorker lasterror:" + dbObject.toString()); }

} else

{ System.out .println("Info DbInsertListWorker: empty collection"); }

db.requestDone();
} else

{ System.out.println("Info DbInsertListWorker: empty db"); }

} else

{ System.out.println("Info DbInsertListWorker: empty mongo"); }

} catch (Exception ex)

{ System.out.println("DbInsertListWorker: exception " + ex); ex.printStackTrace(); }

if (enable_debug)

{ System.err.println("Info DbInsertListWorker: Insertion end"); }

} catch (Exception ex)

{ System.err.println("Error DbInsertListWorker:" + ex.toString()); }

}

Have 1 primary, 1 slave, and 1 arbiter. Killed primary while running insertion test. The secondary
became the primary immediately. However, Java driver seems to hang.

Nov 4, 2010 11:48:15 PM com.mongodb.ReplicaSetStatus$Node update
WARNING: node down: ip-10-166-59-166:20000 java.io.IOException: couldn't connect to [ip-10-166-59-166/10.166.59.166:20000] bc:java.net.ConnectException: Connection refused
Nov 4, 2010 11:48:20 PM com.mongodb.ReplicaSetStatus$Node update
WARNING: node down: ip-10-166-59-166:20000 java.io.IOException: couldn't connect to [ip-10-166-59-166/10.166.59.166:20000] bc:java.net.ConnectException: Connection refused
Nov 4, 2010 11:48:25 PM com.mongodb.ReplicaSetStatus$Node update
WARNING: node down: ip-10-166-59-166:20000 java.io.IOException: couldn't connect to [ip-10-166-59-166/10.166.59.166:20000] bc:java.net.ConnectException: Connection refused
Nov 4, 2010 11:48:30 PM com.mongodb.ReplicaSetStatus$Node update
WARNING: node down: ip-10-166-59-166:20000 java.io.IOException: couldn't connect to [ip-10-166-59-166/10.166.59.166:20000] bc:java.net.ConnectException: Connection refused
Nov 4, 2010 11:48:35 PM com.mongodb.ReplicaSetStatus$Node update
WARNING: node down: ip-10-166-59-166:20000 java.io.IOException: couldn't connect to [ip-10-166-59-166/10.166.59.166:20000] bc:java.net.ConnectException: Connection refused
Nov 4, 2010 11:48:40 PM com.mongodb.ReplicaSetStatus$Node update
WARNING: node down: ip-10-166-59-166:20000 java.io.IOException: couldn't connect to [ip-10-166-59-166/10.166.59.166:20000] bc:java.net.ConnectException: Connection refused
Nov 4, 2010 11:48:45 PM com.mongodb.ReplicaSetStatus$Node update
WARNING: node down: ip-10-166-59-166:20000 java.io.IOException: couldn't connect to [ip-10-166-59-166/10.166.59.166:20000] bc:java.net.ConnectException: Connection refused
...
Nov 4, 2010 11:48:50 PM com.mongodb.ReplicaSetStatus$Node update
WARNING: node down: ip-10-166-59-166:20000 java.io.IOException: couldn't connect to [ip-10-166-59-166/10.166.59.166:20000] bc:java.net.ConnectException: Connection refused
Nov 4, 2010 11:48:55 PM com.mongodb.ReplicaSetStatus$Node update
WARNING: node down: ip-10-166-59-166:20000 java.io.IOException: couldn't connect to [ip-10-166-59-166/10.166.59.166:20000] bc:java.net.ConnectException: Connection refused



 Comments   
Comment by Scott Hernandez (Inactive) [ 24/Oct/11 ]

With the latest driver, soon to be 2.7, we cannot reproduce this. Please let us know if you can.

Comment by Joseph Wang [ 11/Feb/11 ]

1) we've not switched to 2.4 java driver.
2) don't remember. the ticket was open several months ago.

Comment by Antoine Girbal [ 11/Feb/11 ]

output from sample test

Comment by Antoine Girbal [ 11/Feb/11 ]

Looking into this issue.
First thing: note that the following error message is normal
"Nov 4, 2010 11:48:15 PM com.mongodb.ReplicaSetStatus$Node update
WARNING: node down: ip-10-166-59-166:20000 java.io.IOException: couldn't connect to [ip-10-166-59-166/10.166.59.166:20000] bc:java.net.ConnectException: Connection refused"
The driver will keep trying to test this server every 5s, since it is part of the replica set.

I tried to reproduce this issue with driver 2.5.
I could not get the driver to hang.
I basically took your code, with an insert of 10,000 objects, and I repeat the insert 10 times.
During the 1st insert, before it calls getlasterror I added a short sleep and I kill the primary server.
The driver shows that it correctly switch over to a different master, and I end up with correct 100,000 documents in collection.
I attached the output log.

Questions:

  • do you still experience issue with 2.4 driver?
  • do you know when exactly you kill the primary server? I can will try to time it to kill server during the insert itself.
Comment by Joseph Wang [ 05/Nov/10 ]

It seems the only way to get around this is to close mongo instance and reestablish connection.

Comment by Joseph Wang [ 05/Nov/10 ]

Killed the primary while running the insert. Even though the primary switched
from ip-10-166-59-166:20000 to ip-10-166-57-74:20000, the driver is still
connected to ip-10-166-59-166:20000.

mongo.debugString() showed:
DBTCPConnector: replica set : [ip-10-166-59-166:20000,
ip-10-166-57-74:20000]

mongo.getConnectPoint() showed:
ip-10-166-59-166:20000

Comment by Joseph Wang [ 05/Nov/10 ]

insert test script

no=10000000
id=1
thread=10
version=0
option=1

date > insert.log
java -cp "/root/mongo-2.3.jar:/root/leadpoint.jar:." com.leadpoint.db.MongoDbInsert -no ${no} -thread ${thread} -id ${id} -version ${version} -option ${option} >> insert.log 2>&1
date >> insert.log

Generated at Thu Feb 08 08:51:45 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.