[JAVA-232] Getting node down error when all servers are up Created: 12/Dec/10  Updated: 25/Jun/13  Resolved: 24/Oct/11

Status: Closed
Project: Java Driver
Component/s: Cluster Management
Affects Version/s: 2.3
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Joseph Wang Assignee: Antoine Girbal
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

[joseph.wang@stage1.vpc3 ~]$ uname -a
Linux stage1.vpc3.estalea.net 2.6.21.7-2.fc8xen #1 SMP Fri Feb 15 12:34:28 EST 2008 x86_64 x86_64 x86_64 GNU/Linux


Attachments: Text File DbCollectionVersionTableUpdater.java     Text File MongoConnnection.java     Text File MorgageOrderStatusUpdaterTask.java     Text File MortgageOrderStatusTableUpdater.java    

 Description   

[joseph.wang@stage1.vpc3 ~]$ ps auxww | grep mongo
2039 2337 2.5 7.8 24189064 615480 ? Sl Nov30 429:43 /usr/local/mongodb-linux-x86_64-1.6.3/bin/mongod -f /db/lps-mongodb_master_stage1/mongodb.cnf
2040 2406 3.4 6.8 24568684 541140 ? Sl Nov30 567:09 /usr/local/mongodb-linux-x86_64-1.6.3/bin/mongod -f /db/lps-mongodb_slave1_stage1/mongodb.cnf
2041 15172 1.1 6.5 23775900 519500 ? Sl Dec11 10:06 /usr/local/mongodb-linux-x86_64-1.6.3/bin/mongod -f /db/lps-mongodb_slave2_stage1/mongodb.cnf --quiet
805 25812 0.0 0.0 6024 656 pts/1 S+ 05:05 0:00 grep mongo
[joseph.wang@stage1.vpc3 ~]$ /usr/local/mongodb-linux-x86_64-1.6.3/bin/mongo localhost:4101
MongoDB shell version: 1.6.3
connecting to: localhost:4101/test
> exit
bye
[joseph.wang@stage1.vpc3 ~]$ /usr/local/mongodb-linux-x86_64-1.6.3/bin/mongo localhost:4102
MongoDB shell version: 1.6.3
connecting to: localhost:4102/test
> exit
bye
[joseph.wang@stage1.vpc3 ~]$ /usr/local/mongodb-linux-x86_64-1.6.3/bin/mongo localhost:4103
MongoDB shell version: 1.6.3
connecting to: localhost:4103/test
> exit
bye

> rs.status()
{
"set" : "lp",
"date" : "Sun Dec 12 2010 05:07:17 GMT-0800 (PST)",
"myState" : 1,
"members" : [

{ "_id" : 0, "name" : "stage1.vpc3.estalea.net:4101", "health" : 1, "state" : 2, "uptime" : 993385, "lastHeartbeat" : "Sun Dec 12 2010 05:07:15 GMT-0800 (PST)" }

,

{ "_id" : 1, "name" : "stage1.vpc3.estalea.net:4102", "health" : 1, "state" : 1, "self" : true }

,

{ "_id" : 2, "name" : "stage1.vpc3.estalea.net:4103", "health" : 1, "state" : 2, "uptime" : 50789, "lastHeartbeat" : "Sun Dec 12 2010 05:07:16 GMT-0800 (PST)" }

],
"ok" : 1
}

[22:55:05.073] node down: stage1.vpc3.estalea.net:4103 java.io.IOException: couldn't connect to [stage1.vpc3.estalea.net/10.9.0.5:4103] bc:java.net.ConnectException: Connection refused
[22:55:25.874] replica set mode, switching master
[22:55:25.874] java.io.EOFException
[22:55:25.874] at org.bson.io.Bits.readFully(Bits.java:37)
[22:55:25.874] at org.bson.io.Bits.readFully(Bits.java:28)
[22:55:25.874] at com.mongodb.Response.<init>(Response.java:35)
[22:55:25.874] at com.mongodb.DBPort.go(DBPort.java:101)
[22:55:25.874] at com.mongodb.DBPort.go(DBPort.java:66)
[22:55:25.874] at com.mongodb.DBPort.call(DBPort.java:56)
[22:55:25.874] at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:211)
[22:55:25.874] at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:284)
[22:55:25.874] at com.mongodb.DBCursor._check(DBCursor.java:297)
[22:55:25.874] at com.mongodb.DBCursor._hasNext(DBCursor.java:420)
[22:55:25.874] at com.mongodb.DBCursor.hasNext(DBCursor.java:445)
[22:55:25.874] at com.leadpoint.syndication.db.mongo.mortgage.DbCollectionVersionTableUpdater.execute(DbCollectionVersionTableUpdater.java:57)
[22:55:25.874] at com.leadpoint.syndication.db.mongo.mortgage.TableUpdater.execute(TableUpdater.java:40)
[22:55:25.874] at com.leadpoint.syndication.db.mongo.mortgage.TableUpdaterTask.run(TableUpdaterTask.java:50)
[22:55:25.874] at java.util.TimerThread.mainLoop(Timer.java:512)
[22:55:25.874] at java.util.TimerThread.run(Timer.java:462)
[22:55:25.874] MyPort.error called
[22:55:25.874] java.io.EOFException
[22:55:25.874] at org.bson.io.Bits.readFully(Bits.java:37)
[22:55:25.874] at org.bson.io.Bits.readFully(Bits.java:28)
[22:55:25.874] at com.mongodb.Response.<init>(Response.java:35)
[22:55:25.874] at com.mongodb.DBPort.go(DBPort.java:101)
[22:55:25.874] at com.mongodb.DBPort.go(DBPort.java:66)
[22:55:25.874] at com.mongodb.DBPort.call(DBPort.java:56)
[22:55:25.874] at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:211)
[22:55:25.874] at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:284)
[22:55:25.874] at com.mongodb.DBCursor._check(DBCursor.java:297)
[22:55:25.874] at com.mongodb.DBCursor._hasNext(DBCursor.java:420)
[22:55:25.874] at com.mongodb.DBCursor.hasNext(DBCursor.java:445)
[22:55:25.874] at com.leadpoint.syndication.db.mongo.mortgage.DbCollectionVersionTableUpdater.execute(DbCollectionVersionTableUpdater.java:57)
[22:55:25.874] at com.leadpoint.syndication.db.mongo.mortgage.TableUpdater.execute(TableUpdater.java:40)
[22:55:25.874] at com.leadpoint.syndication.db.mongo.mortgage.TableUpdaterTask.run(TableUpdaterTask.java:50)
[22:55:25.874] at java.util.TimerThread.mainLoop(Timer.java:512)
[22:55:25.874] at java.util.TimerThread.run(Timer.java:462)



 Comments   
Comment by Scott Hernandez (Inactive) [ 26/Aug/11 ]

This code has been updated a lot, do you still see this with the latest driver?

Comment by Joseph Wang [ 12/Dec/10 ]

latest version w/ getlasterror check

Comment by Joseph Wang [ 12/Dec/10 ]

Still getting error. However, exception handling is not triggered. Cannot detech such condition to force reconnect.

[15:27:36.227] at java.util.TimerThread.mainLoop(Timer.java:512)
[15:27:36.227] at java.util.TimerThread.run(Timer.java:462)
[15:27:36.227] MyPort.error called
[15:27:36.227] java.io.EOFException
[15:27:36.227] at org.bson.io.Bits.readFully(Bits.java:37)
[15:27:36.227] at org.bson.io.Bits.readFully(Bits.java:28)
[15:27:36.227] at com.mongodb.Response.<init>(Response.java:35)
[15:27:36.227] at com.mongodb.DBPort.go(DBPort.java:101)
[15:27:36.227] at com.mongodb.DBPort.go(DBPort.java:66)
[15:27:36.227] at com.mongodb.DBPort.call(DBPort.java:56)
[15:27:36.227] at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:211)
[15:27:36.227] at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:218)
[15:27:36.227] at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:284)
[15:27:36.227] at com.mongodb.DBCursor._check(DBCursor.java:297)
[15:27:36.227] at com.mongodb.DBCursor._hasNext(DBCursor.java:420)
[15:27:36.227] at com.mongodb.DBCursor.hasNext(DBCursor.java:445)
[15:27:36.227] at com.leadpoint.syndication.db.mongo.mortgage.MortgageOrderStatusTableUpdater.execute(MortgageOrderStatusTableUpdater.java:134)
[15:27:36.227] at com.leadpoint.syndication.db.mongo.mortgage.MorgageOrderStatusUpdaterTask.run(MorgageOrderStatusUpdaterTask.java:52)
[15:27:36.227] at java.util.TimerThread.mainLoop(Timer.java:512)
[15:27:36.227] at java.util.TimerThread.run(Timer.java:462)

Comment by Joseph Wang [ 12/Dec/10 ]

Adding the following code to see if can work around the problem.

DBObject dbObject = db.getLastError();
if (dbObject != null && dbObject.get("err") != null)

{ log.warn("DbCollectionVersionTableUpdater: Encounter error for query " + dbQuery.toString()); error_count++; }

if (error_count >= error_count_threshold)

{ log.warn("DbCollectionVersionTableUpdater: Reset connection due to high error count " + error_count); MongoConnnection.getInstance().close(); MongoConnnection.getInstance().reconnect(); error_count = 0; }

Looking at the resin log, the problem started 22:55 last night till this morning.

[22:55:25.874] replica set mode, switching master
[22:55:25.874] java.io.EOFException
[22:55:25.874] at org.bson.io.Bits.readFully(Bits.java:37)
[22:55:25.874] at org.bson.io.Bits.readFully(Bits.java:28)
[22:55:25.874] at com.mongodb.Response.<init>(Response.java:35)
[22:55:25.874] at com.mongodb.DBPort.go(DBPort.java:101)
[22:55:25.874] at com.mongodb.DBPort.go(DBPort.java:66)
[22:55:25.874] at com.mongodb.DBPort.call(DBPort.java:56)
[22:55:25.874] at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:211)
[22:55:25.874] at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:284)
[22:55:25.874] at com.mongodb.DBCursor._check(DBCursor.java:297)
[22:55:25.874] at com.mongodb.DBCursor._hasNext(DBCursor.java:420)
[22:55:25.874] at com.mongodb.DBCursor.hasNext(DBCursor.java:445)

...
[00:05:25.874] at java.util.TimerThread.mainLoop(Timer.java:512)
[00:05:25.874] at java.util.TimerThread.run(Timer.java:462)
[00:05:25.874] MyPort.error called
[00:05:25.874] java.io.EOFException
[00:05:25.874] at org.bson.io.Bits.readFully(Bits.java:37)
[00:05:25.874] at org.bson.io.Bits.readFully(Bits.java:28)
[00:05:25.874] at com.mongodb.Response.<init>(Response.java:35)
[00:05:25.874] at com.mongodb.DBPort.go(DBPort.java:101)
[00:05:25.874] at com.mongodb.DBPort.go(DBPort.java:66)
[00:05:25.874] at com.mongodb.DBPort.call(DBPort.java:56)
[00:05:25.874] at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:211)
[00:05:25.874] at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:284)
[00:05:25.874] at com.mongodb.DBCursor._check(DBCursor.java:297)
[00:05:25.874] at com.mongodb.DBCursor._hasNext(DBCursor.java:420)
[00:05:25.874] at com.mongodb.DBCursor.hasNext(DBCursor.java:445)
[00:05:25.874] at com.leadpoint.syndication.db.mongo.mortgage.LenderBrandInformationTableUpdater.execute(LenderBrandInformationTableUpdater.java:79)
[00:05:25.874] at com.leadpoint.syndication.db.mongo.mortgage.TableUpdater.execute(TableUpdater.java:56)
[00:05:25.874] at com.leadpoint.syndication.db.mongo.mortgage.TableUpdaterTask.run(TableUpdaterTask.java:50)
[00:05:25.874] at java.util.TimerThread.mainLoop(Timer.java:512)
[00:05:25.874] at java.util.TimerThread.run(Timer.java:462)

Note that the primary has been running since Nov 30.
[joseph.wang@stage1.vpc3 ~]$ ps auxww | grep mongo
2039 2337 2.5 7.8 23769060 618880 ? Sl Nov30 429:44 /usr/local/mongodb-linux-x86_64-1.6.3/bin/mongod -f /db/lps-mongodb_master_stage1/mongodb.cnf
2040 2406 3.4 6.8 24076972 540316 ? Sl Nov30 567:09 /usr/local/mongodb-linux-x86_64-1.6.3/bin/mongod -f /db/lps-mongodb_slave1_stage1/mongodb.cnf

Comment by Joseph Wang [ 12/Dec/10 ]

MongoConnection.java: connection pool singleton
DbCollectionVersionTableUpdater.java: query code

What is more troubling is that it stuck in this mode forever. If there is a problem with a slave, should the Java driver auto routes to another slave?

Comment by Joseph Wang [ 12/Dec/10 ]

see sockets for all three mongo

bash-3.2$ netstat -an | grep 4103 | grep -v CLOSE_WAIT | wc -l
54
bash-3.2$ netstat -an | grep 4102 | grep -v CLOSE_WAIT | wc -l
134
bash-3.2$ netstat -an | grep 4101 | grep -v CLOSE_WAIT | wc -l
105

Generated at Thu Feb 08 08:51:48 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.