[JAVA-169] Inserts fail during a replica set failover Created: 28/Sep/10  Updated: 09/Jan/14  Resolved: 23/Feb/11

Status: Closed
Project: Java Driver
Component/s: Cluster Management
Affects Version/s: 2.1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Alvin Richards (Inactive) Assignee: Antoine Girbal
Resolution: Done Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

db version v1.7.1-pre-, pdfile version 4.5
git hash: eb74c84814a1f0aceb7cd335e1296d419550ae23
sys info: Linux domU-12-31-39-06-79-A1 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_41


Issue Links:
Related

 Description   

Problem:
I have a three-member replica set and code that inserts into a collection. The connection has been configured with all three members:

MongoOptions mo = new MongoOptions();
mo.connectionsPerHost = MAX_THREADS + 5;
mo.autoConnectRetry = true;

String addresses = args[0];

// Peel off one address per pass; the split limit of 2 keeps the remainder intact
while (addresses.contains(",")) {
    String[] next = addresses.split(",", 2);
    connectionList.add(createAddress(next[0]));
    addresses = next[1];
}

connectionList.add(createAddress(addresses));

Mongo mongo = new Mongo( connectionList , mo );
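For reference, the seed list can also be parsed in a single pass. This is only a minimal sketch: `SeedListParser` and `parse` are hypothetical names, `createAddress` from the report is not shown so its behavior is not reproduced here, and the sketch assumes each entry is either `host` or `host:port` with 27017 as the default port. It uses `java.net.InetSocketAddress` so that it runs without the driver on the classpath.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the driver or of the reporter's code.
final class SeedListParser {
    static final int DEFAULT_PORT = 27017; // MongoDB's default port

    private SeedListParser() {}

    /** Parses "host1:port1,host2,host3:port3" into unresolved socket addresses. */
    static List<InetSocketAddress> parse(String addresses) {
        List<InetSocketAddress> seeds = new ArrayList<>();
        for (String entry : addresses.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray commas
            }
            int colon = trimmed.lastIndexOf(':');
            String host = colon < 0 ? trimmed : trimmed.substring(0, colon);
            int port = colon < 0
                    ? DEFAULT_PORT
                    : Integer.parseInt(trimmed.substring(colon + 1));
            // createUnresolved avoids a DNS lookup while parsing
            seeds.add(InetSocketAddress.createUnresolved(host, port));
        }
        return seeds;
    }
}
```

Each parsed host and port pair would then be converted to whatever address type the driver version in use expects before being handed to the `Mongo` constructor.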

When I cause the current primary to step down (rs.stepDown()), the Java driver throws the following exception:

Exception in thread "Thread-1" com.mongodb.MongoException: not talking to master and retries used up
at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:206)
at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:208)
at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:208)
at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:223)
at com.mongodb.DBCollection.findOne(DBCollection.java:486)
at com.mongodb.DBCollection.findOne(DBCollection.java:475)
at com.mongodb.DB.command(DB.java:137)
at com.mongodb.DB.getLastError(DB.java:283)
at InsertSpeed$RunnerInsert.run(InsertSpeed.java:72)

I am not sure whether there is another option that can be set so that inserts continue during the failover.

Workaround:
When the connection goes through a mongos process (in a sharded + replica set deployment), this exception does not occur.

Business Case:
Reliability
User Experience



 Comments   
Comment by Antoine Girbal [ 23/Feb/11 ]

As per the previous explanation, this works as expected.
Please reopen if you believe we can improve further.

Comment by Antoine Girbal [ 17/Feb/11 ]

Tony:
The reason is that with slaveOk=true, reads go to the slaves, so the driver's notion of the master does not get updated.
When a write then happens, it goes to what the driver believes is the master and fails once; the driver then updates the master.

If slaveOk=false, any read will try to go to the master and fail, but will then automatically retry after updating the master.
As a result, subsequent writes won't fail.

It would be too expensive to check whether a server is still master before every write.
So instead we let potentially one write fail and then update the master.
The application should handle the exception and retry the write.
Note that this caveat is usually avoided because:

  • reads with slaveOk=false, if the application also does them, will update the master
  • every 5s the background thread will also update it
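The "let one write fail, then retry" handling described above could be sketched on the application side roughly as follows. This is a hedged illustration, not driver code: `WriteRetry` and `withOneRetry` are hypothetical names, and the sketch catches `RuntimeException` so that it runs without the driver; real code would more likely catch `com.mongodb.MongoException`, which is an unchecked exception.

```java
import java.util.function.Supplier;

// Hypothetical application-side helper, not part of the Java driver.
final class WriteRetry {
    private WriteRetry() {}

    /** Runs the write; if it throws, runs it exactly one more time. */
    static <T> T withOneRetry(Supplier<T> write) {
        try {
            return write.get();
        } catch (RuntimeException firstFailure) {
            // Assumption taken from the comment above: the failed write prompts
            // the driver to update its master, so one more attempt may succeed.
            // A second failure propagates to the caller.
            return write.get();
        }
    }
}
```

A caller would wrap the insert or save call in the supplier. Retrying only once keeps the behavior predictable; whether a blind retry is safe for a given write is left to the application.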

Alvin:
As for the exception "not talking to master and retries used up":
that only happens on a read with slaveOk=false.
The driver automatically retries the read twice, but the replica set may take some time to elect a new master.
In that case this error may occur.
The workaround is to use slaveOk=true, or to catch the exception and retry after a wait time.
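The "catch exception and retry after a wait time" workaround might look like the sketch below. The names `ReadRetry` and `withWait`, and the attempt count and wait time passed in, are all assumptions for illustration; suitable values depend on how long an election takes in a given deployment. As in any driver-free sketch, `RuntimeException` stands in for `com.mongodb.MongoException`.

```java
import java.util.function.Supplier;

// Hypothetical application-side helper, not part of the Java driver.
final class ReadRetry {
    private ReadRetry() {}

    /**
     * Runs the read up to maxAttempts times, sleeping waitMillis between
     * attempts, and rethrows the last failure if every attempt fails.
     */
    static <T> T withWait(Supplier<T> read, int maxAttempts, long waitMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return read.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt == maxAttempts) {
                    break; // no point sleeping after the final attempt
                }
                try {
                    // Give the replica set time to elect a new master
                    Thread.sleep(waitMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    break;
                }
            }
        }
        throw last;
    }
}
```

For example, a `findOne` or `getLastError` call could be wrapped as `ReadRetry.withWait(() -> ..., 5, 1000L)`, where 5 attempts and a 1000 ms wait are placeholder values rather than recommendations.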

Comment by Tony Nelson [ 11/Feb/11 ]

I was seeing a similar problem with Mongo server 1.6.5 and Java client library 2.4.

The full email log can be seen here: http://groups.google.com/group/mongodb-user/browse_thread/thread/42a12735b26e5be3

In a nutshell, connecting to a 3 node replica set with slaveOk() set caused this issue:

Caused by: com.mongodb.MongoException: not master
at com.mongodb.DBTCPConnector._checkWriteError(DBTCPConnector.java:136)
at com.mongodb.DBTCPConnector.say(DBTCPConnector.java:157)
at com.mongodb.DBTCPConnector.say(DBTCPConnector.java:141)
at com.mongodb.DBApiLayer$MyCollection.insert(DBApiLayer.java:225)
at com.mongodb.DBApiLayer$MyCollection.insert(DBApiLayer.java:180)
at com.mongodb.DBCollection.insert(DBCollection.java:72)
at com.mongodb.DBCollection.save(DBCollection.java:537)
at com.mongodb.DBCollection.save(DBCollection.java:517)

This happened on any write after I manually made the master step down.

Commenting out slaveOk() worked around the problem.

Comment by Jeff Yemin (Inactive) [ 05/Oct/10 ]

I'm getting that exception using github trunk, by the way (while testing slaveOk()), and it didn't recover.

Comment by Eliot Horowitz (Inactive) [ 05/Oct/10 ]

The find case should be fixed in 2.2.
Note that you will get one failure, and then it will fail over.

Comment by Jeff Yemin (Inactive) [ 05/Oct/10 ]

Also happening on find:

com.mongodb.MongoException: not talking to master and retries used up
at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:222)
at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:224)
at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:224)
at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:256)
at com.mongodb.DBCollection.findOne(DBCollection.java:467)
at com.mongodb.DBCollection.findOne(DBCollection.java:456)
at com.mongodb.DB.command(DB.java:141)
at com.mongodb.DBCollection.getCount(DBCollection.java:642)
at com.mongodb.DBCursor.size(DBCursor.java:553)

Is there nothing to do but restart the Java process?
