[SERVER-11276] Network Partition Failover Scenario - Connection pool locks system for max of 55 sec every 10 secs Created: 18/Oct/13  Updated: 10/Dec/14  Resolved: 07/Jul/14

Status: Closed
Project: Core Server
Component/s: Internal Client
Affects Version/s: 2.4.6
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Douglas Hubler Assignee: Randolph Tan
Resolution: Done Votes: 0
Labels: connection, cxxcopy
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

CentOS 6/64 bit. Mongo binaries from EPEL repo


Attachments: Java Source File DbTest.java     File partition-hang-issue.cpp     File partition-hang-issue.log    
Operating System: Linux
Steps To Reproduce:
  • install and set up 3 or more mongod servers in a replica set
  • set up read preference tags so queries can be directed to specific servers
  • take one server, e.g. myhost.example.org, and disconnect it from all others
  • write a tool in C++ that:
      • uses a ScopedDbConnection to connect to the mongo cluster
      • performs one or more read-only queries that use read preference tags to select myhost.example.org
      • exits
  • run the tool
    ./dbtest1 myreplset/myhost.mydomain
Participants:

 Description   

If a mongod server has been disconnected from the other servers in its replica set, applications using the C++ driver to connect to that mongod server can experience as much as 55 seconds of delay for every 10 seconds of connectivity.

Issue details discussed on forum
https://groups.google.com/forum/#!topic/mongodb-user/3roJxyGO2Wc



 Comments   
Comment by Thomas Rueckstiess [ 07/Jul/14 ]

Hi Douglas,

I believe Randolph has provided the answer to this issue in his last reply, and based on your last comment, I'll mark this issue resolved now.

Kind Regards,
Thomas

Comment by Douglas Hubler [ 10/Jun/14 ]

Thanks for getting back to me. Unfortunately I wasn't able to try this,
as I am no longer working on the project that uses mongo. I'll
forward this to the appropriate people, though.


Comment by Ramon Fernandez Marina [ 10/Jun/14 ]

Hi dhubler,

have you had a chance to try out renctan's suggestion above of calling the done method? Can you please let us know if this is still an issue for you?

Thanks,
Ramón.

Comment by Randolph Tan [ 22/Apr/14 ]

I apologize for the late response. I was referring to your code, which creates a fresh connection in every iteration of the loop. If you call done() before you delete the sc pointer, the connection will be reused on the next iteration. The C++ driver currently doesn't support setting the read preference at the connection level.
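
The effect of calling done() before deleting the pointer can be illustrated with a small toy model. This is only a sketch: ToyPool, ToyScopedConnection and connectionsOpened are hypothetical names invented for illustration, not the driver's classes. It assumes a pool that reuses a connection only if it was explicitly handed back.

```cpp
// Toy model only: ToyPool and ToyScopedConnection are hypothetical stand-ins,
// not the MongoDB C++ driver's classes.
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

struct ToyConnection {
    int id;  // identifies the underlying (pretend) socket
};

class ToyPool {
public:
    // Hand out an idle connection if one exists, otherwise open a new one.
    std::unique_ptr<ToyConnection> acquire() {
        if (!idle_.empty()) {
            std::unique_ptr<ToyConnection> c = std::move(idle_.back());
            idle_.pop_back();
            return c;
        }
        ++created_;
        return std::unique_ptr<ToyConnection>(new ToyConnection{created_});
    }
    // Store a connection so that a later acquire() can reuse it.
    void release(std::unique_ptr<ToyConnection> c) { idle_.push_back(std::move(c)); }
    // Number of connections opened so far.
    int created() const { return created_; }

private:
    std::vector<std::unique_ptr<ToyConnection>> idle_;
    int created_ = 0;
};

class ToyScopedConnection {
public:
    explicit ToyScopedConnection(ToyPool& pool) : pool_(pool), conn_(pool.acquire()) {}
    // done() hands the connection back to the pool for reuse.
    void done() {
        assert(conn_ && "done() called twice");
        pool_.release(std::move(conn_));
    }
    // Without done(), the destructor simply destroys the connection.
    ~ToyScopedConnection() = default;

private:
    ToyPool& pool_;
    std::unique_ptr<ToyConnection> conn_;
};

// Runs `iterations` loop passes, each with a heap-allocated scoped connection
// that is deleted at the end of the pass, and returns how many connections
// the pool had to open in total.
int connectionsOpened(bool callDone, int iterations) {
    ToyPool pool;
    for (int i = 0; i < iterations; ++i) {
        ToyScopedConnection* sc = new ToyScopedConnection(pool);
        // ... queries would run here ...
        if (callDone) sc->done();
        delete sc;
    }
    return pool.created();
}
```

In this model, connectionsOpened(true, 5) opens a single connection and reuses it, while connectionsOpened(false, 5) opens a new one on every pass.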

Comment by Douglas Hubler [ 29/Oct/13 ]

Interesting, so you're saying that, as a workaround, instead of specifying a read preference for each query, I should set the read preference on the connection object. I'll have to try that. Just to be clear, you're saying this is still an issue and you're just providing a possible workaround.

BTW: Do you have sample code showing how to use read preference tags AND nearest on a connection in C++? I couldn't figure it out, and it took a lot of reading the code just to discover how to do it on a query.

Comment by Randolph Tan [ 29/Oct/13 ]

Hi,

As long as you reuse the same connection object over and over, you should be able to minimize the lockout from ReplicaSetMonitor::check. The DBClientReplicaSet instance will keep using the same secondary node as much as possible, without talking to the ReplicaSetMonitor, as long as you don't change the read preference settings and the secondary doesn't error out.
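
The caching behavior described here can be sketched with a toy model. Everything in it (ToyMonitor, ToyReplicaSetClient, monitorChecks) is a hypothetical name made up for illustration and is not the driver's implementation. It assumes only what the comment states: the monitor is consulted again when the read preference changes or the cached node fails.

```cpp
// Toy model only: these classes are hypothetical and do not reproduce
// DBClientReplicaSet or ReplicaSetMonitor.
#include <string>

// Stand-in for the expensive replica set check.
struct ToyMonitor {
    int checks = 0;  // how many times the monitor was consulted
    std::string selectNode(const std::string& readPref) {
        ++checks;
        return "node-for-" + readPref;
    }
};

class ToyReplicaSetClient {
public:
    explicit ToyReplicaSetClient(ToyMonitor& monitor) : monitor_(monitor) {}

    // Returns the node a query would be sent to. The monitor is consulted
    // only if no node is cached or the read preference has changed.
    std::string query(const std::string& readPref) {
        if (cachedNode_.empty() || readPref != cachedPref_) {
            cachedNode_ = monitor_.selectNode(readPref);
            cachedPref_ = readPref;
        }
        return cachedNode_;
    }

    // Simulates the cached node erroring out, which forces a new selection.
    void markFailed() { cachedNode_.clear(); }

private:
    ToyMonitor& monitor_;
    std::string cachedNode_;
    std::string cachedPref_;
};

// Issues `queries` queries on one client and returns how often the monitor
// was consulted. If changePrefEachQuery is true, the read preference
// alternates between two values so that every query differs from the last.
int monitorChecks(int queries, bool changePrefEachQuery) {
    ToyMonitor monitor;
    ToyReplicaSetClient client(monitor);
    for (int i = 0; i < queries; ++i) {
        bool alternate = changePrefEachQuery && (i % 2 == 1);
        client.query(alternate ? "secondary" : "nearest");
    }
    return monitor.checks;
}
```

With a stable read preference, monitorChecks(10, false) consults the monitor once. Alternating the preference, as in monitorChecks(10, true), consults it on every query.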

Comment by Douglas Hubler [ 19/Oct/13 ]

Attaching a test using the Java driver, which doesn't exhibit the same behavior

Comment by Douglas Hubler [ 18/Oct/13 ]

Instructions

  • adjust the test tool for your environment and compile it as an executable
  • set up a mongod replica set with 3 or more nodes
  • configure the read preference to allow reads from a secondary
  • separate a single mongod server from all other nodes

Expected

  • the separated node should be in read-only mode

Actual

  • startup takes several minutes on a system with 11 mongod servers
  • queries return data for 10 seconds, then hang for up to 55 seconds while the C++ driver retests all connections, and the cycle repeats