[CXX-647] fail to connect to replica set Created: 04/Aug/15  Updated: 04/Sep/15  Resolved: 04/Sep/15

Status: Closed
Project: C++ Driver
Component/s: API
Affects Version/s: legacy-1.0.0
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Judy Han [X] Assignee: Andrew Morrow (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

When I use a simple connection string to connect, it works.

connection string:
mongodb://host3:27017

When I use a replica set connection string to connect:

connection string:
mongodb://host1:27017,host2:27017,host3:27017/MtxEventDatabase?replicaSet=MtxEventReplSet

I get the following exception:

Caught a mongo::DBException.
Exception's what()=No replica set monitor active and no cached seed found for set: MtxEventReplSet

Here is the related code:

    try {
        std::string errString;
        mongo::ConnectionString mongoConnectionString =
            mongo::ConnectionString::parse(hostAndPortStr_, errString);

        // A failed parse leaves the ConnectionString invalid.
        if (!mongoConnectionString.isValid()) {
            return false;
        }

        // A null pointer signals a failed connection.
        mongoPtr_ = mongoConnectionString.connect(errString);
        if (!mongoPtr_) {
            return false;
        }

        if (!mongoPtr_->auth(mongodbDatabase_, mongodbUsername_, mongodbPassword_,
                errString)) {
            return false;
        }
    } catch (const mongo::DBException& exc) {
        log_error("Caught a mongo::DBException.\n" <<
            "Exception's what()=" << exc.what());
        return false;
    } catch (...) {
        return false;
    }
    return true;

When I log in from the command line, it works:

mongo -u <myUser> -p xxxxxxxx --authenticationDatabase <myAuthDb> --host MtxEventReplSet/host1:27017,host2:27017,host3:27017 MtxEventDatabase
MongoDB shell version: 3.0.0
connecting to: MtxEventReplSet/host1:27017,host2:27017,host3:27017/MtxEventDatabase
2015-08-04T15:56:20.015-0700 I NETWORK  starting new replica set monitor for replica set MtxEventReplSet with seeds host1:27017,host2:27017,host3:27017
2015-08-04T15:56:20.016-0700 I NETWORK  [ReplicaSetMonitorWatcher] starting
Server has startup warnings: 
2015-07-31T16:01:59.497-0700 I CONTROL  [initandlisten] 
2015-07-31T16:01:59.497-0700 I CONTROL  [initandlisten] ** WARNING: soft rlimits too low. rlimits set to 1024 processes, 64000 files. Number of processes should be at least 32000 : 0.5 times number of files.
MtxEventReplSet:PRIMARY> rs.status()
{
	"set" : "MtxEventReplSet",
	"date" : ISODate("2015-08-04T22:57:04.777Z"),
	"myState" : 1,
	"members" : [
		{
			"_id" : 0,
			"name" : "host1:27017",
			"health" : 1,
			"state" : 2,
			"stateStr" : "SECONDARY",
			"uptime" : 345298,
			"optime" : Timestamp(1438725952, 2),
			"optimeDate" : ISODate("2015-08-04T22:05:52Z"),
			"lastHeartbeat" : ISODate("2015-08-04T22:57:02.911Z"),
			"lastHeartbeatRecv" : ISODate("2015-08-04T22:57:04.755Z"),
			"pingMs" : 0,
			"syncingTo" : "host3:27017",
			"configVersion" : 1
		},
		{
			"_id" : 1,
			"name" : "host2:27017",
			"health" : 1,
			"state" : 2,
			"stateStr" : "SECONDARY",
			"uptime" : 345294,
			"optime" : Timestamp(1438725952, 2),
			"optimeDate" : ISODate("2015-08-04T22:05:52Z"),
			"lastHeartbeat" : ISODate("2015-08-04T22:57:03.155Z"),
			"lastHeartbeatRecv" : ISODate("2015-08-04T22:57:03.199Z"),
			"pingMs" : 0,
			"syncingTo" : "host3:27017",
			"configVersion" : 1
		},
		{
			"_id" : 2,
			"name" : "host3:27017",
			"health" : 1,
			"state" : 1,
			"stateStr" : "PRIMARY",
			"uptime" : 345305,
			"optime" : Timestamp(1438725952, 2),
			"optimeDate" : ISODate("2015-08-04T22:05:52Z"),
			"electionTime" : Timestamp(1438383726, 1),
			"electionDate" : ISODate("2015-07-31T23:02:06Z"),
			"configVersion" : 1,
			"self" : true
		}
	],
	"ok" : 1
}



 Comments   
Comment by Judy Han [X] [ 04/Sep/15 ]

Thanks very much for your feedback! I will check with my manager on this to see what we can do.

Comment by Andrew Morrow (Inactive) [ 04/Sep/15 ]

OK. I'm going to close this ticket out then, and you can watch CXX-646 for updates. To get a fix though you are going to need to upgrade to the first release that fixes it; currently, it is targeted for legacy-1.0.6.

As far as upgrading to a newer driver version: yes, of course you must consider the stability, and you should first upgrade your development or test environment, certainly not production, and do all necessary validation and testing before making the upgrade.

As far as integrating the fix into your local legacy-1.0.0 tree, that is up to you, but please realize that that patch hasn't even exited code review yet! So, based on your caution re upgrades, you should consider carefully whether that is an acceptable risk, and of course do all necessary testing. However, the confirmation that it works for you is very helpful.

Comment by Judy Han [X] [ 04/Sep/15 ]

Hi Andrew,
Thanks for the info!
Yes, I am encountering the same issue as CXX-646. I verified this by integrating the suggested fix for CXX-646 and rebuilding libmongoclient.so; the problem went away after that.
Thanks again!

Regarding moving to a newer version of the legacy driver, we need to consider the stability of our code because of the testing already done against the existing version. I will add the above fix to legacy-1.0.0 for now and will pick up a newer version at our scheduled time. Thanks for the advice.

Please feel free to close this ticket.
Thanks,
Judy

Comment by Andrew Morrow (Inactive) [ 03/Sep/15 ]

Great! I'm happy we figured that out. As far as your auth issue, I suspect that it may be related to CXX-646. As of yet we have not validated the fix in the GitHub pull request you will find in that ticket (or we would have merged it already), but if you want to confirm that you are encountering the same issue, you could always apply the patch and see if it fixes things for you. Of course, please understand that the fix is not production ready, so this would be for testing only.

Also, as a general recommendation, I think it would be advisable for you to upgrade away from legacy-1.0.0 to the most recent stable, legacy-1.0.5, as several important bug fixes have landed since legacy-1.0.0. If it turns out that CXX-646 is an issue for you, you might want to wait for legacy-1.0.6 where we currently plan to address CXX-646.

Comment by Judy Han [X] [ 03/Sep/15 ]

Hi Andrew,
There is no possibility that ConnectionString::connect() races with mongo::client::initialize(), because I do not instantiate the task (no thread is spawned) until mongo::client::initialize() returns.
Ah, I am calling mongo::client::shutdown() after starting the child threads. Thanks! I will take care of that.
I temporarily commented out that line, and now it looks much better:
the core dump is gone and connect() passes. Now auth fails with the following code:

        // no documentation on whether this throws an exception.
        mongoPtr_ = mongoConnectionString.connect(errString);
        if (!mongoPtr_) {
            MTX_LOG_ERROR_ONLY("fail to connect mongoDB: " << errString);
            return false;
        }

        if (!mongoPtr_->auth(mongodbDatabase_, mongodbUsername_, mongodbPassword_,
                errString)) {
            MTX_LOG_ERROR_ONLY("fail to authenticate " << mongodbDatabase_ <<
                " with user " << mongodbUsername_ << " " << errString);
            return false;
        }

"auth failed", code: 18
The same username, password, and auth database work from the command line.

Comment by Andrew Morrow (Inactive) [ 03/Sep/15 ]

Also, is there any possibility that an early call to mongo::client::terminate is taking place?

Comment by Andrew Morrow (Inactive) [ 03/Sep/15 ]

Hi Judy.Han -

From the stack trace you have posted, it appears that you are calling connect from an ACE thread. What is the lifecycle of that thread with respect to the call to mongo::client::initialize? In other words, is there a definite happens-before relationship between the call to mongo::client::initialize and the creation of the thread that calls mongo::ConnectionString::connect? Is there any possibility that the call to ConnectionString::connect from the ACE thread is racing with, or occurs before, the call to mongo::client::initialize?
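
The ordering asked about above can be sketched without the driver. This is a minimal illustration, not the driver's code: `g_driver_ready`, `fake_initialize`, `fake_shutdown`, `fake_connect`, and `run_workers` are hypothetical stand-ins for the driver's global state, `mongo::client::initialize`, the driver's teardown call, and `mongo::ConnectionString::connect`, and plain `std::thread` stands in for the ACE threads. It assumes that a connect can succeed only while the global state is alive. The early-teardown case is made deterministic here by tearing down before the workers are spawned, rather than racing with them.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

namespace lifecycle_sketch {

// Hypothetical stand-in for the driver's process-wide state.
std::atomic<bool> g_driver_ready{false};

void fake_initialize() { g_driver_ready.store(true); }
void fake_shutdown() { g_driver_ready.store(false); }

// Stand-in for a connect call: succeeds only while the state is alive.
bool fake_connect() { return g_driver_ready.load(); }

enum class TeardownPoint { AfterJoin, BeforeSpawn };

// Spawns n workers that each try to connect once and returns how many
// succeeded. AfterJoin is the intended ordering; BeforeSpawn models a
// teardown that happens before the workers have connected.
std::size_t run_workers(std::size_t n, TeardownPoint when) {
    assert(!g_driver_ready.load());  // precondition: not yet initialized

    std::atomic<std::size_t> connected{0};

    // Initialization completes before any worker thread exists.
    fake_initialize();
    if (when == TeardownPoint::BeforeSpawn) {
        fake_shutdown();  // too early: workers will find no live state
    }

    std::vector<std::thread> workers;
    workers.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        workers.emplace_back([&connected] {
            if (fake_connect()) {
                connected.fetch_add(1);
            }
        });
    }

    // Every worker has finished once the joins return.
    for (std::thread& t : workers) {
        t.join();
    }
    if (when == TeardownPoint::AfterJoin) {
        fake_shutdown();  // safe: no worker can still be connecting
    }
    return connected.load();
}

}  // namespace lifecycle_sketch
```

With four workers, `run_workers(4, TeardownPoint::AfterJoin)` returns 4 and `run_workers(4, TeardownPoint::BeforeSpawn)` returns 0. In the real driver the failure may show up as an exception or an assertion rather than a failed return value.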

Comment by Judy Han [X] [ 03/Sep/15 ]

Is something wrong with ReplicaSetMonitor? As you mentioned, it is almost as if ReplicaSetMonitor::initialize() is not called.
But I did call:

    if (!mongo::client::initialize().isOK()) {
        MTX_LOG_ERROR_AND_THROW("mongo initialize failed", MtxUtil::CaughtUnknownException);
    }

And we do not have any problem with this call.

Comment by Judy Han [X] [ 03/Sep/15 ]

Hi Andrew,
Yes. Here is a line from README.md.

Version [1.0.0](https://github.com/mongodb/mongo-cxx-driver/releases/tag/legacy-1.0.0)

Comment by Andrew Morrow (Inactive) [ 03/Sep/15 ]

Judy.Han - Could you please confirm that you are using legacy-1.0.0 as stated in the "Affects Version" field?

Comment by Judy Han [X] [ 03/Sep/15 ]

Hi Adam,
I am so sorry for the late reply, I was tied up with some other tasks...

I tried it again. This time the exception is not caught and I get a core dump. Here is the relevant information:

#0  0x000000355d232625 in raise () from /lib64/libc.so.6
#1  0x000000355d233e05 in abort () from /lib64/libc.so.6
#2  0x000000355d22b74e in __assert_fail_base () from /lib64/libc.so.6
#3  0x000000355d22b810 in __assert_fail () from /lib64/libc.so.6
#4  0x00007fc8f198d4a8 in boost::scoped_ptr<mongo::(anonymous namespace)::ReplicaSetMonitorWatcher>::operator-> (this=Unhandled dwarf expression opcode 0xfa
) at /home/jhan/Software/boost/boost_1_55_0/boost/smart_ptr/scoped_ptr.hpp:99
#5  0x00007fc8f1a0cfc5 in operator-> (name="MtxEventReplSet", servers=
    std::set with 3 elements = {...})
    at /home/jhan/Software/boost/boost_1_55_0/boost/smart_ptr/detail/shared_count.hpp:371
#6  mongo::ReplicaSetMonitor::createIfNeeded (name="MtxEventReplSet", servers=
    std::set with 3 elements = {...}) at src/mongo/client/replica_set_monitor.cpp:347
#7  0x00007fc8f19f0892 in mongo::DBClientReplicaSet::DBClientReplicaSet (this=0x7fc8d40131e0, name=
    "MtxEventReplSet", servers=Unhandled dwarf expression opcode 0xf3
) at src/mongo/client/dbclient_rs.cpp:168
#8  0x00007fc8f19d72a3 in mongo::ConnectionString::connect (this=0x7fc8e35fc1d0, errmsg="", 
    socketTimeout=0) at src/mongo/client/dbclient.cpp:302
#9  0x00007fc8f243f572 in EventLoader::EventLoaderWorkerTask::threadSpecificInit (this=0x1319d78)
    at services/MtxEventLoader/EventLoaderWorkerTask.cpp:245
#10 0x00007fc8fb103790 in MtxRpf::Task::svc (this=0x1319d78) at common/MtxRpf/Task.cpp:1981
#11 0x00007fc8f9e2d363 in ACE_Task_Base::svc_run (args=0x1319d78) at tools/ace/ace/Task.cpp:271
#12 0x00007fc8f9e2d96c in ACE_Thread_Adapter::invoke_i (this=0x131ba50)
    at tools/ace/ace/Thread_Adapter.cpp:161
#13 0x00007fc8f9e2d82e in ACE_Thread_Adapter::invoke (this=0x131ba50)
    at tools/ace/ace/Thread_Adapter.cpp:96
#14 0x00007fc8f9d85ad1 in ace_thread_adapter (args=0x131ba50)
    at tools/ace/ace/Base_Thread_Adapter.cpp:122
#15 0x000000355da079d1 in start_thread () from /lib64/libpthread.so.0
#16 0x000000355d2e89dd in clone () from /lib64/libc.so.6
(gdb) frame 7
#7  0x00007fc8f19f0892 in mongo::DBClientReplicaSet::DBClientReplicaSet (this=0x7fc8d40131e0, name=
    "MtxEventReplSet", servers=Unhandled dwarf expression opcode 0xf3
) at src/mongo/client/dbclient_rs.cpp:168
168	        ReplicaSetMonitor::createIfNeeded( name, set<HostAndPort>(servers.begin(), servers.end()) );
(gdb) p servers
Unhandled dwarf expression opcode 0xf3
(gdb) up
#8  0x00007fc8f19d72a3 in mongo::ConnectionString::connect (this=0x7fc8e35fc1d0, errmsg="", 
    socketTimeout=0) at src/mongo/client/dbclient.cpp:302
302	            DBClientReplicaSet * set = new DBClientReplicaSet( _setName , _servers , socketTimeout );
(gdb) p _servers
$1 = std::vector of length 3, capacity 4 = {
  {
    _host = "x.x.x.3", 
    _port = 27017
  },
  {
    _host = "x.x.x.2", 
    _port = 27017
  },
  {
    _host = "x.x.x.1", 
    _port = 27017
  }
}
(gdb) frame 6
#6  mongo::ReplicaSetMonitor::createIfNeeded (name="MtxEventReplSet", servers=
    std::set with 3 elements = {...}) at src/mongo/client/replica_set_monitor.cpp:347
347	        replicaSetMonitorWatcher->safeGo();
(gdb) p replicaSetMonitorWatcher
$1 = {
  px = 0x0
}

The following is also printed on the screen:

mtx: /home/jhan/Software/boost/boost_1_55_0/boost/smart_ptr/scoped_ptr.hpp:99: T* boost::scoped_ptr<T>::operator->() const [with T = mongo::{anonymous}::ReplicaSetMonitorWatcher]: Assertion `px != 0' failed.
Aborted (core dumped)

Comment by Adam Midvidy [ 26/Aug/15 ]

Judy, if possible could you provide some more information - ideally a stacktrace from the exception being thrown as well as the text of your program or any logging output you have from the driver.

Comment by Judy Han [X] [ 06/Aug/15 ]

Hi Adam,
Sorry for the late reply (I had to work on some other issues).
I did call mongo::client::initialize:

    if (!mongo::client::initialize().isOK()) {
        LOG_ERROR_AND_THROW("mongo initialize failed", CaughtUnknownException);
    }

Comment by Adam Midvidy [ 05/Aug/15 ]

Hey Judy.Han,

How are you initializing the driver? It looks like you would get that error if you were not actually creating a mongo::client::GlobalInstance object (nor calling mongo::client::initialize) at the start of your program.

Adam
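
The point about creating a global instance object at the start of the program can be illustrated with a scope guard. This is only a sketch of the idea, not the driver's `mongo::client::GlobalInstance`: `g_ready`, `DriverGuard`, and `connect_stub` are hypothetical names, and the sketch assumes a connect attempt fails with an error message when no guard is alive.

```cpp
#include <cassert>
#include <string>

namespace guard_sketch {

// Hypothetical stand-in for the driver's process-wide state.
bool g_ready = false;

// Brings the global state up in the constructor and tears it down in the
// destructor, so its lifetime brackets every use of the client.
class DriverGuard {
public:
    DriverGuard() {
        assert(!g_ready);  // this sketch allows one guard at a time
        g_ready = true;
    }
    ~DriverGuard() { g_ready = false; }

    DriverGuard(const DriverGuard&) = delete;
    DriverGuard& operator=(const DriverGuard&) = delete;
};

// Stand-in for a connect call that reports failure through errmsg.
bool connect_stub(std::string& errmsg) {
    if (!g_ready) {
        errmsg = "driver not initialized";
        return false;
    }
    errmsg.clear();
    return true;
}

// Connects while a guard is alive in the enclosing scope.
bool connect_inside_guard() {
    DriverGuard guard;
    std::string errmsg;
    return connect_stub(errmsg);
}

// Connects with no guard alive, as if initialization had been skipped.
bool connect_without_guard() {
    std::string errmsg;
    return connect_stub(errmsg);
}

}  // namespace guard_sketch
```

In a real program the guard would typically sit at the top of `main`, so that it outlives every thread that uses the client.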
