[SERVER-44490] The mongo command connecting mongodb instance hangs forever. Created: 08/Nov/19  Updated: 29/Mar/20  Resolved: 29/Mar/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Chengcheng Ma Assignee: Dmitry Agranat
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File gdb_output.txt     Text File mongod.log     File node-3.diagnostic.data.tar.gz    
Issue Links:
Duplicate
is duplicated by SERVER-44488 The mongo command connecting to mongod Closed
Related
related to SERVER-31077 do not attempt to save history to a b... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

[^diagnostic.data.tar.gz]

Participants:

 Description   

We are now facing a issue, which trying to connect to the mongod through mongo client hangs forever. Nothing response, until Ctr+C stop the attempt. And the output of the stop is,

MongoDB shell version v3.6.8
connecting to: mongodb://127.0.0.1:28000/admin
MongoDB server version: 3.6.8
^C2019-11-08T10:09:59.607+0800 E - [main] Error saving history file: FileOpenFailed: Unable to open() file : No such file or directory
2019-11-08T10:09:59.607+0800 I CONTROL [main] shutting down with code:0

 

Another strange thing is that the dbpath of the node is pretty larger than other replica set member. The Primary node's data dir is about 20GB , and the node is about 181GB. When I list all the files in the dbpath, the WiredTigerLAS.wt file is much large, 171GB. It is wired, because we cannot connect to the node any more, so why the WT cache is ALWAYS growing.

 

We also try to dump the stack info while trying to connect, and here is the log:

_Thread 3 (Thread 0x7fab3e99d700 (LWP 20357)):_
 _#0 0x00007fab40196d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0_
 _#1 0x000055f1e0133407 in mongo::DeadlineMonitor<mongo::mozjs::MozJSImplScope>::deadlineMonitorThread() ()_
 _#2 0x000055f1e0e3c2b0 in execute_native_thread_routine ()_
 _#3 0x00007fab40192dd5 in start_thread () from /lib64/libpthread.so.0_
 _#4 0x00007fab3febbead in clone () from /lib64/libc.so.6_
 _Thread 2 (Thread 0x7fab3e19c700 (LWP 20358)):_
 _#0 0x00007fab40199a8b in recv () from /lib64/libpthread.so.0_
 _#1 0x000055f1e0be90b4 in mongo::Socket::_recv(char*, int) ()_
 _#2 0x000055f1e0be9137 in mongo::Socket::recv(char*, int) ()_
 _#3 0x000055f1e0be0c78 in mongo::MessagingPort::recv(mongo::Message&) ()_
 _#4 0x000055f1e0be1f45 in mongo::MessagingPort::call(mongo::Message const&, mongo::Message&) ()_
 _#5 0x000055f1e0754b2f in mongo::DBClientConnection::call(mongo::Message&, mongo::Message&, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*) ()_
 _#6 0x000055f1e074998c in mongo::DBClientBase::runCommandWithTarget(mongo::OpMsgRequest) ()_
 _#7 0x000055f1e074e781 in mongo::DBClientConnection::runCommandWithTarget(mongo::OpMsgRequest) ()_
 _#8 0x000055f1e074c36a in std::_Function_handler<void (mongo::executor::RemoteCommandRequest, std::function<void (mongo::executor::RemoteCommandResponse)>), mongo::DBClientBase::_auth(mongo::BSONObj const&)::\{lambda(mongo::executor::RemoteCommandRequest, std::function<void (mongo::executor::RemoteCommandResponse)>)#1}>::_M_invoke(std::_Any_data const&, mongo::executor::RemoteCommandRequest&&, std::function<void (mongo::executor::RemoteCommandResponse)>&&) ()_
 _#9 0x000055f1e080f79d in mongo::(anonymous namespace)::asyncSaslConversation(std::function<void (mongo::executor::RemoteCommandRequest, std::function<void (mongo::executor::RemoteCommandResponse)>)>, std::shared_ptr<mongo::SaslClientSession> const&, mongo::BSONObj const&, mongo::BSONObj const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, std::function<void (mongo::executor::RemoteCommandResponse)>) [clone .constprop.74] ()_
 _#10 0x000055f1e0813c29 in mongo::(anonymous namespace)::saslClientAuthenticateImpl(std::function<void (mongo::executor::RemoteCommandRequest, std::function<void (mongo::executor::RemoteCommandResponse)>)>, mongo::HostAndPort const&, mongo::BSONObj const&, std::function<void (mongo::executor::RemoteCommandResponse)>) ()_
 _#11 0x000055f1e080903e in mongo::auth::(anonymous namespace)::auth(std::function<void (mongo::executor::RemoteCommandRequest, std::function<void (mongo::executor::RemoteCommandResponse)>)>, mongo::BSONObj const&, mongo::HostAndPort const&, mongo::StringData, std::function<void (mongo::executor::RemoteCommandResponse)>) [clone .constprop.88] ()_
 _#12 0x000055f1e0809f1c in mongo::auth::(anonymous namespace)::asyncAuth(std::function<void (mongo::executor::RemoteCommandRequest, std::function<void (mongo::executor::RemoteCommandResponse)>)>, mongo::BSONObj const&, mongo::HostAndPort const&, mongo::StringData, std::function<void (mongo::executor::RemoteCommandResponse)>) ()_
 _#13 0x000055f1e080a11c in mongo::auth::authenticateClient(mongo::BSONObj const&, mongo::HostAndPort const&, mongo::StringData, std::function<void (mongo::executor::RemoteCommandRequest, std::function<void (mongo::executor::RemoteCommandResponse)>)>, std::function<void (mongo::executor::RemoteCommandResponse)>) ()_
 _#14 0x000055f1e07483f2 in mongo::DBClientBase::_auth(mongo::BSONObj const&) ()_
 _#15 0x000055f1e0753d5a in mongo::DBClientConnection::_auth(mongo::BSONObj const&) ()_
 _#16 0x000055f1e014ecf3 in mongo::mozjs::MongoBase::Functions::auth::call(JSContext*, JS::CallArgs) ()_
 _#17 0x000055f1e015711f in bool mongo::mozjs::smUtils::wrapConstrainedMethod<mongo::mozjs::MongoBase::Functions::auth, true, mongo::mozjs::MongoExternalInfo>(JSContext*, unsigned int, JS::Value*) ()_
 _#18 0x000055f1e05d37cb in js::Invoke(JSContext*, JS::CallArgs const&, js::MaybeConstruct) ()_
 _#19 0x000055f1e05cf171 in Interpret(JSContext*, js::RunState&) ()_
 _#20 0x000055f1e05d3543 in js::RunScript(JSContext*, js::RunState&) ()_
 _#21 0x000055f1e05d9f3f in js::ExecuteKernel(JSContext*, JS::Handle<JSScript*>, JSObject&, JS::Value const&, js::ExecuteType, js::AbstractFramePtr, JS::Value*) ()_
 _#22 0x000055f1e05da0a9 in js::Execute(JSContext*, JS::Handle<JSScript*>, JSObject&, JS::Value*) ()_
 _#23 0x000055f1e04800ca in ExecuteScript(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSScript*>, JS::Value*) ()_
 _#24 0x000055f1e0480170 in JS_ExecuteScript(JSContext*, JS::Handle<JSScript*>, JS::MutableHandle<JS::Value>) ()_
 _#25 0x000055f1e0140351 in mongo::mozjs::MozJSImplScope::exec(mongo::StringData, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, bool, bool, int) ()_
 _#26 0x000055f1e00fdc10 in mongo::shell_utils::initScope(mongo::Scope&) ()_
 _#27 0x000055f1e01435d5 in mongo::mozjs::MozJSImplScope::MozJSImplScope(mongo::mozjs::MozJSScriptEngine*) ()_
 _#28 0x000055f1e01727d5 in mongo::mozjs::MozJSProxyScope::implThread(void*) ()_
 _#29 0x000055f1e016e37c in nspr::Thread::ThreadRoutine(void*) ()_
 _#30 0x000055f1e0e3c2b0 in execute_native_thread_routine ()_
 _#31 0x00007fab40192dd5 in start_thread () from /lib64/libpthread.so.0_
 _#32 0x00007fab3febbead in clone () from /lib64/libc.so.6_
 _Thread 1 (Thread 0x7fab417c8a80 (LWP 20356)):_
 _#0 0x00007fab40196965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0_
 _#1 0x000055f1e0e3969c in std::condition_variable::wait(std::unique_lock<std::mutex>&) ()_
 _#2 0x000055f1e016fed3 in mongo::mozjs::MozJSProxyScope::runOnImplThread(std::function<void ()>) ()_
 _#3 0x000055f1e01700c8 in mongo::mozjs::MozJSProxyScope::MozJSProxyScope(mongo::mozjs::MozJSScriptEngine*) ()_
 _#4 0x000055f1e0130332 in mongo::mozjs::MozJSScriptEngine::createScope() ()_
 _#5 0x000055f1e00d2280 in _main(int, char**, char**) ()_
 _#6 0x000055f1e009da8c in main ()_

 

We searched but not found any clue.

The MongoDB is running in the docker container, and the version of mongo is 3.6.8. 

We also attached the diagnostic.data.

 

 



 Comments   
Comment by Dmitry Agranat [ 29/Mar/20 ]

Hi cora_ma,

We haven’t heard back from you for some time, so I’m going to mark this ticket as resolved. If this is still an issue for you, please provide additional information and we will reopen the ticket.

Regards,
Dima

Comment by Dmitry Agranat [ 19/Nov/19 ]

Hi cora_ma, thank you for providing the requested information. The cluster in question is underprovisioned and most of the time cannot sustain the applied load. This is the main reason some operations/connections timeout. In addition to upgrading your HW, based on your workload, I also recommend upgrading MongoDB to 4.x to benefit from [Non-Blocking Secondary Read|shttps://www.mongodb.com/blog/post/mongodb-40-nonblocking-secondary-reads].

Comment by Chengcheng Ma [ 13/Nov/19 ]

@Dmitry Agranat, thank you so much for you reply. 

  1. The command I use to connect to mongod is the mongo shell client. As you said, the shell is waiting for authentication, cause I can connect to the db successfully without username and password, but when i issue db.auth(), everything hangs.
  2. As for the mongod log, no more evidence, only some connection logs, which is another strange thing. i cannot login the db through the mongo shell command, but the mongod still may receive other connections.
  3. we have restart the node ,or event resync data through initial sync, all of which can solve the problem. the diagnostic data is generated very after the initial sync.
  4. The gdb output, same as before.gdb_output.txt

All the info you need are all attached. 

Comment by Dmitry Agranat [ 12/Nov/19 ]

Hi cora_ma,

Thank you for the additional information and the callstack output. It shows that the shell is just waiting for a SASL auth command to return from the server. The server might be just having trouble processing the command. The fact that you've mentioned that WiredTigerLAS.wt file was 171 GB during this issue could also indicate that the server had some performance issues at that time.

Could you repeat connecting to mongod and if it hangs again, please provide the following data and upload it into our secure portal?

  • $dbpath/diagnostic.data directory (the contents are described here)
  • mongoD log
  • A new gdb output
  • The exact command you use to connect to mongod through mongo client

Thanks,
Dima

Comment by Chengcheng Ma [ 11/Nov/19 ]

Hi @Dmitry Agranat ,

I'm not sure whether the problem we are facing is duplicate of SERVER-31077 . That issue seems to enter the mongo shell successfully, but mine is not. The connecting attempt hangs, no return, no feedback.

I also tried to use gdb to attach the hanged connecting pid, and here are the output:

 

# gdb mongo 12710
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-114.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /var/lib/mongodb-3.6.8/bin/mongo...Missing separate debuginfo for /var/lib/mongodb-3.6.8/bin/mongo
Try: yum --enablerepo='debug' install /usr/lib/debug/.build-id/a6/a687c84b5b82cdac4157eba7154397ca9d835a.debug
(no debugging symbols found)...done.
Attaching to program: /var/lib/mongodb-3.6.8/bin/mongo, process 12710
Reading symbols from /lib64/libresolv.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libresolv.so.2
Reading symbols from /lib64/libcrypto.so.10...Reading symbols from /lib64/libcrypto.so.10...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libcrypto.so.10
Reading symbols from /lib64/libssl.so.10...Reading symbols from /lib64/libssl.so.10...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libssl.so.10
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/librt.so.1
Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libgcc_s.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libgcc_s.so.1
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
[New LWP 12712]
[New LWP 12711]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libz.so.1...Reading symbols from /lib64/libz.so.1...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libz.so.1
Reading symbols from /lib64/libgssapi_krb5.so.2...Reading symbols from /lib64/libgssapi_krb5.so.2...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libgssapi_krb5.so.2
Reading symbols from /lib64/libkrb5.so.3...Reading symbols from /lib64/libkrb5.so.3...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libkrb5.so.3
Reading symbols from /lib64/libcom_err.so.2...Reading symbols from /lib64/libcom_err.so.2...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libcom_err.so.2
Reading symbols from /lib64/libk5crypto.so.3...Reading symbols from /lib64/libk5crypto.so.3...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libk5crypto.so.3
Reading symbols from /lib64/libkrb5support.so.0...Reading symbols from /lib64/libkrb5support.so.0...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libkrb5support.so.0
Reading symbols from /lib64/libkeyutils.so.1...Reading symbols from /lib64/libkeyutils.so.1...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libkeyutils.so.1
Reading symbols from /lib64/libselinux.so.1...Reading symbols from /lib64/libselinux.so.1...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libselinux.so.1
Reading symbols from /lib64/libpcre.so.1...Reading symbols from /lib64/libpcre.so.1...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libpcre.so.1
0x00007f1be1be2965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
Missing separate debuginfos, use: debuginfo-install glibc-2.17-260.el7_6.5.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-37.el7_6.x86_64 libcom_err-1.42.9-13.el7.x86_64 libgcc-4.8.5-36.el7_6.2.x86_64 libselinux-2.5-14.1.el7.x86_64 openssl-libs-1.0.2k-16.el7_6.1.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-18.el7.x86_64

 

So if you are sure about the issue we are encountering is exact the case which you posted , I will follow that one for updates.

 

Thanks a lot.

Comment by Dmitry Agranat [ 10/Nov/19 ]

Hi cora_ma,

Thank you for the report. This issue seems to be a duplicate of SERVER-31077.

Please follow SERVER-31077 for updates.

Thanks,
Dima

Generated at Thu Feb 08 05:06:07 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.