[SERVER-7004] db.hostInfo() crashes mongod with: Uncaught std::exception: basic_string::_S_create Created: 10/Sep/12  Updated: 11/Jul/16  Resolved: 18/Sep/12

Status: Closed
Project: Core Server
Component/s: Diagnostics
Affects Version/s: 2.1.2, 2.2.0
Fix Version/s: 2.2.1, 2.3.0

Type: Bug Priority: Major - P3
Reporter: Thomas Rueckstiess Assignee: Ben Becker
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Was able to reproduce on Mac OS X 10.7.4 (Lion) and Mac OS X 10.8.1 (Mountain Lion)


Attachments: Text File hostinfo-crashes.log    
Operating System: OS X
Participants:

 Description   

Running db.hostInfo() crashes mongod (2.1.2 and 2.2.0) with

db.hostInfo()
Tue Sep 11 09:15:01 DBClientCursor::init call() failed
Tue Sep 11 09:15:01 query failed : admin.$cmd { hostInfo: 1.0 } to: 127.0.0.1:30000
Tue Sep 11 09:15:01 Error: error doing query: failed src/mongo/shell/collection.js:155
Tue Sep 11 09:15:01 trying reconnect to 127.0.0.1:30000
Tue Sep 11 09:15:01 reconnect 127.0.0.1:30000 failed couldn't connect to server 127.0.0.1:30000

Log file says:

Tue Sep 11 09:15:01 [conn1] runQuery called admin.$cmd { hostInfo: 1.0 }
Tue Sep 11 09:15:01 [conn1] run command admin.$cmd { hostInfo: 1.0 }
Tue Sep 11 09:15:01 [conn1] ERROR: Uncaught std::exception: basic_string::_S_create, terminating
Tue Sep 11 09:15:01 dbexit: 
Tue Sep 11 09:15:01 [conn1] shutdown: going to close listening sockets...

Crashed on 3 different machines (2 MBPro, 1 MBAir), tested default port (27017) and custom port (30000), tested binary and compiled on host, replica set and single instance.

Was not able to reproduce on two other OS X machines.

Logfile (verbose level 5) is attached.



 Comments   
Comment by auto [ 18/Sep/12 ]

Author:

{u'date': u'2012-09-17T11:03:19-07:00', u'email': u'ben.becker@10gen.com', u'name': u'Ben Becker'}

Message: SERVER-7004: properly set buffer size for darwin's sysctlbyname()
Branch: v2.2
https://github.com/mongodb/mongo/commit/6be65ade5359a1c9abac96718ccec8e6ed7ae571

Comment by auto [ 17/Sep/12 ]

Author:

{u'date': u'2012-09-17T11:03:19-07:00', u'email': u'ben.becker@10gen.com', u'name': u'Ben Becker'}

Message: SERVER-7004: properly set buffer size for darwin's sysctlbyname()
Branch: master
https://github.com/mongodb/mongo/commit/b90a2cd16e4f8ee9f2e181d1ac3856b1cfd8f51b

Comment by Ben Becker [ 12/Sep/12 ]

Yep, that's the correct approach. Initial version naively assumed truncation was ok. Patch coming as soon as time permits.

Comment by Thomas Rueckstiess [ 12/Sep/12 ]

According to the function manual, a preceding call to

sysctlbyname(sysctlName, NULL, &len, NULL, 0)

with NULL as oldp will return the required buffer length in the len parameter. The buffer can then be allocated with the right size.

Function manual excerpt:

 The size of the available data can be determined by calling sysctl() with the NULL argument for oldp.
 The size of the available data will be returned in the location pointed to by oldlenp.  For some opera-
 tions, the amount of space may change often.  For these operations, the system attempts to round up so
 that the returned size is large enough for a call to return the data shortly thereafter.

Comment by Thomas Rueckstiess [ 11/Sep/12 ]

Stack Trace:

#0  0x00007fff8b26dabd in __cxa_throw ()
#1  0x00007fff91a9037e in std::__throw_length_error ()
#2  0x00007fff91ab8469 in std::string::_Rep::_S_create ()
#3  0x00007fff91aafa0c in std::string::_S_construct<char const*> ()
#4  0x00007fff91ab89e0 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string ()
#5  0x000000010011afb0 in mongo::getSysctlByName<std::string> (sysctlName=0x100af9d55 "machdep.cpu.features") at processinfo_darwin.cpp:122
#6  0x000000010011a1e5 in mongo::ProcessInfo::SystemInfo::collectSystemInfo (this=0x1046015d0) at processinfo_darwin.cpp:160
#7  0x00000001000e33e8 in mongo::ProcessInfo::SystemInfo::SystemInfo (this=0x1046015d0) at processinfo.h:128
#8  0x00000001000e367a in mongo::ProcessInfo::sysInfo (this=0x10538a1b0) at processinfo.h:147
#9  0x00000001000e3778 in mongo::ProcessInfo::getAddrSize (this=0x10538a1b0) at processinfo.h:67
#10 0x00000001000e08f3 in mongo::HostInfoCmd::run (this=0x10118ee30, dbname=@0x10538aaf0, cmdObj=@0x10538af68, unnamed_arg=0, errmsg=@0x10538a560, result=@0x10538b4e8, fromRepl=false) at dbcommands_generic.cpp:322
#11 0x000000010076e04e in mongo::_execCommand (c=0x10118ee30, dbname=@0x10538aaf0, cmdObj=@0x10538af68, queryOptions=0, result=@0x10538b4e8, fromRepl=false) at dbcommands.cpp:1859
#12 0x000000010076fdd4 in mongo::execCommand (c=0x10118ee30, client=@0x104502480, queryOptions=0, cmdns=0x106800614 "admin.$cmd", cmdObj=@0x10538af68, result=@0x10538b4e8, fromRepl=false) at dbcommands.cpp:1985
#13 0x0000000100771590 in mongo::_runCommands (ns=0x106800614 "admin.$cmd", _cmdobj=@0x10538b5c8, b=@0x10538b540, anObjBuilder=@0x10538b4e8, fromRepl=false, queryOptions=0) at dbcommands.cpp:2069
#14 0x0000000100477385 in mongo::runCommands (ns=0x106800614 "admin.$cmd", jsobj=@0x10538b5c8, curop=@0x104800e00, b=@0x10538b540, anObjBuilder=@0x10538b4e8, fromRepl=false, queryOptions=0) at query.cpp:43
#15 0x0000000100477e5b in mongo::runQuery (m=@0x10538cae0, q=@0x10538bd78, curop=@0x104800e00, result=@0x104600ac0) at query.cpp:919
#16 0x000000010086edd2 in receivedQuery (c=@0x104502480, dbresponse=@0x10538c7e0, m=@0x10538cae0) at instance.cpp:244
#17 0x0000000100873d20 in mongo::assembleResponse (m=@0x10538cae0, dbresponse=@0x10538c7e0, remote=@0x10538c810) at instance.cpp:390
#18 0x0000000100016807 in mongo::MyMessageHandler::process (this=0x102117490, m=@0x10538cae0, port=0x105500130, le=0x104502370) at db.cpp:192
#19 0x0000000100685963 in mongo::pms::threadRun (inPort=0x105500130) at message_server_port.cpp:85
#20 0x0000000100686e40 in boost::_bi::list1<boost::_bi::value<mongo::MessagingPort*> >::operator()<void (*)(mongo::MessagingPort*), boost::_bi::list0> (this=0x1055004d0, f=@0x1055004c8, a=@0x10538ce20, unnamed_arg=0) at bind.hpp:253
#21 0x0000000100686ea2 in boost::_bi::bind_t<void, void (*)(mongo::MessagingPort*), boost::_bi::list1<boost::_bi::value<mongo::MessagingPort*> > >::operator() (this=0x1055004c8) at bind_template.hpp:20
#22 0x0000000100686ecd in boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(mongo::MessagingPort*), boost::_bi::list1<boost::_bi::value<mongo::MessagingPort*> > > >::run (this=0x1055002e0) at thread.hpp:62
#23 0x0000000100601cf9 in thread_proxy (param=0x1055002e0) at thread.cpp:121
#24 0x00007fff8a6ea782 in _pthread_start ()
#25 0x00007fff8a6d71c1 in thread_start ()

Comment by Thomas Rueckstiess [ 11/Sep/12 ]

Ben, it appears that sysctlbyname() is in fact returning 0 for oldlenp when the buffer is too small.

I also caught this exception:

Catchpoint 2 (exception caught), throw location unknown, catch location unknown, exception type std::length_error
0x00007fff8b26dabd in __cxa_throw ()

which points to the length passed to the string constructor being < 0.

Comment by Ben Becker [ 11/Sep/12 ]

Hmm, if this is indeed caused by the sysctl value's length, it seems sysctlbyname() doesn't quite function as per the man page, or I'm misunderstanding something. Given the following description:

     int
     sysctlbyname(const char *name, void *oldp, size_t *oldlenp, void *newp, size_t newlen);
...
 
     The information is copied into the buffer specified by oldp.  The size of the buffer is given by the
     location specified by oldlenp before the call, and that location gives the amount of data copied after
     a successful call and after a call that returns with the error code ENOMEM.  If the amount of data
     available is greater than the size of the buffer supplied, the call supplies as much data as fits in
     the buffer provided and returns with the error code ENOMEM.  If the old value is not desired, oldp and
     oldlenp should be set to NULL.
...
RETURN VALUES
     Upon successful completion, the value 0 is returned; otherwise the value -1 is returned and the global
     variable errno is set to indicate the error.

It seems the len parameter should be set to 256 on encountering ENOMEM, thus no buffer overflow or attempting to read past the end of the buffer. The code currently has a check for an error (less than 0) but continues on error. This should be fixed to return a default string. It's also worth noting that ENOMEM is a positive value. That's all a bit tangential though – the results should just be a truncated string if len is populated with the 'amount of data copied' according to the man page.

The exception could also be caused by sysctlbyname() returning a length of zero (due to the size-1 in the string ctor). A check for this should probably be added regardless.

@Thomas, would it be possible to start mongod with gdb and post the backtrace to confirm this is the cause of the exception? You may want to run 'catch throw' first to ensure we get the stack trace at the time the exception is thrown, but you may need to continue past a few 'normal' exceptions.

Thanks,
Ben

Comment by Thomas Rueckstiess [ 11/Sep/12 ]

In src/mongo/util/processinfo_darwin.cpp line 117 a string buffer of length 256 is allocated.

The call to

sysctlbyname(sysctlName, &value, &len, NULL, 0)

in line 119 returns a string that can potentially be longer than 256 char. On my machine (MacBook Pro Retina Mid 2012, OS X 10.8.1) a call to "sysctl machdep.cpu.features" from the command line returns

FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX EST TM2 SSSE3 CX16 TPR PDCM SSE4.1 SSE4.2 xAPIC POPCNT AES PCID XSAVE OSXSAVE TSCTMR AVX1.0 RDRAND F16C

which is 257 characters long and thus creates a buffer overflow. On an older MacBook Pro (MacBook Pro Late 2011, OS X 10.8.1), the returned string was only 249 char long, thus the error didn't occur.

Generated at Thu Feb 08 03:13:22 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.